Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: FW: Aggregating values back to a certain row

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: FW: Aggregating values back to a certain row
Date	Thu, 20 Mar 2014 13:47:38 +0000

More technique:

clear

input invest_id firm_id value  year
1 1 1.5  2001
2 1 2.7  2001
3 1 3.4  2002
4 2 0.9  1998
5 2 5.5  2003
end

collapse (count) count=value (sum) sum=value, by(firm_id year)

tsset firm_id year
tsfill

replace count = 0 if missing(sum)
replace sum = 0 if missing(sum)

bysort firm_id (year) : gen prev2  = max(0, sum[_n-1]) + max(0, sum[_n-2])

Nick
[email protected]


On 20 March 2014 13:34, Nick Cox <[email protected]> wrote:
> This is a nice idea, although the example data showed that
>
> 1. two or more items can be observed for the same firm in the same
> year (still, what that implies is just a prior -collapse-, which might
> be a good move any way)
>
> 2. there can be gaps (again, there are remedies)
>
> Nick
> [email protected]
>
>
> On 20 March 2014 13:14, Jeph Herrin <[email protected]> wrote:
>>
>> If the window is really a moderate 5 or so years, you might get by with
>> 'reshaping' the data first:
>>
>>  forv y=1/5 {
>>           bys firm_id (year): gen value_`y'=value[_n-`y'] ///
>>                      if inrange(year[_n-`y'],year,year-5)
>>  }
>>  egen value_agg = rowtotal(value_1-value_5)
>>
>>
>> You will have (at most 5) new variables populated on each row, containing
>> the values from the last (at most) 5 years. Then just aggregate across the
>> row.
>>
>> J
>>
>>
>>
>> On 3/19/2014 4:03 PM, Nick Cox wrote:
>>>
>>> This kind of question is often asked here.
>>>
>>> Your code can't easily be rescued. A loop over observations doesn't
>>> really mix with a -while- loop. Also note, for example, that -if-
>>> qualifiers are illegal and indeed make no sense in defining -local-s.
>>>
>>> You don't spell it out, and it's not even tacit in your code, but I am
>>> guessing that all calculations must be for the same firm.
>>>
>>> Consider this:
>>>
>>> 1. Initialise all aggregate values to 0.
>>>
>>> gen firm_value_agg = 0
>>>
>>> 2. Loop over all the observations.
>>>
>>> quietly forvalues n=1/`=_N' {
>>>
>>> 3. Summarize value if the firm is the same as that in the present
>>> observation and investment was in the previous 5 years.
>>>
>>> Here previous five years is interpreted as meaning for 2013 (say)
>>> 2013, 2012, 2011, 2010, 2009.
>>>
>>>            su value if firm == firm[`n'] & inrange(year, year[`n'] - 4,
>>> year), meanonly
>>>
>>> If you want to exclude the present year, it would be
>>>
>>>            su value if firm == firm[`n'] & inrange(year, year[`n'] - 5,
>>> year[`n'] - 1), meanonly
>>>
>>> The -meanonly- option is helpful to speed up what is likely to be slow
>>> code. Despite its name, -meanonly- does include calculation of the
>>> sum.
>>>
>>> 4. Replace the aggregated value in the current observation.
>>>
>>>            replace firm_value_agg = r(sum) in `n'
>>> }
>>>
>>> Counts would mean using r(N) not r(sum).
>>>
>>> There's a lengthier sermon in
>>> http://www.stata-journal.com/sjpdf.html?articlenum=pr0033 which you
>>> should cite if you find it useful. (Many people will cite a paper for
>>> just mentioning one idea they allude to, but not a paper that saved
>>> them hard work in data management.)
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 19 March 2014 19:03, Daniel Stefan Hain <[email protected]> wrote:
>>>>
>>>> Dear Stata users,
>>>>
>>>> I currently have a problem I find myself unable to solve.
>>>>
>>>> I have longitudinal data on investments made by companies, each
>>>> investment a single row. It looks sort of like:
>>>>
>>>> invest_id             firm_id                 value
>>>> year
>>>> 1                              1                              1,5
>>>> 2001
>>>> 2                              1                              2,7
>>>> 2001
>>>> 3                              1                              3,4
>>>> 2002
>>>> 4                              2                              0,9
>>>> 1998
>>>> 5                              2                              5,5
>>>> 2003
>>>>
>>>> I want to introduce a "investment experience", which aggregates the
>>>> number (or value) of previous investments over the last X (lets say 5) years
>>>> previous to the current one. Firms have very different investment activity,
>>>> so might the one have 100, the next only one per year. I up to now tried
>>>> different versions of something like
>>>>
>>>>
>>>> local N =_N
>>>> forvalues n=1/`N' {
>>>>                  local i=1
>>>>                  local t=0
>>>>         while t=0 {
>>>>                                  replace firm_value_agg = firm_value_agg
>>>> + value[_n - `i']) if `n'=_n
>>>>                                  local i = `i' + 1
>>>>                                  local t = 1             if year[_n] - 5
>>>> == year[_n - `i']
>>>>                  }
>>>> }
>>>>
>>>> Here I try to write a loop that goes over every row (first forvalues),
>>>> and then (in theory) aggregates previous investment values in
>>>> "firm_value_agg", as long till it comes to a row where the year is smaller
>>>> than the current year minus 5 (since I want to aggregate the last 5 years).
>>>> I tried some different versions, but always get some error messages like "no
>>>> weights allowed", or the code just produces unintended results. For every
>>>> piece of advice how to make this work I would be very grateful.
>>>>
>>>> Best wishes
>>>>
>>>>
>>>> Daniel S. Hain
>>>>
>>>> Ph.D. Fellow, M.Sc. Economics, Dipl.-Wirt.-Ing.
>>>> Department of Business and Management | IKE | DRUID | EIS
>>>> T: (+45) 9940 2724  | Email: [email protected]
>>>> Web: www.ike.aau.dk | www.eis-all.dk | www.innoresource.org
>>>> Aalborg University | Fibigerstræde 11, Room 91 | 9220 Aalborg, Denmark
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: FW: Aggregating values back to a certain row
  - From: Daniel Stefan Hain <[email protected]>
- Re: st: FW: Aggregating values back to a certain row
  - From: Nick Cox <[email protected]>
- Re: st: FW: Aggregating values back to a certain row
  - From: Jeph Herrin <[email protected]>
- Re: st: FW: Aggregating values back to a certain row
  - From: Nick Cox <[email protected]>

Prev by Date: Re: st: FW: Aggregating values back to a certain row
Next by Date: Re: st: FW: Aggregating values back to a certain row
Previous by thread: Re: st: FW: Aggregating values back to a certain row
Next by thread: Re: st: FW: Aggregating values back to a certain row
Index(es):
- Date
- Thread