Hi Nick,
Thank you very much for the suggested solution.
I tried writing my own do file based on your suggestions to my
previous post, but I couldn't get the code to work.
My main problem was that while I knew how to tag the first observation
of a group, I did not know how to successfully tag the last
observation. gsort appears to offer a good solution that I was not
aware of.
Thanks,
Marietta
On Tue, Aug 12, 2008 at 1:51 PM, Nick Cox <[email protected]> wrote:
> If I understand your question correctly, it yields to a minor variation
> on the technique documented in
>
> SJ-7-3 pr0033 . . . . . . . . . . . . . . Stata tip 51: Events in
> intervals
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
> J. Cox
> Q3/07 SJ 7(3):440--443 (no
> commands)
> tip for counting or summarizing irregularly spaced
> events in intervals
>
> and explained in my posting replying to your previous question:
>
> <http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist
> .0808/date/article-20.html>
>
> local N = _N
> gsort Company_No - Date
> gen avex = .
> qui forval i = 1/`N' {
> egen tag = tag(Investor_id) ///
> if Company_no == Company_no[`i'] & ///
> inrange(Date, Date[`i'] - 90, Date[`i'])
>
> su x if tag, meanonly
>
> replace avex = r(mean) in `i'
>
> drop tag
> }
>
> The obvious variations on the previous example are using -summarize,
> meanonly- to get the mean and specifying the previous 90 days within an
> -inrange()- condition. (This code includes the current date; modify
> according to taste.)
>
> The not-so-obvious variation is using -egen, tag()- to tag the _last_
> relevant observation for each distinct Investor_id.
>
> -egen, tag()- tags just one of any repetitions of each distinct value of
> the variable specified. It's not documented, but the observation tagged
> is in fact the _first_ such occurrence in the dataset in the current
> sort order. Hence use -gsort- first to reverse date order in each panel,
> so that tagging catches the last such occurrence in time.
>
> Nick
> [email protected]
>
> Marietta Jones
>
> I have a new problem related to the dataset I presented in my original
> email (see below).
>
> For each observation in the dataset, I would like to calculate the
> mean of variable x for the same company_no on a rolling prior 90 day
> basis.
> The extra complexity arises by the fact that there may be more than
> one observations of variable x for the same investor_id for a given
> company_no.
> I would only like to include the last observation per investor_id for
> a given company_no in the calculation of the mean.
>
>>> I have the following dataset:
>>>
>>> Company_no Date Investor_id
>>> 1 03/01/2000 1
>>> 1 04/09/2000 1
>>> 1 12/12/2001 2
>>> 2 13/12/2000 4
>>> 2 07/08/2001 7
>>> 3 09/08/2000 4
>>> 3 19/03/2001 4
>>> 3 02/05/2001 5
>>> 3 03/12/2001 6
>>>
>>>
>>> For each observation, I would like to calculate the number of
> distinct
>>> investors owning shares in a given company in the previous 12 months.
>>> The new variable (No_investors_past_12_months) will look as follows:
>>>
>>> Company_no Date Investor_id
> No_investors_past_12_months
>>> 1 03/01/2000 1 1
>>> 1 04/09/2000 1 1
>>> 1 12/12/2001 2 1
>>> 2 13/12/2000 4 1
>>> 2 07/08/2001 7 2
>>> 3 09/08/2000 4 1
>>> 3 19/03/2001 4 1
>>> 3 02/05/2001 5 2
>>> 3 03/12/2001 6 3
>>>
>>> For example, for company 2 on 07/08/2001 there are two investors who
>>> have owned shares of this company within the past 12 months.
>>>
>>> The idea is that the same investor should be counted only once and we
>>> should count all observations over the past 12 months, including the
>>> current observation date. If there are no observations over the past
>>> 12 months, the value of the new variable should equal 1.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/