If I understand your question correctly, it yields to a minor variation
on the technique documented in
SJ-7-3 pr0033 . . . . . . . . . . . . . . Stata tip 51: Events in
intervals
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
J. Cox
Q3/07 SJ 7(3):440--443 (no
commands)
tip for counting or summarizing irregularly spaced
events in intervals
and explained in my posting replying to your previous question:
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist
.0808/date/article-20.html>
local N = _N
gsort Company_No - Date
gen avex = .
qui forval i = 1/`N' {
egen tag = tag(Investor_id) ///
if Company_no == Company_no[`i'] & ///
inrange(Date, Date[`i'] - 90, Date[`i'])
su x if tag, meanonly
replace avex = r(mean) in `i'
drop tag
}
The obvious variations on the previous example are using -summarize,
meanonly- to get the mean and specifying the previous 90 days within an
-inrange()- condition. (This code includes the current date; modify
according to taste.)
The not-so-obvious variation is using -egen, tag()- to tag the _last_
relevant observation for each distinct Investor_id.
-egen, tag()- tags just one of any repetitions of each distinct value of
the variable specified. It's not documented, but the observation tagged
is in fact the _first_ such occurrence in the dataset in the current
sort order. Hence use -gsort- first to reverse date order in each panel,
so that tagging catches the last such occurrence in time.
Nick
[email protected]
Marietta Jones
I have a new problem related to the dataset I presented in my original
email (see below).
For each observation in the dataset, I would like to calculate the
mean of variable x for the same company_no on a rolling prior 90 day
basis.
The extra complexity arises by the fact that there may be more than
one observations of variable x for the same investor_id for a given
company_no.
I would only like to include the last observation per investor_id for
a given company_no in the calculation of the mean.
>> I have the following dataset:
>>
>> Company_no Date Investor_id
>> 1 03/01/2000 1
>> 1 04/09/2000 1
>> 1 12/12/2001 2
>> 2 13/12/2000 4
>> 2 07/08/2001 7
>> 3 09/08/2000 4
>> 3 19/03/2001 4
>> 3 02/05/2001 5
>> 3 03/12/2001 6
>>
>>
>> For each observation, I would like to calculate the number of
distinct
>> investors owning shares in a given company in the previous 12 months.
>> The new variable (No_investors_past_12_months) will look as follows:
>>
>> Company_no Date Investor_id
No_investors_past_12_months
>> 1 03/01/2000 1 1
>> 1 04/09/2000 1 1
>> 1 12/12/2001 2 1
>> 2 13/12/2000 4 1
>> 2 07/08/2001 7 2
>> 3 09/08/2000 4 1
>> 3 19/03/2001 4 1
>> 3 02/05/2001 5 2
>> 3 03/12/2001 6 3
>>
>> For example, for company 2 on 07/08/2001 there are two investors who
>> have owned shares of this company within the past 12 months.
>>
>> The idea is that the same investor should be counted only once and we
>> should count all observations over the past 12 months, including the
>> current observation date. If there are no observations over the past
>> 12 months, the value of the new variable should equal 1.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/