I agree with Neil's suggestion of -egen, mean() by(year)- for yearly
averages.
For the average of the previous three years, there are several ways to
do it. I don't see that Neil's solution takes into consideration that
the periods should overlap.
Here is one way to do it:
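gen threeyearaverage = .
qui forval y = 1982/2006 {
    * the window is the three years before the current year
    local y1 = `y' - 3
    local y2 = `y' - 1
    * pool every observation in the window and take its mean
    su ROE if inrange(year, `y1', `y2'), meanonly
    replace threeyearaverage = r(mean) if year == `y'
}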
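This averages over all observations in the previous three years, so
years with more observations carry more weight. If you want the
unweighted mean of the three yearly means instead, it would be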
gen threeyearaverage = .
qui forval y = 1982/2006 {
    su ROE if year == `= `y' - 3', meanonly
    local mean3 = r(mean)
    su ROE if year == `= `y' - 2', meanonly
    local mean2 = r(mean)
    su ROE if year == `= `y' - 1', meanonly
    replace threeyearaverage = (`mean3' + `mean2' + r(mean)) / 3 if year == `y'
}
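To see how much the two definitions can differ, here is the arithmetic
for 1982 using the 1979-1981 values in Fabian's example data (quoted
below). This is only a hand-typed illustration, not part of either loop:

```stata
* pooled mean of all six observations from 1979, 1980 and 1981
display (12 + 9 + 2 + 3 + 20 + 5) / 6

* mean of the three yearly means (1979: three obs, 1980: one, 1981: two)
display ((12 + 9 + 2)/3 + 3 + (20 + 5)/2) / 3
```

The pooled mean is 8.5, while the mean of the yearly means is about
7.72, because 1980 contributes only a single observation.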
Another way to do it would be to -collapse-, work on the
collapsed dataset, and -merge- back in again.
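A rough sketch of that route follows. It assumes Stata 11 or later (for
the -merge m:1- syntax), that year is an integer coded 1979, ..., 2006,
and that threeyearaverage does not already exist in the data. The names
yearmean and means are made up for illustration.

```stata
* sketch only: yearmean and means are hypothetical names
preserve
* one observation per year holding that year's mean ROE
collapse (mean) yearmean = ROE, by(year)
tsset year
* mean of the three previous yearly means;
* missing if any of the three years is absent
gen threeyearaverage = (L1.yearmean + L2.yearmean + L3.yearmean) / 3
keep year threeyearaverage
tempfile means
save `means'
restore
* attach the yearly result to every original observation
merge m:1 year using `means'
```

This gives the average across years, not across observations. -merge-
also adds a _merge variable, which you may want to check and then drop.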
I'll put in here that -round(,)- can be useful in similar problems.
Nick
Neil Shephard wrote:
Fabian Brenner wrote:
>
> I have several observations called "ROE" for the "years" (from 1979 to
> 2006). There is a different number of observations for each year.
>
> My data look like this:
> "year" "ROE" "Average" "threeyearaverage"
>
> 79 12 ? ?
> 79 9 ? ?
> 79 2 ? ?
> 80 3 ? ?
> 81 20 ? ?
> 81 5 ? ?
> 82 3 ? ?
> 82 6 ? ?
> 82 9 ? ?
> 82 8 ? ?
> . . .
> . . .
> . . .
>
> I want to compute the average of the observations for each year, e.g.
> for 1979: (12+9+2)/3 (I tried to sort the observations and to divide the
> sum by _n but it didn't work...)
>
bysort year : egen average = mean(ROE)
> In a second step I want to get the average ROE for the past three
> years ("threeyearaverage") (beginning in 1982), e.g. for 1982 it should
> be the average of the ROE in 1979 plus the average ROE in 1980 plus
> average ROE in 1981, divided by 3.
>
That's an inappropriate way of calculating the three-year average, as it
fails to account for the fact that there are different numbers of
observations in each year, so the years should not get equal weight.
This is covered in most basic statistics books. You therefore have two
options: a) use weights; b) use the raw data. Since -egen newvar = mean()-
doesn't allow weights, I'd be inclined to go with b).
You therefore need to generate a variable that bins your data...
gen year3 = .
replace year3 = 1 if(year >= 79 & year <= 82)
replace year3 = 2 if(year >= 83 & year <= 85)
replace year3 = 3 if(year >= 86 & year <= 88)
....
bysort year3 : egen threeyearaverage = mean(ROE)