Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: RE: st: RE: RE: Calculating moving windows over time with conditions
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: RE: st: RE: RE: Calculating moving windows over time with conditions
Date: Fri, 4 Feb 2011 14:29:10 +0000
On your question about -by:-, the answer is this: take away the -by:- and what difference does it make? None, because the calculation is a row sum for each observation. The answer is the same however you do it, whether observation by observation or observation by observation within blocks of observations. Where the data came from is irrelevant.
You already have a time series solution, meaning one based on L. etc., in a previous reply.
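A minimal sketch of the point about -by:-, using made-up values (the toy data and the names with_by and without_by are illustrative assumptions, not from the original thread). It assumes the lag variables already exist and contain no missing values:

```stata
* Toy data: hypothetical values, purely illustrative
clear
input id year var_x lag_var_x lag2_var_x
1 2001 5 0 0
1 2002 3 5 0
2 2001 7 0 0
2 2003 2 0 7
end

* Row sum computed within blocks of observations
bysort id (year): gen with_by = var_x + lag_var_x + lag2_var_x

* Same row sum computed without -by:-
gen without_by = var_x + lag_var_x + lag2_var_x

* The two results agree in every observation
assert with_by == without_by
```

The -by:- prefix matters only when the expression refers to other observations, for example through subscripts such as [_n-1].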
Nick
[email protected]
[email protected]
Thanks again, Nick.
Sorry about the typo.
My third argument was of course:
bysort id (year): gen var_x_3yrs = (var_x + lag_var_x + lag2_var_x) ;
If I use this argument, or omit "by" as it is unnecessary as you say:
gen var_x_3yrs = (var_x + lag_var_x + lag2_var_x) ;
Would this produce an acceptable solution, or is it plain wrong? Remember that I have only one observation of each unique id per year if the id is indeed in that year.
If this is wrong, I'll have to go into time series commands.
All the best,
Erik.
-----Forwarded by Erik Aadland/people/BISTIFT on 02/04/2011 03:12PM -----
To: "'[email protected]'" <[email protected]>
From: Nick Cox <[email protected]>
Sent by: [email protected]
Date: 02/04/2011 03:01PM
Subject: RE: st: RE: RE: Calculating moving windows over time with conditions
You are recoding the wrong variable in your third statement.
Once the variables are created your last -by:- is unnecessary.
I guess you are seeking something more like
tsset id year
gen var_x_3yrs = var_x + ///
    cond(L.var_x < ., L.var_x, 0) + ///
    cond(L2.var_x < ., L2.var_x, 0)
-- but that is not guaranteed to work the way you want if there are gaps. In many ways you are likely to get better results by averaging non-missing values and multiplying up by 3.
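One way the averaging idea might be sketched, under stated assumptions: the toy panel below is made up, and n_obs, sum_obs and var_x_3yrs_scaled are hypothetical names. It counts the non-missing values in the three-year window, averages them, and multiplies by 3. The /// continuation works in a do-file, not at the interactive prompt.

```stata
* Toy unbalanced panel: hypothetical values; id 1 has no 2003 observation
clear
input id year var_x
1 2001 4
1 2002 6
1 2004 5
2 2001 2
2 2002 3
2 2003 4
end
tsset id year

* Number of non-missing values in the window (current year and two lags)
gen n_obs = (var_x < .) + (L.var_x < .) + (L2.var_x < .)

* Sum of the non-missing values, counting absent years as 0
gen sum_obs = cond(var_x < ., var_x, 0) + ///
    cond(L.var_x < ., L.var_x, 0) + ///
    cond(L2.var_x < ., L2.var_x, 0)

* Mean of the available values, scaled up to a 3-year total
gen var_x_3yrs_scaled = 3 * sum_obs / n_obs if n_obs > 0

list id year var_x n_obs sum_obs var_x_3yrs_scaled, sepby(id)
```

For id 1 in 2004, L.var_x is missing because 2003 is absent, so the window holds only 5 and 6 and the scaled total is 3 * 11 / 2 = 16.5. Whether scaling up like this is appropriate depends on why years are missing.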
Nick
[email protected]
[email protected]
Thank you very much for your help and input.
If I don't get it right, I'll try to go for the time series commands.
I just created the following code. Does this look acceptable?
sort id year ;
bysort id: gen lag_var_x = var_x[_n-1] if year==year[_n-1]+1 ;
recode var_x (. = 0) ;
bysort id: gen lag2_var_x = var_x[_n-2] if year==year[_n-2]+2 ;
recode lag2_var_x (. = 0) ;
bysort individual_id (year): gen var_x_3yrs = (var_x + lag_var_x + lag2_var_x) ;
From: Nick Cox <[email protected]>
Commenting now on the code,
0. Your basic structure is
by id year:
There is only one observation in each of those combinations. You need
by id (year):
1. A key thing is that -egen-'s "functions" do not behave at all like Stata's functions. Thus you must refer to just _one_ function on the right-hand side of an = sign.
The syntax of -egen- is given in the help.
egen [type] newvar = fcn(arguments) [if] [in] [, options]
So the minimal call is
egen newvar = fcn(arguments)
There is no scope for more than one -fcn()- call.
2. -if- is allowed just once in any Stata command. -if- never appears _inside_ anything else.
3. You could use -cond(,)- as part of an expression to express branching. In this case, it would get messy almost beyond belief.
I'd back off from this approach and use L. directly as Johannes suggested or -rolling- or -mvsumm- (SSC) as I suggested earlier.
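The syntax points above can be illustrated with a short sketch. The toy data and the names prev_x, total_x and total_recent are illustrative assumptions only:

```stata
* Toy unbalanced panel: hypothetical values
clear
input id year var_x
1 2001 4
1 2002 6
1 2004 5
2 2001 2
2 2002 3
end

* Point 0: group by id, sorting on year within id, so that [_n-1]
* refers to the previous observation of the same id
bysort id (year): gen prev_x = var_x[_n-1] if year == year[_n-1] + 1

* Point 1: exactly one -egen- function on the right-hand side
bysort id: egen total_x = total(var_x)

* Point 2: a single -if-, placed after the expression, never inside it
bysort id: egen total_recent = total(var_x) if year >= 2002

list, sepby(id)
```

Here prev_x is missing for id 1 in 2004 because the preceding observation is from 2002, not 2003, and total_recent is missing in observations excluded by the -if- condition.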
Nick
[email protected]
Nick Cox
Consider also using -rolling- or -mvsumm- (SSC). Writing your own code for problems like this is instructive, but not necessary.
[email protected]
I have an unbalanced panel dataset in which I need to calculate a 3-year moving window for a variable for each actor in the dataset.
I have already calculated the annual total sum for the variable for each year (var_x). I have tagged individuals by year and removed all observations but one per year.
Now I need to sum the annual totals up for each actor by year in 3-year moving windows. As the dataset is unbalanced, I need to make sure that observation _n-1 is indeed the year before _n, and not several years prior to _n. I don't get it quite right. I use Stata 10.
Here is the code so far:
sort id year ;
egen tag_id_year = tag(id year) ;
keep if tag_id_year == 1;
sort id year ;
bysort id year: egen var_3yrs = total(var_x) & total(var_x[_n-1]if year==year[_n-1]+1) & total(var_x[_n-2]if year==year[_n-2]+2) ;
I have also tried:
bysort id year: egen var_3yrs = total(var_x) + total(var_x[_n-1]if year==year[_n-1]+1) + total(var_x[_n-2]if year==year[_n-2]+2) ;
And:
bysort id year: egen var_3yrs = total(var_x + var_x[_n-1]if year==year[_n-1]+1 + var_x[_n-2]if year==year[_n-2]+2) ;
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/