Re: st: atribute values between lines of a variable/standardize data
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: atribute values between lines of a variable/standardize data
Date: Wed, 30 Mar 2011 13:25:44 +0100
Here are three ways to do it.
1.
egen mean_y = mean(y / (year == 2003 | year == 2004)), by(group)
2.
egen mean_y = mean(cond(year == 2003 | year == 2004, y, .)), by(group)
3.
egen mean_y = mean(y) if year == 2003 | year == 2004, by(group)
bysort group (mean_y) : replace mean_y = mean_y[1]
Let's take these backwards:
#3 "spreads" the non-missing results of the first command to replace
missings. It hinges delicately on the sort order: sorting non-missings
to first position in each group is needed so that we can use the first
observation in each group.
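As a minimal sketch of #3, here it is run on the example data from the original question. Note that the variable there is Y (uppercase) and Stata names are case-sensitive, so Y is used instead of y. The final -sort- and -list- are only there for inspection.

```stata
clear
input year str20 group Y
2001 G1 57
2002 G1 61
2003 G1 54
2004 G1 60
2005 G1 64
2001 G2 1543
2002 G2 1700
2003 G2 1532
2004 G2 1659
2005 G2 1800
end

* mean over 2003 and 2004 only; all other observations get missing
egen mean_y = mean(Y) if year == 2003 | year == 2004, by(group)

* numeric missing sorts after every nonmissing value, so within each
* group the nonmissing means come first and mean_y[1] is safe to copy
bysort group (mean_y) : replace mean_y = mean_y[1]

sort group year
list, sepby(group)
```

With these data mean_y should be 57 throughout G1 and 1595.5 throughout G2.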
#2 and #1 hinge on the fact that Stata ignores missings in calculating
means, but in the absence of an -if- condition assigns those means to
all observations in each group.
#2 and #1 also hinge on the fact that -egen, mean()- can take
_expressions_, which can be (much) more complicated than variable
names.
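A short sketch of #2 follows, on a cut-down version of the question's data (2002 to 2004 only, to keep it brief), again with Y rather than y. The last step is the division the original poster wanted; the name Y_standardized comes from the question.

```stata
clear
input year str2 group Y
2002 G1 61
2003 G1 54
2004 G1 60
2002 G2 1700
2003 G2 1532
2004 G2 1659
end

* cond(a, b, c) returns b where a is true and c otherwise, so Y is
* kept for 2003 and 2004 and set to missing elsewhere; mean() skips
* the missings, and with no -if- every observation gets the result
egen mean_y = mean(cond(year == 2003 | year == 2004, Y, .)), by(group)

* the standardization asked for in the original question
generate Y_standardized = Y / mean_y
list, sepby(group)
```

Because there is no -if- on the -egen- call, no second step is needed to spread the means within groups.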
#1 is a trick I stumbled on a few weeks ago, and it appears not to be
widely known. The trick is that dividing by zero produces missing,
which is exactly what is needed when it happens.
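To see why #1 works, it may help to pull the pieces apart. This is only an illustration on three made-up rows taken from group G1 of the question; the variable names in_base and Y_base are invented here.

```stata
clear
input year Y
2002 61
2003 54
2004 60
end

* a true-or-false expression evaluates to 1 or 0
generate byte in_base = (year == 2003 | year == 2004)

* Y / 1 is just Y; Y / 0 is missing, returned without any error
generate Y_base = Y / in_base

* so the mean of the ratio is the mean over 2003 and 2004 only
egen mean_y = mean(Y / in_base)
list
```

Here Y_base is missing for 2002 and equal to Y otherwise, and mean_y is 57 in every observation.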
#1 will be written up as a Tip for the Stata Journal.
Nick
On Wed, Mar 30, 2011 at 12:44 PM, Lucas Ferreira Mation
<[email protected]> wrote:
> My data is divided into several groups, with many observations in
> each. For each group, I want to "standardize" my data based on a
> specific subset of observations (in this case, divide the actual
> values of Y by the means of a specific subgroup of Y). How can I do
> that?
> In the example below, for each group, I need to "standardize" the
> values of Y based on the average of Y of the years 2003 and 2004. I
> managed to create such means for those observations, but I don't know
> how to extend that value to the rest of the observations of that
> subgroup.
>
> "
> input year str20 group Y
> 2001 G1 57
> 2002 G1 61
> 2003 G1 54
> 2004 G1 60
> 2005 G1 64
> 2001 G2 1543
> 2002 G2 1700
> 2003 G2 1532
> 2004 G2 1659
> 2005 G2 1800
> end
> egen denominator=mean(Y) if(year==2003 | year==2004), by(group)
> *this creates the desired mean(denominator for the "standardization")
> *but only for the observations in years 2003 and 2004.
> *how do I attribute that to the rest of the observations in that
> group? Having this, I would run:
> gen Y_standardized=Y/denominator
>
> "
> My actual data is quite long, with many groups and many observations
> (months) per group.