Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RE: correct egen command to calculate means - please help |
Date | Wed, 18 May 2011 19:30:13 +0100 |
Here's #2. Set up dopey example clear set obs 4 egen id = seq(), to(2) gen parent = 3 - id gen y = 10 * _n . l +------------------+ | id parent y | |------------------| 1. | 1 2 10 | 2. | 2 1 20 | 3. | 1 2 30 | 4. | 2 1 40 | +------------------+ . save data file data.dta saved Get the means and rename . collapse y, by(id) . rename y mean . rename id parent Merge back . merge 1:m parent using data parent was byte now float Result # of obs. ----------------------------------------- not matched 0 matched 4 (_merge==3) ----------------------------------------- . list +---------------------------------------+ | parent mean id y _merge | |---------------------------------------| 1. | 1 20 2 40 matched (3) | 2. | 2 30 1 30 matched (3) | 3. | 1 20 2 20 matched (3) | 4. | 2 30 1 10 matched (3) | +---------------------------------------+ Nick n.j.cox@durham.ac.uk Nick Cox There are various ways to do it. Here's one: egen group = group(ID) su group, meanonly gen mean = . qui forval i = 1/`r(max)' { su ID if group == `i', meanonly local ID = r(mean) su Value if group == `i', meanonly replace mean = r(mean) if ParentID == `ID' } Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID. Nick n.j.cox@durham.ac.uk Ophelie Desmarais Thanks a lot for the reply. Sorry for not providing more information: Let's say I have this dataset: UniqueID ID ParentID Value 1 1 10 2 1 20 3 2 1 5 4 2 2 5 Basically, I want to add another variable (say "mean_value") that calculates the mean for (10+20)/2 and assigns the value to UniqueID=3 (since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value would be = 5 (the average value of the observations where ID=ParentID=3). egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for UniqueID=3 and UniqueID=4, which is not what I try to get. Again, thanks for your kind answer. o. On 18 May 2011 19:44, Nick Cox <n.j.cox@durham.ac.uk> wrote: > I don't understand your pseudocode. It looks like > > Y = mean(X) where A = B > > to me. > > Nor is it "obvious" that > > egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) > > is wrong. From your word description it sounds exactly right to me! > > Please give a worked example of what you want for a minimal dataset. > > Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata. > > (Can we assume that income == revenue?) > > Nick > n.j.cox@durham.ac.uk > > Ophelie Desmarais > > I have tried to find an answer for this on the web but to no avail. > > Simplified, I have following dataset where the unit of observation is > the industry (IndustryID) with the respective revenue. > > IndustryID, ParentIndustryID, Revenue > > > The ParentIndustryID links to IndustryID and reflects the hierachical > structure of the industries. For each industry, I want to add a > variable reflecting the mean income of the ParentIndustryID. How do I > do this in Stata? In Excel, this would be simply using AVERAGEIFS. > egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously > does not give the right mean. In terms of pseudo code, this should be > something like: > > MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID > > Could anyone help me out? I don't want to switch around with Excel all the time. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/