Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: st: RE: correct egen command to calculate means - please help
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: RE: correct egen command to calculate means - please help
Date
Wed, 18 May 2011 19:30:13 +0100
Here's #2.
Set up dopey example
clear
set obs 4
egen id = seq(), to(2)
gen parent = 3 - id
gen y = 10 * _n
. l
+------------------+
| id parent y |
|------------------|
1. | 1 2 10 |
2. | 2 1 20 |
3. | 1 2 30 |
4. | 2 1 40 |
+------------------+
. save data
file data.dta saved
Get the means and rename
. collapse y, by(id)
. rename y mean
. rename id parent
Merge back
. merge 1:m parent using data
parent was byte now float
Result # of obs.
-----------------------------------------
not matched 0
matched 4 (_merge==3)
-----------------------------------------
. list
+---------------------------------------+
| parent mean id y _merge |
|---------------------------------------|
1. | 1 20 2 40 matched (3) |
2. | 2 30 1 30 matched (3) |
3. | 1 20 2 20 matched (3) |
4. | 2 30 1 10 matched (3) |
+---------------------------------------+
Nick
[email protected]
Nick Cox
There are various ways to do it. Here's one:
egen group = group(ID)
su group, meanonly
gen mean = .
qui forval i = 1/`r(max)' {
su ID if group == `i', meanonly
local ID = r(mean)
su Value if group == `i', meanonly
replace mean = r(mean) if ParentID == `ID'
}
Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID.
Nick
[email protected]
Ophelie Desmarais
Thanks a lot for the reply. Sorry for not providing more information:
Let's say I have this dataset:
UniqueID ID ParentID Value
1 1 10
2 1 20
3 2 1 5
4 2 2 5
Basically, I want to add another variable (say "mean_value") that
calculates the mean for (10+20)/2 and assigns the value to UniqueID=3
(since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and
UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value
would be = 5 (the average value of the observations where
ID=ParentID=3).
egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for
UniqueID=3 and UniqueID=4, which is not what I try to get.
Again, thanks for your kind answer.
o.
On 18 May 2011 19:44, Nick Cox <[email protected]> wrote:
> I don't understand your pseudocode. It looks like
>
> Y = mean(X) where A = B
>
> to me.
>
> Nor is it "obvious" that
>
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry)
>
> is wrong. From your word description it sounds exactly right to me!
>
> Please give a worked example of what you want for a minimal dataset.
>
> Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata.
>
> (Can we assume that income == revenue?)
>
> Nick
> [email protected]
>
> Ophelie Desmarais
>
> I have tried to find an answer for this on the web but to no avail.
>
> Simplified, I have following dataset where the unit of observation is
> the industry (IndustryID) with the respective revenue.
>
> IndustryID, ParentIndustryID, Revenue
>
>
> The ParentIndustryID links to IndustryID and reflects the hierachical
> structure of the industries. For each industry, I want to add a
> variable reflecting the mean income of the ParentIndustryID. How do I
> do this in Stata? In Excel, this would be simply using AVERAGEIFS.
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously
> does not give the right mean. In terms of pseudo code, this should be
> something like:
>
> MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID
>
> Could anyone help me out? I don't want to switch around with Excel all the time.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/