Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: correct egen command to calculate means - please help

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: RE: correct egen command to calculate means - please help
Date	Wed, 18 May 2011 19:30:13 +0100

Here's #2. 

Set up dopey example 

clear
set obs 4
egen id = seq(), to(2)
gen parent = 3 - id 
gen y = 10 * _n

. l

     +------------------+
     | id   parent    y |
     |------------------|
  1. |  1        2   10 |
  2. |  2        1   20 |
  3. |  1        2   30 |
  4. |  2        1   40 |
     +------------------+

. save data
file data.dta saved

Get the means and rename 

. collapse y, by(id)

. rename y mean 

. rename id parent

Merge back 

. merge 1:m parent using data 
parent was byte now float

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                                 4  (_merge==3)
    -----------------------------------------

. list 

     +---------------------------------------+
     | parent   mean   id    y        _merge |
     |---------------------------------------|
  1. |      1     20    2   40   matched (3) |
  2. |      2     30    1   30   matched (3) |
  3. |      1     20    2   20   matched (3) |
  4. |      2     30    1   10   matched (3) |
     +---------------------------------------+

Nick 
[email protected] 

Nick Cox

There are various ways to do it. Here's one:  

egen group = group(ID)
su group, meanonly 

gen mean = . 

qui forval i = 1/`r(max)' { 
	su ID if group == `i', meanonly 
	local ID = r(mean) 
	su Value if group == `i', meanonly
	replace mean = r(mean) if ParentID == `ID' 
}
	
Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID. 

Nick 
[email protected] 

Ophelie Desmarais

Thanks a lot for the reply. Sorry for not providing more information:

Let's say I have this dataset:

UniqueID          ID                        ParentID             Value
1                     1                                                   10
2                     1                                                    20
3                     2                          1                        5
4                     2                          2                        5

Basically, I want to add another variable (say "mean_value") that
calculates the mean for (10+20)/2 and assigns the value to UniqueID=3
(since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and
UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value
would be = 5 (the average value of the observations where
ID=ParentID=3).

egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for
UniqueID=3 and UniqueID=4, which is not what I try to get.

Again, thanks for your kind answer.

o.


On 18 May 2011 19:44, Nick Cox <[email protected]> wrote:
> I don't understand your pseudocode. It looks like
>
> Y = mean(X) where A = B
>
> to me.
>
> Nor is it "obvious" that
>
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry)
>
> is wrong. From your word description it sounds exactly right to me!
>
> Please give a worked example of what you want for a minimal dataset.
>
> Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata.
>
> (Can we assume that income == revenue?)
>
> Nick
> [email protected]
>
> Ophelie Desmarais
>
> I have tried to find an answer for this on the web but to no avail.
>
> Simplified, I have following dataset where the unit of observation is
> the industry (IndustryID) with the respective revenue.
>
> IndustryID, ParentIndustryID, Revenue
>
>
> The ParentIndustryID links to IndustryID and reflects the hierachical
> structure of the industries. For each industry, I want to add a
> variable reflecting the mean income of the ParentIndustryID. How do I
> do this in Stata? In Excel, this would be simply using AVERAGEIFS.
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously
> does not give the right mean. In terms of pseudo code, this should be
> something like:
>
> MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID
>
> Could anyone help me out? I don't want to switch around with Excel all the time.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: correct egen command to calculate means - please help
  - From: Ophelie Desmarais <[email protected]>
- st: RE: correct egen command to calculate means - please help
  - From: Nick Cox <[email protected]>
- Re: st: RE: correct egen command to calculate means - please help
  - From: Ophelie Desmarais <[email protected]>
- RE: st: RE: correct egen command to calculate means - please help
  - From: Nick Cox <[email protected]>

Prev by Date: RE: st: RE: correct egen command to calculate means - please help
Next by Date: RE: st: creating variable summarizing for each individual properties of other members of a group at t-1
Previous by thread: RE: st: RE: correct egen command to calculate means - please help
Index(es):
- Date
- Thread