Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: a question on averaging in Stata
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: RE: a question on averaging in Stata
Date
Wed, 8 Feb 2012 14:55:05 +0000
The device used in the FAQ to calculate maximum values clearly isn't
good for medians. You are trying to calculate the median over
    original value if I want this
    zero if I don't want this
and those zeros may affect the result. With maxima over ages the zeros
won't (usually) do that.
This is quick and dirty but illustrates a more general technique:
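generate median = .
by year, sort: gen pid = _n
summarize pid, meanonly
* loop over identifiers; each pass fills in the median of the others
quietly forvalues i = 1/`r(max)' {
    * the division is missing for the current identifier, so -egen- skips it
    egen work = median(idio / (pid != `i')), by(year)
    replace median = work if pid == `i'
    drop work
}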
What is crucial here is that -median()- takes an expression, which can
be more complicated than a variable name, and that
idio / (pid != `i')
is -idio- when -pid- is not the current identifier and missing
otherwise. So, -egen- will ignore the missings.
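A minimal sketch of the division trick on its own, assuming a toy dataset typed in with -input- (the values and the variable names -others- and -med_excl_2- are made up for illustration; -idio-, -pid- and -year- follow the code above):

```stata
clear
input year pid idio
2010 1 1
2010 2 2
2010 3 10
end

* (pid != 2) is 1 for pids 1 and 3 and 0 for pid 2,
* so the division returns idio, missing, idio
generate others = idio / (pid != 2)

* -egen- skips the missing, leaving the median of 1 and 10
egen med_excl_2 = median(idio / (pid != 2)), by(year)
list
```

The loop above just repeats this for each identifier in turn and copies the result into the observations for that identifier.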
For more discussion see
Nicholas J. Cox. 2011. Speaking Stata: Compared with .... Stata
Journal 11(2): 305-314.
Abstract. Many problems in data management center on relating values
to values in other observations, either within a dataset as a whole or
within groups such as panels. This column reviews some basic Stata
techniques helpful for such tasks, including the use of subscripts,
summarize, by:, sum(), cond(), and egen. Several techniques exploit
the fact that logical expressions yield 1 when true and 0 when false.
Dividing by zero to yield missings is revealed as a surprisingly
valuable device.
Advice on "Thanks in advance" is included in the FAQ.
Nick
On Wed, Feb 8, 2012 at 2:30 PM, [email protected] <[email protected]> wrote:
> thanks a lot for your feedback. The information was very useful. I have one additional question that relates to estimating a group median excluding observation i. I have looked at the article that you have referred to, but I got stuck with writing the code for the case of medians.
>
>
> Again, I have panel data with items i observed over several years t for variable x. I need to estimate the median of this variable for each year. However, I have to estimate a specific median: for each item i I have to estimate
> the median value of x but excluding the observation for item i itself: i.e. the median over the other objects (if I could label them
> -i).
> I found this technically more challenging compared to the estimation of means. I have started with the following code - I used as example one of the codes that you have shared with us in your article. But I cannot find a way to isolate item i from the median calculation.
>
> Could you please help me with that? I would like to thank you in advance.
> generate maxvar = .
> by year, sort: gen pid = _n
> summarize pid
> . quietly forvalues i = 1/`r(max)' {
> . generate include = 1 if pid != `i'
> . egen work = median(idio * include), by(year)
> . replace maxvar = work if pid == `i'
> . drop include work
> . }
From: Nick Cox <[email protected]>
> This is a FAQ.
>
> FAQ . . Creating variables recording prop. of the other members of a group
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> 4/05 How do I create variables summarizing for each
> individual properties of the other members of a
> group?
> http://www.stata.com/support/faqs/data/members.html
>
> but the question also yields easily to Stata logic. The starting point is the idea that the total for everybody else is just the total minus my value.
>
> The average of every other item is
>
> (sum of others) / (count of others)
>
> which is in the simplest case
>
> (sum of all - this value) / (count of all - 1)
>
> -- although careful code would need to take account of the possibility that each value is missing.
>
> That is then
>
> egen sum = total(x), by(group)
> egen count = count(x), by(group)
>
> and then the average is
>
> gen mean = (sum - cond(missing(x), 0, x)) / (count - !missing(x))
>
> If any value is missing, then we need to subtract 0 (not missing!) from the total to get the total of others.
>
> If any value is missing, then we need to subtract 0 (not 1!) from the count to get the count of others.
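The recipe quoted above can be checked by hand on a small example. This is only a sketch under assumptions: the data are made up, and the names -total_x-, -n_x- and -mean_others- are illustrative stand-ins for -sum-, -count- and -mean-.

```stata
clear
input group x
1 2
1 4
1 .
1 6
end

* group total and count of nonmissing values
egen total_x = total(x), by(group)
egen n_x = count(x), by(group)

* subtract this value (0 if missing) from the total,
* and 1 (0 if missing) from the count
generate mean_others = (total_x - cond(missing(x), 0, x)) / (n_x - !missing(x))
list
```

Here the total is 12 and the count is 3, so the observation with x = 2 gets (12 - 2) / (3 - 1) = 5, while the observation with missing x gets 12 / 3 = 4, the mean of all the nonmissing values.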
[email protected]
> I have panel data with items i observed over several years t for variable x.
>
> I have to estimate a specific average: for each item i I have to take the mean value of x excluding the observations for the item i itself; i.e. the average over the other objects (if I could label them -i).
>
> Is this possible in Stata?