Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: calculating mean without own observation
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: RE: calculating mean without own observation
Date: Mon, 23 May 2011 17:16:26 +0100
Expanding this a bit:
There is more than one way to do this, which should be fine by everybody.
Phil Clayton outlined a solution looping over observations, which is
direct, but which will be _very_ slow for large datasets.
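For concreteness, a loop of that general kind might look like the
sketch below. This is only an illustration, not Phil's actual code. It
uses the variable names -value- and -category- and the example data
from the original question; the name -mean_loop- is made up here. Each
pass runs -summarize- over the whole dataset, which is why such a loop
gets slow as the number of observations grows.

```stata
* Example data from the original question
clear
input i value category
1 5 1
2 5 1
3 10 1
4 2 2
5 2 2
end

* mean_loop is a hypothetical name for the result
gen mean_loop = .
forvalues j = 1/`=_N' {
    * mean of value over the other observations in the same category
    summarize value if category == category[`j'] & _n != `j', meanonly
    replace mean_loop = r(mean) in `j'
}
list
```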
The FAQ below emphasises the -egen- route rather (too) heavily.
A direct route which will usually be fast goes something like this.
1. It is a good idea to segregate missings.
gen byte touse = !missing(value)
2. Then we get totals:
bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]
3. Then we get counts of non-missings:
by touse category : gen count = _N if touse
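4. Now the finish is in sight: take each observation's own value out of
its group total and divide by the number of other non-missing values.
A group with only one non-missing value gives missing, as division by
zero yields missing in Stata.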
gen mean_others = (total - value) / (count - 1)
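5. If we wanted to assign the mean of others to observations with
missing values, we could do this. For such an observation every
non-missing value in its category belongs to somebody else, so the mean
of others is just the category total divided by the category count.
Sorting on -touse- within category puts an observation with a
non-missing value last, if there is one:
bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse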
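Steps 1 to 5 can be tried out on the example data from the original
question, as in the sketch below. The sixth observation, with a missing
value in category 1, is a hypothetical addition made here only so that
step 5 has something to act on.

```stata
* Example data from the original question, plus one hypothetical
* observation (i == 6) with a missing value
clear
input i value category
1 5 1
2 5 1
3 10 1
4 2 2
5 2 2
6 . 1
end

* 1. segregate missings
gen byte touse = !missing(value)
* 2. group totals: running sum, then copy the last value to all
bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]
* 3. counts of non-missings
by touse category : gen count = _N if touse
* 4. mean of the other non-missing values in the same category
gen mean_others = (total - value) / (count - 1)
* 5. observations with missing values get total / count
bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse

sort i
list i value category mean_others
```

This should show 7.5, 7.5, 5, 2 and 2 for the first five observations,
matching the hand calculations in the question, and 20/3 for the
hypothetical sixth.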
An advantage of this approach is that it generalises easily:
6. Want to average an expression, not just a variable? Plug it in the
same place as the variable name.
7. Want to add -if- and/or -in- qualifiers? Build them into the
-touse- definition:
gen byte touse = !missing(value) & foo == 42 & bar < 1000
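Points 6 and 7 together might look like the sketch below, again on the
example data from the original question. The expression value^2, the
restriction value < 10 and the variable name -y- are arbitrary choices
made here for illustration, standing in for whatever expression and
qualifiers are wanted.

```stata
* Example data from the original question
clear
input i value category
1 5 1
2 5 1
3 10 1
4 2 2
5 2 2
end

* y holds a hypothetical expression to be averaged
gen double y = value^2
* build the (hypothetical) restriction into touse
gen byte touse = !missing(y) & value < 10

bysort touse category : gen double total = sum(y) if touse
by touse category : replace total = total[_N]
by touse category : gen count = _N if touse
gen double mean_others = (total - y) / (count - 1)

sort i
list i value category y touse mean_others
```

Observation 3 fails the restriction, so it drops out of the totals and
counts for category 1 and gets missing for -mean_others-.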
However, means are easy. Other statistics can be much more awkward.
The FAQ has more.
Nick
On Mon, May 23, 2011 at 4:02 PM, Nick Cox <[email protected]> wrote:
> This is an FAQ.
>
> FAQ . . Creating variables recording prop. of the other members of a group
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> 4/05 How do I create variables summarizing for each
> individual properties of the other members of a
> group?
> http://www.stata.com/support/faqs/data/members.html
>
> Once you know to look for references to (e.g.) "members.html" in the Statalist archive, you can find several related discussions.
>
> In this case, something like
>
> egen total = total(value), by(category)
> egen n = count(value), by(category)
>
> gen totalMINUSi = total - cond(missing(value), 0, value)
> gen meanMINUSi = totalMINUSi / (n - !missing(value))
>
> Incidentally, this cannot be done with a simple -if- precisely because values for other observations are involved in the calculation. But it can be approached directly.
>
> Nick
> [email protected]
>
> Guo Xu
>
> How do I calculate a mean (or any other summary statistic) excluding
> the *current* observation?
>
> For example, I have following data:
>
> i value category
> 1 5 1
> 2 5 1
> 3 10 1
> 4 2 2
> 5 2 2
>
> I would like to calculate the mean for each category (egen value_mean
> = mean(value), by(category)), but exclude the i-th observation:
> For i=1, for example, the mean value by category 1 would be (5+10)/2.
> For i=3, it would be (5+5)/2. For i=4, it would be 2/1.
>
> I guess there must be some simple *if* condition for this
> manipulation, but I failed to find it - would be most grateful for
> help.
>