Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: _N in by-groups
From: [email protected]
To: [email protected]
Subject: Re: st: _N in by-groups
Date: Sat, 20 Aug 2011 00:40:45 +1000
I too am confused about when _N is or isn't influenced
by the -by:- prefix.
I would like to remove a single outlier (the largest value) from each
group in the following data set:
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end
I would then like to calculate the mean for each group with the outliers
excluded.
I assumed that the following code would do the trick:
by group (var1), sort: egen average = mean(var1) if var1 != var1[_N]
When the mean was calculated, it followed the -by:- prefix
(i.e., _N = 3 for group 1). But in the -if- qualifier, _N was
calculated from the whole data set (i.e., _N = 7).
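A minimal sketch that reproduces this (the variable names n_by and n_all are hypothetical). As far as I can tell, -egen- runs once over the data and evaluates its -if- qualifier before it applies the by-groups, so var1[_N] in the -if- is var1[7], the largest value in group 2. Only that one observation is excluded, and group 1 keeps its outlier.

```stata
clear
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end

* Mike's original command: the -if- is evaluated over all 7 observations,
* so only var1 == 72 (the last observation in the sorted data) is excluded
by group (var1), sort: egen average = mean(var1) if var1 != var1[_N]

* Under -by:-, _N is the size of each group (3 or 4) ...
bysort group: gen n_by = _N
* ... but without -by:- it is the size of the whole data set (7)
gen n_all = _N

list, sepby(group)
```

With this data, group 1 still averages 4, 5 and 81 (giving 30), which is why the result looked wrong.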
I got around this problem by generating and sorting on a byte tag variable.
However, I still don't understand WHY and HOW Stata does this.
Could I have dealt with the above using a single line of code?
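One way round it, offered as a sketch rather than the definitive answer (the flag name outlier is hypothetical): evaluate _N in a -generate- that really runs under -by:-, then give -egen- an ordinary variable in its -if- qualifier. This takes two lines rather than one. Note that _n == _N flags exactly one observation per group (the last after sorting), whereas var1 != var1[_N] would drop every observation tied with the maximum.

```stata
clear
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end

* Under -by:-, _N is the size of each group, so this flags the
* single largest value in each group
bysort group (var1): gen byte outlier = _n == _N

* -outlier- is an ordinary variable, so this -if- no longer depends on _N
by group: egen average = mean(var1) if !outlier

list, sepby(group)
```

Here the group means come out as 4.5 for group 1 and 8/3 for group 2, and -average- is missing on the flagged rows.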
Cheers,
Mike (beginner Stata 8)
* So _N, as it were, never sees the -by:- and is not influenced
by it.
** If a Stata command has by-groups, it seems like _N is interpreted
sometimes as the number of observations in the by-group and sometimes
as the number of observations in the data set.
*** If you use the -by :- prefix it is always defined as the number of
observations within each by-group. Stata would be a pretty lousy
program if such a scalar randomly changed meaning...
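A small sketch of the distinction the quoted replies are drawing (the variable names top and top_all are hypothetical). As I understand it, a command that is itself run separately within each by-group, such as -generate-, evaluates its -if- qualifier within the group, so _N there is the group size. -egen- instead handles the by-groups internally, which is why Mike saw _N = 7 in its -if-.

```stata
clear
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end
sort group var1

* -generate- under -by:- evaluates the -if- within each group, so
* _n == _N picks out the last (largest) value of each group: 81 and 72
by group: gen top = var1 if _n == _N

* Without -by:-, _N is 7, so only the last observation of the data set qualifies
gen top_all = var1 if _n == _N

list, sepby(group)
```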
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/