Eric VonDohlen asked:
> I have a continuous variable X, which I would like to:
>
> (a) sort in ascending or descending order;
> (b) rank the sorted X into some specified number of groups;
> (c) report the mean of X (or some other statistic) by group.
Jayesh Kumar replied and pointed to -gsort- for (a). Fine.
On (b) and (c) Jayesh suggested
> *This will create percentiles, you can choose your own number of groups.
> *for ranking purpose:
> by year:gen a=_n
> bysort year: egen b=max(a)
> gen percentile_year=((a/b)*100)
> *for reporting summary statistics:
> bysort percentile_year: summarize year
This is an interesting approach, but it needs to be
followed by some fixes and a couple of warnings. I don't
think it is general enough to be the best answer
to Eric's question.
A small fix is that the first command, with its -by year:- prefix,
requires the data to be sorted by -year- already, so -bysort- is
needed there (and is not needed on the second command):
bysort year: gen a = _n
by year: egen b = max(a)
gen percentile_year = ((a/b)*100)
As a matter of Stata style only, this can be condensed to
bysort year : gen percentile_year = (_n/_N) * 100
The first major problem is that whatever is of interest
should be sorted within each -year- (if not, the
assignment of percentiles is quite arbitrary).
bysort year (whatever) : gen percentile_year = (_n/_N) * 100
Two other major problems:
* No adjustment for ties. Tied values will get
assigned to different percentiles.
* This works best when there is an equal number
of observations within each -year-, but not otherwise.
(With 4 observations in a -year-, -percentile_year-
takes on values 25, 50, 75, 100; with 5 observations
it takes on values 20, 40, 60, 80, 100, so the groups
defined by -percentile_year- do not match across years.)
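Both problems can be seen in a small made-up example. The
data below are invented purely for illustration (the variable
name -x- and the values are my assumptions, not Eric's data):

```stata
* Toy data, invented for illustration only
clear
input year x
2001 1
2001 1
2001 2
2001 3
2002 1
2002 2
2002 3
2002 4
2002 5
end

* Rank-based percentiles within year, sorting on x within year
bysort year (x) : gen percentile_year = (_n/_N) * 100

* Inspect the result year by year
list, sepby(year)
```

In 2001 the two tied values of 1 are assigned 25 and 50, and the
4 observations in 2001 yield a different set of values from the
5 observations in 2002.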
A more general answer to Eric's question is to use
-xtile- and then -summarize-, -tabstat-, etc. (and
to read the manual; new users are expected to read the
manual like everybody else!).
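A minimal sketch of the -xtile- route, assuming the auto dataset
shipped with Stata stands in for Eric's data, with -price- playing
the role of X. The new variable name -pricegrp- and the choice of
5 groups are arbitrary choices of mine:

```stata
* Example dataset shipped with Stata; price stands in for X
sysuse auto, clear

* (b) assign observations to 5 quantile-based groups of price;
*     tied values of price fall in the same group
xtile pricegrp = price, nquantiles(5)

* (c) report statistics of price by group
tabstat price, by(pricegrp) statistics(n mean min max)
```

Note that -xtile- works from the values of the variable, so no
prior -sort- or -gsort- is needed for (b) and (c).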
Nick