Jeph Herrin
> According to the documentation, egen's -rank- function should,
> with -field- or -track- switches, give me consecutive ranks. Yet
> when I try:
>
> . egen rank=rank(var1), track
>
> I get non-consecutive ranks:
>
> . tab rank
>
>  track rank |
>  of (beta1) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |          5        0.83        0.83
>           6 |          5        0.83        1.66
>          11 |          6        1.00        2.66
>          17 |          8        1.33        3.99
>          25 |          8        1.33        5.32
>          33 |          4        0.67        5.99
> .
> .
> .
>
> I've been looking at this for half an hour and finally figured
> either there's something wrong with egen's rank, or there's
> something wrong with me.
Melony E. S. Sorbero
> Based on your table, it looks like ties are included in determining
> the next value in the ranking. You have 5 observations tied with a
> rank of 1, so the next ranking that appears is 6, and so on.
I think Melony is correct.
The documentation is terse, but I don't think either it or the code
is in error.
Let's consider ranking values 1, 2, 2, 2, 3.
1. The default of -egen, rank()- is to say
Value 1 has rank 1 (statistical convention: lowest value has lowest
rank).
Values 2, 2, 2 must have the same rank, but it should be assigned
preserving the sum of the ranks which would otherwise have been
allocated, i.e. each gets the mean (2 + 3 + 4)/3 = 3. This
"correction for ties" is used in various nonparametric procedures,
such as Spearman rank correlation.
Value 3 has rank 5.
2. The option -egen, rank() track- differs only in how ties are
treated:
Value 1 has rank 1 (rule in track events: lowest value (i.e. lowest
time) has lowest rank).
Values 2, 2, 2 must have the same rank, but it should be assigned
according to how many observations have lower values: one more than
that number, so here rank 2. (Analogue: in the sports that I know of,
not many and not well, these would all be "equal second". Of course,
many sports have procedures for breaking ties and/or sufficiently
precise timing or scoring that ties don't arise, but that doesn't
affect the principle.)
Value 3 has rank 5.
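The two rules can be checked on the toy values 1, 2, 2, 2, 3. What
follows is only a minimal sketch: the variable names x, r_default,
r_track and r_manual are made up for illustration, and r_manual is my
own hand calculation of the track rule (position of the first
observation in each tied group), not a copy of what -egen- does
internally.

```stata
clear
input x
1
2
2
2
3
end

* default: tied values share the mean of the ranks they span
egen r_default = rank(x)

* track: tied values share 1 + the number of observations with lower values
egen r_track = rank(x), track

* the track rule by hand (hypothetical helper variable r_manual)
sort x
generate long r_manual = _n
by x: replace r_manual = r_manual[1]

list x r_default r_track r_manual
```

This should show ranks 1, 3, 3, 3, 5 by default and 1, 2, 2, 2, 5 with
-track-. The same rule accounts for Jeph's table: 5 observations share
rank 1, so the next rank is 1 + 5 = 6, then 6 + 5 = 11, 11 + 6 = 17,
and so on.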
The terminology of -track- and -field- was introduced (in STB-51 in
1999) because the authors were not aware of standard alternatives. Is
it misleading?
I think what Jeph may be looking for is the -unique- option.
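For comparison, here is a small sketch of -unique- on the same toy
values 1, 2, 2, 2, 3 (again, x and r_unique are illustrative names of
my own):

```stata
clear
input x
1
2
2
2
3
end

* unique: ranks run 1, ..., N with no repeats; ties are broken arbitrarily
egen r_unique = rank(x), unique

list x r_unique
```

The three tied observations should get ranks 2, 3 and 4 in some
arbitrary order, so -unique- yields consecutive ranks only by splitting
ties. Whether that is acceptable depends on what the ranks are for.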
Nick