Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: display identifiers accounting for duplicate obs
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: display identifiers accounting for duplicate obs
Date: Fri, 4 May 2012 22:23:28 +0100
. search rank
would have pointed to -egen- (and much else besides).
Apart from the question of how to calculate ranks in Stata, Tashi left
open the question of how ranks are defined in any case when
duplicates (meaning, ties) are present. The default in Stata uses a
rule that will be familiar to students of rank correlation: ties are
ranked equally and the average rank is preserved. I don't know that
this rule is used anywhere outside statistics.
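
As a minimal sketch of that default rule, here are the eight values of
hits from Tashi's post run through -egen, rank()- with no options. The
variable name avgrank is made up for illustration. The two 4s jointly
occupy ranks 4 and 5 and so each get 4.5; the two 6s jointly occupy
ranks 7 and 8 and so each get 7.5. The ranks still sum to 36 = 8 * 9 / 2,
which is the sense in which the average rank is preserved.

```stata
clear
input hits
1
2
3
4
4
5
6
6
end

* default rule: tied values share the mean of the ranks they jointly occupy
egen avgrank = rank(hits)

list, noobs
```

Note that the lowest value gets rank 1 here; ranking on -hits instead
reverses the direction, so that the highest value gets rank 1.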
-egen, rank()- has various options in addition to that default. When
Richard Goldstein and I were writing what were then extensions to
-egen-
STB-52 dm72.1 . . . . . . . . . . . . Alternative ranking procedures: update
(help altrank if installed) . . . . . . . . N. J. Cox and R. Goldstein
11/99 p.2; STB Reprints Vol 9, p.51
incorporated into Stata 7.0 egen rank() function
STB-51 dm72 . . . . . . . . . . . . . . . . . Alternative ranking procedures
(help altrank, lbleqrnk if installed) . . . N. J. Cox and R. Goldstein
9/99 pp.5--7; STB Reprints Vol 9, pp.48--51
incorporated into Stata 7.0 egen rank() function
I spent some time looking for systematic treatments of different
ranking rules in various literatures and was surprised to find
nothing, so the names "field", "track" and "unique" were introduced
faute de mieux. I am still interested in relevant literature
references.
http://press.princeton.edu/titles/9661.html
looks interesting, but I have yet to read it.
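
For concreteness, a short sketch of the three named options applied to
the same eight values of hits. The option names are those documented in
-help egen-; the variable names r_field, r_track and r_unique are made up
for illustration.

```stata
clear
input hits
1
2
3
4
4
5
6
6
end

* field: highest value is ranked 1; rank = 1 + number of strictly higher values
egen r_field = rank(hits), field

* track: lowest value is ranked 1; rank = 1 + number of strictly lower values
egen r_track = rank(hits), track

* unique: ranks run 1, ..., N, with ties broken arbitrarily
egen r_unique = rank(hits), unique

gsort -hits
list, noobs
```

With these data the field ranks are 1 1 3 4 4 6 7 8 from the highest
value down, and the track ranks are 1 2 3 4 4 6 7 7 from the lowest
value up. Because unique breaks ties arbitrarily, which of two tied
observations gets the lower rank is not reproducible from run to run.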
Nick
On Fri, May 4, 2012 at 9:33 PM, Ronnie Babigumira <[email protected]> wrote:
> sorry that should have been
>
> egen rhits = rank(-hits)
On Friday, May 4, 2012 at 10:32 PM, Ronnie Babigumira wrote:
>> egen rhits = rank(hits)?
On Friday, May 4, 2012 at 10:27 PM, tashi lama wrote:
>
>> > I can't come up with a solution despite spending quite some thought and time. The problem at hand sounds fairly straightforward.
>> >
>> > I have a dataset like the following
>> >
>> > hits
>> >
>> > 1
>> > 2
>> > 3
>> > 4
>> > 4
>> > 5
>> > 6
>> > 6
>> >
>> > and I want to generate a variable rank. Notice, if there were no duplicate obs, I would have said
>> >
>> > gsort -hits
>> > gen rank=_n
>> >
>> > and the rank column would have given the ranks of the obs. That is what I want.
>> >
>> >
>> > However, there are some duplicate obs and I tried doing
>> >
>> > gsort -hits
>> > gen rank=cond(hits[_n-1]==hits[_n], _n-1, _n)
>> >
>> > which would give me
>> >
>> >
>> > hits  rank
>> >    6     1
>> >    6     1
>> >    5     3
>> >    4     4
>> >    4     4
>> >    3     6
>> >    2     7
>> >    1     8
>> >
>> > and that is not what I want.
>> >
>> >
>> >
>> > I looked at commands like generate, duplicates and I didn't see much relevant to my problem.
>> >
>> >
>> >
>> > Could someone give me a lead on where to look or which command I should dig into? Thanks a lot.