Dear Elan,
the code below flags the 3 most frequent values of mpg in the auto dataset
(mpg stands in for your diagnosis variable; rank is the frequency rank of
the diagnosis, and top is the 0/1 flag):
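version 10.0
clear
sysuse auto
* count the observations in each mpg group
tempvar freq
generate byte `freq'=1
collapse (count) `freq', by(mpg)
* most frequent group first; ties are broken by the smaller mpg
gsort -`freq' mpg
list, sepby(`freq')
* rank 1 = most frequent; keep only the top 3 groups
generate rank=_n
keep if rank<=3
drop `freq'
sort mpg
tempfile top
save `"`top'"'
* attach the rank to the original data
sysuse auto, clear
sort mpg
merge mpg using `"`top'"'
drop _merge
* top = 1 if the observation belongs to a top-3 group, 0 otherwise
generate top=!missing(rank)
list make top rank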
Hope this helps. This can certainly be optimized, but it is easy to
explain: we first compute the frequency of each group (diagnosis), rank
the groups from most to least frequent (rank 1 = most frequent), keep
the top N, and then merge the rank back onto the original dataset. For
the auto dataset, the most frequent value of mpg is 18, followed by 19
and 14.
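One possible shortcut, as a sketch only: the same flag can be built in
place, without collapse, a temporary file, or merge. The variable names
freq and tag are my own choice here and are not used in the code above;
ties are broken the same way (smaller mpg first).

```stata
version 10.0
sysuse auto, clear
* freq = number of observations sharing this mpg value (name is illustrative)
bysort mpg: generate long freq = _N
* tag exactly one observation per distinct mpg value (name is illustrative)
egen byte tag = tag(mpg)
* tagged observations first, most frequent group on top, ties by smaller mpg
gsort -tag -freq mpg
generate rank = _n if tag & _n <= 3
* copy the rank to every other observation with the same mpg
bysort mpg (rank): replace rank = rank[1]
generate byte top = !missing(rank)
list make mpg top rank if top
```

For your data, replace mpg with dx and 3 with 10; bysort, egen's tag()
and gsort all accept string variables.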
Best, Sergiy
On Mon, Nov 16, 2009 at 4:25 PM, Cohen, Elan wrote:
> Hi all,
>
> I have a string variable dx that represents a patient's diagnosis (about 5,000 unique values). I'd like to create a "top 10 flag" that equals 1 if dx is one of the top 10 most frequent diagnoses and 0 otherwise.
>
> I'm not even sure where to begin. If someone could point me in the right direction, I'd be grateful. Stata 10, Windows XP
>
> Thank you,
>
> - Elan
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/