Re: st: Create a flag variable for 10 most frequent values

From	Sergiy Radyakin <[email protected]>
To	[email protected]
Subject	Re: st: Create a flag variable for 10 most frequent values
Date	Mon, 16 Nov 2009 17:39:09 -0500

Dear Elan,

This will mark the top 3 values in the auto dataset
(mpg stands in for your diagnosis variable; rank will be the diagnosis rank):

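	version 10.0
	clear

	* count the observations for each distinct value of mpg
	sysuse auto
	tempvar freq
	generate byte `freq'=1
	sort mpg
	collapse (count) `freq', by(mpg)

	* most frequent first; ties are broken by the smaller mpg
	gsort -`freq' mpg
	list, sepby(`freq')

	* rank 1 = most frequent; keep only the top 3
	generate rank=_n
	keep if rank<=3
	drop `freq'

	sort mpg
	tempfile top
	save `"`top'"'

	* attach the ranks to the original data
	sysuse auto, clear
	sort mpg
	merge mpg using `"`top'"'
	drop _merge

	* top is 1 for the top-3 values and 0 otherwise
	generate top=!missing(rank)

	list make top rank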

Hope this helps. This can certainly be optimized, but it is easy to
explain: we first compute the frequency of each group (diagnosis), then
rank the groups (1 = most frequent) and keep the top N, and finally
attach the rank to the original dataset. For the auto dataset, the most
frequent MilesPerGallon is 18, followed by 19 and 14.
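
For a string variable such as dx and a top 10 instead of a top 3, the
same idea can be sketched without collapse and merge. This is only a
sketch under assumptions: dx is taken to be in memory, and the names
n_dx, first, rank and top10 are made up for illustration. Ties at the
10th position are broken alphabetically by dx, which may or may not be
what you want.

```stata
	* assumption: the string variable dx is in memory
	bysort dx: generate long n_dx = _N      // frequency of each diagnosis
	egen byte first = tag(dx)               // 1 for one observation per diagnosis
	gsort -first -n_dx dx                   // tagged rows on top, most frequent first
	generate long rank = _n if first        // rank 1 = most frequent
	bysort dx (rank): replace rank = rank[1]    // copy the rank to every observation of dx
	generate byte top10 = rank <= 10        // missing rank compares as large, so gives 0
```

Observations with an empty dx are not tagged by egen's tag() unless
its missing option is given, so they get no rank and top10 equals 0
for them. A quick check is: tabulate dx if top10, sort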

Best, Sergiy



On Mon, Nov 16, 2009 at 4:25 PM, Cohen, Elan <[email protected]> wrote:
> Hi all,
>
> I have a string variable dx that represents a patient's diagnosis (about 5,000 unique values).  I'd like to create a "top 10 flag" that equals 1 if dx is one of the top 10 most frequent diagnoses and 0 otherwise.
>
> I'm not even sure where to begin.  If someone could point me in the right direction, I'd be grateful.  Stata 10, Windows XP
>
> Thank you,
>
> - Elan
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


References:
- st: Create a flag variable for 10 most frequent values
  - From: "Cohen, Elan" <[email protected]>
