Good point! I always make up my own dataset according to the description in
the initial post, and in this case, my dataset may have been too simple.
Still, Elan can -merge- back with the original dataset, with "diagnosis" as
her key.
***
sysuse auto, clear
keep mpg
bys mpg: egen mycount=count(mpg)
//collapse to one per group
bys mpg: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were: -merge- copies the flag to every original observation
merge 1:m mpg /*
*/ using "C:\Program Files (x86)\Stata11\auto.dta", /*
*/ nogenerate nolabel nonotes
***
You need to substitute the path to your copy of auto.dta in the -merge- command...
HTH
Martin
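A minimal sketch of an alternative that never drops observations, so the other variables (here "make", or a patient's name) stay in place and no -merge- is needed. The variable names "freq", "rank" and "top10" are assumptions chosen for illustration, and ties at the tenth place are broken by the order of mpg, which is arbitrary. Note that it re-sorts the data.

```stata
sysuse auto, clear
// frequency of each mpg value, stored on every observation
bysort mpg: gen long freq = _N
// most frequent values first; within equal frequencies, order by mpg
gsort -freq mpg
// running count of distinct mpg values seen so far (1 = most frequent)
gen long rank = sum(mpg != mpg[_n-1])
// flag every observation whose value is among the first ten
gen byte top10 = rank <= 10
// inspect which mpg values carry the flag
tabulate mpg top10
```

Because the count comes from _N and not from -egen-, the same lines should work unchanged when the key is a string variable such as "dx".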
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy Radyakin
Sent: Tuesday, 17 November 2009 00:03
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values
Suppose you have data with two variables, name and diagnosis (or make and
mpg), and you want to add a "top10" dummy to that.
You keep one person for each diagnosis.
After you -expand-, won't there be N persons with the same name?
Can you show this with auto.dta?
S.R.
On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <[email protected]> wrote:
What do you want to know? I collapse the data (fine print: no hyphens
around it, as I use -keep- to do it) so that I can -sort- on "mycount" and
assign the flag that Elan requested. Once that is done, I want my original
data back, so I -expand- it back to its former glory. Any suggestions for
improvements are welcome...
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy
Radyakin
Sent: Monday, 16 November 2009 23:33
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values
Martin, could you please explain how -expand- is used here?
Best, Sergiy
On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <[email protected]> wrote:
Here is a strategy:
*************
clear*
//construct data
set obs 10000
gen dx=1+int(100*runiform())
//see freqs
ta dx
//use Ben Jann's -fre- (install it from SSC if it is missing)
capture which fre
if _rc ssc install fre
fre dx, desc
//get counts next to "dx"s
bys dx: egen mycount=count(dx)
//collapse to one per group
bys dx: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were
expand mycount
//see result
ta myc mostfreq
*************
HTH
Martin
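One point the strategy above leaves open is ties: -sort- orders equal counts arbitrarily, so when several values share the tenth-largest count, which of them gets the flag depends on the sort. A minimal sketch of one way around this, assuming it is acceptable to flag every value whose count reaches the tenth-largest count (so more than ten values may be flagged). The seed and the local macro name "cutoff" are assumptions for illustration.

```stata
clear
set seed 12345
// construct data as above
set obs 10000
gen dx = 1 + int(100*runiform())
// count per value, stored on every observation
bysort dx: gen long mycount = _N
// temporarily reduce to one row per value to find the cutoff
preserve
bysort dx: keep if _n == 1
gsort -mycount
local cutoff = mycount[10]
restore
// flag all values at least as frequent as the tenth most frequent one
gen byte mostfreq = mycount >= `cutoff'
// see result
tabulate mycount mostfreq
```

With -preserve- and -restore- the original observations come back untouched, so neither -expand- nor -merge- is involved.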
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Cohen, Elan
Sent: Monday, 16 November 2009 22:25
To: '[email protected]'
Subject: st: Create a flag variable for 10 most frequent values
Hi all,
I have a string variable dx that represents a patient's diagnosis (about
5,000 unique values). I'd like to create a "top 10 flag" that equals 1 if
dx is one of the top 10 most frequent diagnoses and 0 otherwise.
I'm not even sure where to begin. If someone could point me in the right
direction, I'd be grateful. Stata 10, Windows XP
Thank you,
- Elan
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/