<>
Why is "18", which is the most frequent "mpg" value, assigned a "0" for
"top10" in your example? Your code seems to flag the highest values (my
initial mistake), and not the most frequent ones...
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Winter
Sent: Tuesday, 17 November 2009 00:57
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values
No collapsing, no merging, no -egen-:
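sysuse auto, clear
// 1 on the first observation of each distinct mpg value, 0 elsewhere
bysort mpg: gen top10=(_n==1)
// running sum: each distinct mpg value gets its rank, lowest value = 1
replace top10 = sum(top10)
sum top10, meanonly
// r(max) is the number of distinct mpg values, so this flags the 10 highest values, not the 10 most frequent
replace top10 = (top10>=(`r(max)'-9))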
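Here is a minimal sketch that keeps Nick's constraints (no collapsing, no merging, no -egen-) but ranks the distinct mpg values by frequency instead of by value. The variable names freq, first and rank are illustrative. If the 10th and 11th most frequent values have the same count, which of them gets flagged is arbitrary.

***
sysuse auto, clear
// number of cars sharing each mpg value, on every observation
bysort mpg: gen long freq = _N
// mark one observation per distinct mpg value
by mpg: gen byte first = (_n == 1)
// distinct values on top, ordered from most to least frequent
gsort -first -freq
// rank the distinct values; all other observations get missing
gen long rank = _n if first
// copy each value's rank to all its observations (missing sorts last)
bysort mpg (rank): replace rank = rank[1]
gen byte top10 = (rank <= 10)
drop first
***

Nothing here depends on the variable type, so replacing mpg with dx should carry over to Elan's string variable.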
On 11/16/2009 6:37 PM, Martin Weiss wrote:
> <>
>
> Good point! I always make up my own dataset according to the description in
> the initial post, and in this case, my dataset may have been too simple.
> Still, Elan can -merge- back with the original dataset, with "diagnosis" as
> her key.
>
> ***
> sysuse auto, clear
> keep mpg
>
> bys mpg: egen mycount=count(mpg)
>
> //collapse to one per group
> bys mpg: keep if _n==1
> //-sort- on count var
> sort mycount
> //take the last ten
> gen byte mostfreq=inrange(_n,`=_N-9',_N)
> //and back as we were
> expand mycount
>
> merge m:m mpg /*
> */ using "C:\Program Files (x86)\Stata11\auto.dta", /*
> */ nogenerate nolabel nonotes
> ***
>
>
> You need to substitute the path to your auto dataset in the last line...
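The sketch below avoids both -expand- and -merge m:m-. It computes the flag on a reduced copy of the data, saves that copy to a temporary file, and attaches the flag to the original observations with a many-to-one merge on mpg. It assumes Stata 11's -merge- syntax; Elan mentions Stata 10, where the older -merge- syntax applies instead. The names mycount, mostfreq and flags are illustrative.

***
sysuse auto, clear
preserve
// one observation per distinct mpg value, with its frequency in mycount
contract mpg, freq(mycount)
// most frequent first; ties at the cutoff are broken arbitrarily
gsort -mycount
gen byte mostfreq = (_n <= 10)
keep mpg mostfreq
tempfile flags
save `flags'
// back to the full auto data, all variables intact
restore
// each distinct mpg value in the saved file matches many cars in memory
merge m:1 mpg using `flags', nogenerate
***

The original data are restored before the merge, so make, price and the other variables stay as they were, and no file path needs to be typed.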
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
> Sent: Tuesday, 17 November 2009 00:03
> To: [email protected]
> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>
> Suppose you have data with two vars: name and diagnosis (or make and mpg),
>
> and you want to add a "top10" dummy to that.
> You keep one person for each diagnosis.
> After you -expand-, won't there be N persons with the same name?
> Can you show this with auto.dta?
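Sergiy's concern can be checked directly on auto.dta with a minimal sketch of the keep-then-expand steps (mycount is an illustrative name). The number of observations comes back, but each copy repeats the make of the single car that was kept for its mpg value.

***
sysuse auto, clear
bysort mpg: gen long mycount = _N
// keep one car per mpg value
by mpg: keep if _n == 1
// copy that car mycount times
expand mycount
// 74 observations again ...
count
// ... but far fewer distinct makes than the 74 in the original data
duplicates report make
***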
> S.R.
>
>
>
>
> On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <[email protected]> wrote:
>> <>
>>
>> What do you want to know? I collapse (fine print: no hyphens around it, as I
>> use -keep- to do it) the thing to be able to -sort- on "mycount" and assign
>> the flag that Elan requested. Once that is done, I want my original data
>> back, so I -expand- it back to its former glory. Any suggestions for
>> improvements are welcome...
>>
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
>> Sent: Monday, 16 November 2009 23:33
>> To: [email protected]
>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>
>> Martin, could you please explain how -expand- is used here?
>> Best, Sergiy
>>
>> On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <[email protected]> wrote:
>>> <>
>>>
>>> Here is a strategy:
>>>
>>>
>>> *************
>>> clear*
>>>
>>> //construct data
>>> set obs 10000
>>> gen dx=1+int(100*runiform())
>>>
>>> //see freqs
>>> ta dx
>>> //use Ben Jann's -fre- (from SSC)
>>> capture which fre
>>> if _rc ssc install fre
>>> fre dx, desc
>>>
>>> //get counts next to "dx"s
>>> bys dx: egen mycount=count(dx)
>>>
>>> //collapse to one per group
>>> bys dx: keep if _n==1
>>> //-sort- on count var
>>> sort mycount
>>> //take the last ten
>>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>>> //and back as we were
>>> expand mycount
>>>
>>> //see result
>>> ta myc mostfreq
>>> *************
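Tied counts are not guaranteed to come out of -sort- in any particular order. So when the 10th and 11th most frequent values have equal counts, which one gets flagged by the _n cutoff above is arbitrary. A minimal sketch of a tie-inclusive alternative, using the same simulated data, flags every value whose frequency is at least the 10th-largest frequency. This may flag more than ten values. The seed and the local name cutoff are arbitrary choices.

***
clear
set obs 10000
set seed 12345
gen dx = 1 + int(100*runiform())
// frequency of each dx on every observation
bysort dx: gen long mycount = _N
// find the 10th-largest frequency among the distinct values
preserve
by dx: keep if _n == 1
sort mycount
local cutoff = mycount[_N-9]
restore
// flag every value at least as frequent as the 10th; ties can add more
gen byte mostfreq = (mycount >= `cutoff')
***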
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Cohen, Elan
>>> Sent: Monday, 16 November 2009 22:25
>>> To: '[email protected]'
>>> Subject: st: Create a flag variable for 10 most frequent values
>>>
>>> Hi all,
>>>
>>> I have a string variable dx that represents a patient's diagnosis (about
>>> 5,000 unique values). I'd like to create a "top 10 flag" that equals 1 if
>>> dx is one of the 10 most frequent diagnoses and 0 otherwise.
>>>
>>> I'm not even sure where to begin. If someone could point me in the right
>>> direction, I'd be grateful. Stata 10, Windows XP
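One place to begin is simply to look at the most frequent codes. The sketch below simulates a string variable like Elan's and lists the ten most frequent codes with their counts, using official -contract- rather than -fre-. The codes "D1" to "D100", the seed and the name n are hypothetical.

***
clear
set obs 10000
set seed 2009
// hypothetical string diagnosis codes D1 to D100
gen str4 dx = "D" + string(1 + int(100*runiform()))
preserve
// one row per distinct code, with its count in n
contract dx, freq(n)
// most frequent first; alphabetical within ties
gsort -n dx
list dx n in 1/10, noobs
// back to the patient-level data
restore
***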
>>>
>>> Thank you,
>>>
>>> - Elan
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
--
--------------------------------------------------------------
Nicholas Winter 434.924.6994 t
Assistant Professor 434.924.3359 f
Department of Politics [email protected] e
University of Virginia faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904