Statalist



RE: st: AW: Create a flag variable for 10 most frequent values


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   RE: st: AW: Create a flag variable for 10 most frequent values
Date   Tue, 17 Nov 2009 01:10:43 +0100

<>

Why is "18", which is the most frequent "mpg" value, assigned a "0" for
"top10" in your example? Your code seems to flag the highest values (my
initial mistake), and not the most frequent ones...
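
A minimal sketch of one way to rank by frequency rather than by value, still without collapsing, merging, or -egen-. The variable names "freq", "first" and "rank" are made up for illustration, and ties at the cut-off are broken in favour of the smaller "mpg" value, which is an assumption and not something requested in the thread.

```stata
sysuse auto, clear

* number of observations sharing each mpg value
bysort mpg: gen long freq = _N

* tag exactly one observation per distinct mpg value
bysort mpg: gen byte first = (_n == 1)

* put the tagged observations on top, most frequent value first;
* mpg as the last key makes the order among tied values reproducible
gsort -first -freq mpg
gen long rank = _n if first

* copy each value's rank to all observations with that value
* (missing ranks sort last within the group, so rank[1] is the tagged one)
bysort mpg (rank): replace rank = rank[1]

* flag the ten most frequent distinct values
gen byte top10 = (rank <= 10)
```

Running -tabulate mpg top10- afterwards is a quick way to inspect the result. Since -bysort- and -gsort- also accept string variables, the same steps should carry over to a string variable such as "dx". If all values tied with the tenth should be flagged as well, the last line would need to compare "freq" with the frequency of the value ranked tenth.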


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Winter
Sent: Tuesday, 17 November 2009 00:57
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

No collapsing, no merging, no -egen-:

sysuse auto
bysort mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))


On 11/16/2009 6:37 PM, Martin Weiss wrote:
> <>
> 
> Good point! I always make up my own dataset according to the description in
> the initial post, and in this case, my dataset may have been too simple.
> Still, Elan can -merge- back with the original dataset, with "diagnosis" as
> her key.
> 
> ***
> sysuse auto, clear
> keep mpg
> 
> bys mpg: egen mycount=count(mpg)
> 
> //collapse to one per group
> bys mpg: keep if _n==1
> //-sort- on count var
> sort mycount
> //take the last ten
> gen byte mostfreq=inrange(_n,`=_N-9',_N)
> //and back as we were
> expand mycount
> 
> merge m:m mpg /* 
>  */  using "C:\Program Files (x86)\Stata11\auto.dta",  /* 
>  */ nogenerate nolabel nonotes
> ***
> 
> 
> You need to substitute the path to your auto dataset in the last line...
> 
> HTH
> Martin
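
One possible alternative to the -expand- plus -merge m:m- step, sketched under the assumption of Stata 11 or later (for the m:1 syntax): the flags are stored with one row per value in a temporary file and merged back onto the untouched data, so no path to auto.dta has to be hard-coded. The tempfile name "flags" is hypothetical, and adding "mpg" as a second sort key to break ties is an assumption, not part of the code above.

```stata
sysuse auto, clear

* frequency of each mpg value, computed on the full data
bysort mpg: gen long mycount = _N

preserve
    * one row per distinct value
    bysort mpg: keep if _n == 1
    * most frequent values last; mpg breaks ties so the order is reproducible
    sort mycount mpg
    gen byte mostfreq = inrange(_n, _N - 9, _N)
    keep mpg mostfreq
    tempfile flags
    save `flags'
restore

* every original observation picks up the flag of its mpg value
merge m:1 mpg using `flags', nogenerate
```

Because the original observations are never dropped, all other variables (for example "make") stay attached to their own rows, which addresses the concern about -expand- duplicating one person per diagnosis.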
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
> Sent: Tuesday, 17 November 2009 00:03
> To: [email protected]
> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
> 
> Suppose you have data with two variables: name and diagnosis (or make and
> mpg), and you want to add a "top10" dummy to that.
> You keep one person for each diagnosis.
> After you -expand-, won't there be N persons with the same name?
> Can you show this with auto.dta?
> S.R.
> 
> 
> 
> 
> On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <[email protected]> wrote:
>> <>
>>
>> What do you want to know? I collapse (fine print: no hyphens around it as I
>> use -keep- to do it) the thing to be able to -sort- on "mycount" and assign
>> the flag that Elan requested. Once that is done, I want my original data
>> back, so I -expand- it back to its former glory. Any suggestions for
>> improvements are welcome...
>>
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
>> Sent: Monday, 16 November 2009 23:33
>> To: [email protected]
>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>
>> Martin, could you please explain how -expand- is used here?
>> Best, Sergiy
>>
>> On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <[email protected]> wrote:
>>> <>
>>>
>>> Here is a strategy:
>>>
>>>
>>> *************
>>> clear*
>>>
>>> //construct data
>>> set obs 10000
>>> gen dx=1+int(100*runiform())
>>>
>>> //see freqs
>>> ta dx
>>> //use Ben Jann's -fre-
>>> capture which fre
>>> if _rc ssc install fre
>>> fre dx, desc
>>>
>>> //get counts next to "dx"s
>>> bys dx: egen mycount=count(dx)
>>>
>>> //collapse to one per group
>>> bys dx: keep if _n==1
>>> //-sort- on count var
>>> sort mycount
>>> //take the last ten
>>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>>> //and back as we were
>>> expand mycount
>>>
>>> //see result
>>> ta myc mostfreq
>>> *************
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Cohen, Elan
>>> Sent: Monday, 16 November 2009 22:25
>>> To: '[email protected]'
>>> Subject: st: Create a flag variable for 10 most frequent values
>>>
>>> Hi all,
>>>
>>> I have a string variable dx that represents a patient's diagnosis (about
>>> 5,000 unique values).  I'd like to create a "top 10 flag" that equals 1 if
>>> dx is one of the top 10 most frequent diagnoses and 0 otherwise.
>>>
>>> I'm not even sure where to begin.  If someone could point me in the right
>>> direction, I'd be grateful.  Stata 10, Windows XP
>>>
>>> Thank you,
>>>
>>> - Elan
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>

-- 
--------------------------------------------------------------
Nicholas Winter                                 434.924.6994 t
Assistant Professor                             434.924.3359 f
Department of Politics                  [email protected] e
University of Virginia          faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904
