On Mon, Nov 16, 2009 at 6:56 PM, Nick Winter <[email protected]> wrote:
> No collapsing, no merging, no -egen-:
>
> sysuse auto
> bysort mpg: gen top10=(_n==1)
> replace top10 = sum(top10)
> sum top10, meanonly
> replace top10 = (top10>=(`r(max)'-9))
>
and not a solution to the problem posed.
Rather, it is a solution to a different problem. It assumes that the
dataset is already collapsed, and it gives you the "highest" values,
not the most frequent ones. For auto.dta (and the top 3 instead of the
top 10), your program produces:
 .....
 66. | Honda Civic          28       0 |
 67. | Chev. Chevette       29       0 |
 68. | Dodge Colt           30       0 |
 69. | Mazda GLC            30       0 |
 70. | Toyota Corolla       31       0 |
 71. | Plym. Champ          34       1 |
 72. | Datsun 210           35       1 |
 73. | Subaru               35       1 |
 74. | VW Diesel            41       1 |
     +---------------------------------+
Clearly, 34, 35 and 41 are the _highest_ values of mpg, but not the
most _frequent_ ones.
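
For comparison, here is a minimal sketch of one way to flag the most
frequent values without collapsing or merging. It is only an illustration
under stated assumptions: the variable names freq, rank and top3 are made
up for this example, it is meant to be run from a do-file, and ties in
frequency are broken arbitrarily by the value of mpg, so a different tie
rule may be wanted in practice.

```stata
sysuse auto, clear

* freq, rank and top3 are hypothetical names chosen for this sketch
bysort mpg: gen long freq = _N        // number of observations sharing each mpg value

* most frequent values first; equal frequencies are ordered by mpg (arbitrary tie rule)
gsort -freq mpg

* running count of distinct mpg values in that order: 1 = most frequent
gen long rank = sum(_n == 1 | mpg != mpg[_n-1])

gen byte top3 = rank <= 3             // 1 for the 3 most frequent values, 0 otherwise

tabulate mpg top3
```

The same lines should carry over to a string variable such as dx in place
of mpg, with 3 changed to 10 for a top-10 flag.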
Sergiy
>
> On 11/16/2009 6:37 PM, Martin Weiss wrote:
>>
>> <>
>>
>> Good point! I always make up my own dataset according to the description in
>> the initial post, and in this case, my dataset may have been too simple.
>> Still, Elan can -merge- back with the original dataset, with "diagnosis" as
>> her key.
>>
>> ***
>> sysuse auto, clear
>> keep mpg
>>
>> bys mpg: egen mycount=count(mpg)
>>
>> //collapse to one per group
>> bys mpg: keep if _n==1
>> //-sort- on count var
>> sort mycount
>> //take the last ten
>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>> //and back as we were
>> expand mycount
>>
>> merge m:m mpg /* */ using "C:\Program Files (x86)\Stata11\auto.dta", /*
>> */ nogenerate nolabel nonotes
>> ***
>>
>>
>> You need to substitute the path to your auto dataset in the last line...
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
>> Sent: Tuesday, 17 November 2009 00:03
>> To: [email protected]
>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>
>> Suppose you have data with two vars, name and diagnosis (or make and mpg),
>> and you want to add a "top10" dummy to that.
>> You keep one person for each diagnosis.
>> After you -expand-, will there be N persons with the same name?
>> Can you show this with auto.dta?
>> S.R.
>>
>>
>>
>>
>> On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <[email protected]> wrote:
>>>
>>> <>
>>>
>>> What do you want to know? I collapse (fine print: no hyphens around it, as I
>>> use -keep- to do it) the thing to be able to -sort- on "mycount" and assign
>>> the flag that Elan requested. Once that is done, I want my original data
>>> back, so I -expand- it back to its former glory. Any suggestions for
>>> improvements are welcome...
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Sergiy Radyakin
>>> Sent: Monday, 16 November 2009 23:33
>>> To: [email protected]
>>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>>
>>> Martin, could you please explain how -expand- is used here?
>>> Best, Sergiy
>>>
>>> On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <[email protected]> wrote:
>>>>
>>>> <>
>>>>
>>>> Here is a strategy:
>>>>
>>>>
>>>> *************
>>>> clear*
>>>>
>>>> //construct data
>>>> set obs 10000
>>>> gen dx=1+int(100*runiform())
>>>>
>>>> //see freqs
>>>> ta dx
>>>> //use Ben Jann's -fre-
>>>> capture which fre
>>>> if _rc ssc install fre
>>>> fre dx, desc
>>>>
>>>> //get counts next to "dx"s
>>>> bys dx: egen mycount=count(dx)
>>>>
>>>> //collapse to one per group
>>>> bys dx: keep if _n==1
>>>> //-sort- on count var
>>>> sort mycount
>>>> //take the last ten
>>>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>>>> //and back as we were
>>>> expand mycount
>>>>
>>>> //see result
>>>> ta myc mostfreq
>>>> *************
>>>>
>>>>
>>>>
>>>> HTH
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Cohen, Elan
>>>> Sent: Monday, 16 November 2009 22:25
>>>> To: '[email protected]'
>>>> Subject: st: Create a flag variable for 10 most frequent values
>>>>
>>>> Hi all,
>>>>
>>>> I have a string variable dx that represents a patient's diagnosis (about
>>>> 5,000 unique values). I'd like to create a "top 10 flag" that equals 1 if
>>>> dx is one of the top 10 most frequent diagnoses and 0 otherwise.
>>>>
>>>> I'm not even sure where to begin. If someone could point me in the right
>>>> direction, I'd be grateful. Stata 10, Windows XP
>>>>
>>>> Thank you,
>>>>
>>>> - Elan
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/statalist/faq
>>>> * http://www.ats.ucla.edu/stat/stata/
>
> --
> --------------------------------------------------------------
> Nicholas Winter 434.924.6994 t
> Assistant Professor 434.924.3359 f
> Department of Politics [email protected] e
> University of Virginia faculty.virginia.edu/nwinter w
> PO Box 400787, 100 Cabell Hall
> Charlottesville, VA 22904