
Re: st: AW: Create a flag variable for 10 most frequent values

From	Nick Winter <[email protected]>
To	[email protected]
Subject	Re: st: AW: Create a flag variable for 10 most frequent values
Date	Mon, 16 Nov 2009 19:31:23 -0500

Ugh.  How silly of me.

I think this does it, though it doesn't deal with values that are tied for 10th:


sysuse auto
bysort mpg: gen n=_N
bysort n mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))

To flag the top 10, along with any additional that have the same frequency as the 10th:


* drop n first if it is left over from the block above
capture drop n
bysort mpg: gen n=_N
bysort n mpg: gen tag=(_n==1)
replace tag = sum(tag)
sum tag , meanonly
gen top10ties = (tag>=(`r(max)'-9))
sum n if tag==(`r(max)'-9), meanonly
replace top10ties = 1 if n==`r(max)'


table mpg top10
table mpg top10ties
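
For Elan's actual case, where dx is a string variable, the same idea should carry over unchanged, since -bysort- accepts string variables. A minimal sketch, run on the auto data only so that it is self-contained: the variable dx below is a made-up stand-in created with -tostring-, and blank values of dx, if there are any, would be counted like any other value:

```stata
* stand-in data: dx is a hypothetical string copy of mpg
sysuse auto, clear
tostring mpg, generate(dx)

* frequency of each distinct value of dx
bysort dx: gen n = _N

* number the distinct values from least to most frequent
bysort n dx: gen tag = (_n==1)
replace tag = sum(tag)

* flag the 10 highest-numbered (most frequent) values;
* ties for 10th place are broken arbitrarily, as noted above
sum tag, meanonly
gen byte top10 = (tag >= (`r(max)'-9))

drop n tag
tab dx top10
```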


On 11/16/2009 7:10 PM, Martin Weiss wrote:

<>

Why is "18", which is the most frequent "mpg" value, assigned a "0" for
"top10" in your example? Your code seems to flag the highest values (my
initial mistake), and not the most frequent ones...
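
One quick way to see this, as a sketch only: rerun the earlier code, then attach the frequency of each mpg value (the variable n is added here purely for the check) and compare frequencies across the flag. If the flag really picked out the most frequent values, the smallest n among flagged observations should be at least as large as the largest n among unflagged ones.

```stata
sysuse auto, clear

* the earlier code, unchanged
bysort mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))

* check: frequency of each mpg value, summarized by flag
bysort mpg: gen n = _N
tabstat n, by(top10) statistics(min max)
table mpg top10
```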


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Winter
Sent: Tuesday, 17 November 2009 00:57
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

No collapsing, no merging, no -egen-:

sysuse auto
bysort mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))


On 11/16/2009 6:37 PM, Martin Weiss wrote:

<>

Good point! I always make up my own dataset according to the description in
the initial post, and in this case, my dataset may have been too simple.
Still, Elan can -merge- back with the original dataset, with "diagnosis" as
her key.

***
sysuse auto, clear
keep mpg

bys mpg: egen mycount=count(mpg)

//collapse to one per group
bys mpg: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were
expand mycount

merge m:m mpg /**/ using "C:\Program Files (x86)\Stata11\auto.dta", /**/ nogenerate nolabel nonotes

***


You need to substitute the path to your auto dataset in the last line...
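
A variant that should avoid both the hard-coded path and the -expand- step, offered as a sketch rather than a tested replacement: save the original data to a tempfile first, reduce to one row per value, and merge the flags back on. The tempfile name original is just illustrative, and -merge 1:m- assumes Stata 11 or later. Run it as one do-file, since the tempfile macro is local.

```stata
sysuse auto, clear

* keep a copy of the full data to merge back onto
tempfile original
save `original'

* one row per distinct mpg value, with its frequency
keep mpg
bysort mpg: gen mycount = _N
bysort mpg: keep if _n==1

* sort on the count and flag the last ten rows
sort mycount
gen byte mostfreq = inrange(_n, _N-9, _N)

* mpg is unique here, so each row matches all its original observations
merge 1:m mpg using `original', nogenerate

table mpg mostfreq
```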

HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy Radyakin
Sent: Tuesday, 17 November 2009 00:03
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

Suppose you have data with two vars: name and diagnosis (or make and mpg)
and you want to add a "top10" dummy to that.
You keep one person for each diagnosis.
After you -expand-, will there be N persons with the same name?
Can you show this with auto.dta?
S.R.




On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <[email protected]> wrote:

<>

What do you want to know? I collapse (fine print: no hyphens around it as I
use -keep- to do it) the thing to be able to -sort- on "mycount" and assign
the flag that Elan requested. Once that is done, I want my original data
back, so I -expand- it back to its former glory. Any suggestions for
improvements are welcome...
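
To make the -expand- step concrete, here is a toy illustration with made-up data (the variables x and k are hypothetical): each observation is replaced by k copies of itself, so one row per group carrying a count of k turns back into k rows. Only the variables still in memory are copied, which is why anything dropped before the -keep- step does not come back.

```stata
clear
input x k
1 3
2 1
3 2
end

* replace each observation with k copies of itself
expand k

* 3 + 1 + 2 = 6 observations are now in memory
sort x
list, sepby(x)
```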



HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy Radyakin
Sent: Monday, 16 November 2009 23:33
To: [email protected]
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

Martin, could you please explain how -expand- is used here?
Best, Sergiy

On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <[email protected]> wrote:

<>

Here is a strategy:


*************
clear*

//construct data
set obs 10000
gen dx=1+int(100*runiform())

//see freqs
ta dx
//use Ben Jann's -fre-
capture which fre
if _rc ssc install fre
fre dx, desc

//get counts next to "dx"s
bys dx: egen mycount=count(dx)

//collapse to one per group
bys dx: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were
expand mycount

//see result
ta myc mostfreq
*************



HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Cohen, Elan
Sent: Monday, 16 November 2009 22:25
To: '[email protected]'
Subject: st: Create a flag variable for 10 most frequent values

Hi all,

I have a string variable dx that represents a patient's diagnosis (about
5,000 unique values).  I'd like to create a "top 10 flag" that equals 1 if
dx is one of the top 10 most frequent diagnoses and 0 otherwise.

I'm not even sure where to begin.  If someone could point me in the right
direction, I'd be grateful.  Stata 10, Windows XP

Thank you,

- Elan

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




--
--------------------------------------------------------------
Nicholas Winter                                 434.924.6994 t
Assistant Professor                             434.924.3359 f
Department of Politics                  [email protected] e
University of Virginia          faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904


References:
- st: Create a flag variable for 10 most frequent values
  - From: "Cohen, Elan" <[email protected]>
- Re: st: AW: Create a flag variable for 10 most frequent values
  - From: Sergiy Radyakin <[email protected]>
- Re: st: AW: Create a flag variable for 10 most frequent values
  - From: Sergiy Radyakin <[email protected]>
- RE: st: AW: Create a flag variable for 10 most frequent values
  - From: "Martin Weiss" <[email protected]>
- Re: st: AW: Create a flag variable for 10 most frequent values
  - From: Nick Winter <[email protected]>
- RE: st: AW: Create a flag variable for 10 most frequent values
  - From: "Martin Weiss" <[email protected]>
