From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: generating dummy variables based on freq of duplicate values
Date: Tue, 20 Aug 2013 10:16:26 +0100

Or

bys patientid: replace highfreq= _N>4

Nick
njcoxstata@gmail.com

On 20 August 2013 10:11, Willard van Ooij <w.van.ooij@marktmonitor.com> wrote:
> I may be missing something, but isn't this solution much easier?
>
> gen highfreq=0
> bys patientid: replace highfreq=1 if _N>4
>
> But this only works if Yerik wants just 2 groups, a high and low frequency group.
>
> Eric A. Booth
> Take a look at -help egen-, particularly the cut() function. Here's one way to get what you are asking about:
>
> ********************!begin example
> clear
> set obs 500
> g patientid = trunc(runiform()*50)
>
> bys patientid: egen freq = count(patientid)
> su freq
>
> egen freqcat = cut(freq), at(0 4 10 30) lab
> ta freqcat, miss
>
> ta freqcat, g(cat_)
>
> su cat_?
> ********************!end example
>
> On Mon, Aug 19, 2013 at 8:55 PM, Yerik Kaslow <yerik.kaslow@gmail.com> wrote:
>> I am working w a dataset for clinical trials. My data has patient IDs
>> which often repeat; every time they participate in a trial, they are
>> recorded. I want to group the patient IDs into high frequency and low
>> frequency participants, based on the frequency they are involved with
>> the clinical trials. I am trying to write syntax to create a dummy
>> variable based on frequency of duplicate patientIDs.
>>
>> EG:
>> Patient ID 6523 appears 2 times
>> Patient ID 7634 appears 10 times
>> Patient ID 8798 appears 4 times
>> Patient ID 9032 appears 21 times
>>
>> I would like to write syntax such that any patient ID with a frequency
>> of <= 4 (or any other value I choose) is assigned value of 0...low
>> frequency patient in this case. Likewise, any patient ID with a
>> frequency of >=5 is assigned a value of 1...high frequency patient.
>>
>> How would I write syntax to say, assign a value of 1/0 based on the
>> number of the same patient IDs in the data?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
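
For reference, the two-group approach discussed in this thread can be sketched end to end as below. This is only an illustration under stated assumptions: the simulated data follow the example quoted above, the cutoff of 4 comes from the original question, and the names nvisits, highfreq and freqgrp (along with the seed value) are made up for this sketch. Note that egen's cut() treats each interval as closed on the left and sets values at or above the last break to missing, so a break at 5 rather than 4 is what puts a frequency of exactly 4 in the low group.

```stata
* Sketch only: variable names, seed and cutoff are illustrative assumptions.
clear
set seed 12345
set obs 500
generate patientid = trunc(runiform()*50)

* Under by:, _N is the number of observations in each patientid group
bysort patientid: generate nvisits = _N

* 1 if the ID appears 5 or more times, 0 if it appears 4 or fewer times
generate byte highfreq = nvisits > 4

* The same split with cut(): intervals are [1,5) and [5,501),
* coded 0 and 1 by icodes; 501 exceeds the largest possible count here
egen freqgrp = cut(nvisits), at(1 5 501) icodes

* Check the dummy against the raw counts and against the cut() version
tabulate nvisits highfreq
tabulate freqgrp highfreq, missing
```

Creating the variable with generate and a logical expression avoids the separate step of initializing it to 0, and keeping the count in its own variable makes it easy to try a different cutoff later.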