Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: generating dummy variables based on freq of duplicate values
From: Willard van Ooij <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: generating dummy variables based on freq of duplicate values
Date: Tue, 20 Aug 2013 11:11:38 +0200
I may be missing something, but isn't this solution much easier?
gen highfreq=0
bys patientid: replace highfreq=1 if _N>4
But this only works if Yerik wants just two groups, a high-frequency and a low-frequency group.
Willard
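
For reference, the two-line approach above can also be written as a single statement, because a logical expression in Stata evaluates to 1 or 0. This is only a sketch: the simulated data mirror Eric's example below, and the local macro `cutoff' and the seed value are illustrative assumptions, not part of the original posts.

```stata
* Simulated data (assumption: same setup as the example further down the thread)
clear
set seed 12345
set obs 500
gen patientid = trunc(runiform()*50)

* Hypothetical threshold: IDs appearing more than `cutoff' times count as high frequency
local cutoff 4

* Under the by-prefix, _N is the number of observations for the current patientid,
* so (_N > `cutoff') is 1 for high-frequency patients and 0 otherwise
bysort patientid: gen byte highfreq = (_N > `cutoff')
tabulate highfreq
```

Changing the value stored in `cutoff' moves the boundary without touching the rest of the code. Run the lines together, since a local macro does not persist between separately executed selections of a do-file.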
-----Original message-----
From: [email protected] [mailto:[email protected]] On behalf of Eric A. Booth
Sent: Tuesday, August 20, 2013 6:29
To: [email protected]
Subject: Re: st: generating dummy variables based on freq of duplicate values
Take a look at -help egen-, particularly the cut() function. Here's one way to get what you are asking about:
********************!begin example
clear
set obs 500
g patientid = trunc(runiform()*50)
bys patientid: egen freq = count(patientid)
su freq
egen freqcat = cut(freq), at(0 4 10 30) lab
ta freqcat, miss
ta freqcat, g(cat_)
su cat_?
********************!end example
- Eric
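
A note on the breakpoints, offered as a sketch rather than a correction: -egen, cut()- treats each interval as closed on the left, so at(0 4 10 30) puts a frequency of exactly 4 in the second group, and any value at or above the last breakpoint becomes missing. To match the split described in the original question (4 or fewer is low, 5 or more is high), the break would sit at 5. The upper bound of 501 and the seed are assumptions chosen only so that every possible frequency in 500 observations is covered.

```stata
* Simulated data, as in the example above
clear
set seed 12345
set obs 500
gen patientid = trunc(runiform()*50)

* Number of records per patient
bysort patientid: gen freq = _N

* Intervals are [0,5) and [5,501); icodes numbers them 0 and 1,
* so highfreq is 0 when freq <= 4 and 1 when freq >= 5
egen highfreq = cut(freq), at(0 5 501) icodes

* Cross-check the coding against the raw frequencies
tabulate freq highfreq, missing
```

Adding further breakpoints to at() extends this to more than two groups, which is the case the one-line dummy does not handle.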
On Mon, Aug 19, 2013 at 8:55 PM, Yerik Kaslow <[email protected]> wrote:
> Hello Statalist,
>
> I apologize if this email already got sent out. I sent it with the word
> help in the first line so it may have bounced, I am sorry if this is a
> duplicate email. I'm not trying to spam the listserv, I'm just new at
> this.
>
> I am working with a dataset for clinical trials. My data has patient IDs
> which often repeat; every time they participate in a trial, they are
> recorded. I want to group the patient IDs into high frequency and low
> frequency participants, based on the frequency they are involved with
> the clinical trials. I am trying to write syntax to create a dummy
> variable based on frequency of duplicate patientIDs.
>
> EG:
> Patient ID 6523 appears 2 times
> Patient ID 7634 appears 10 times
> Patient ID 8798 appears 4 times
> Patient ID 9032 appears 21 times
>
> I would like to write syntax such that any patient ID with a frequency
> of <= 4 (or any other value I choose) is assigned value of 0...low
> frequency patient in this case. Likewise, any patient ID with a
> frequency of >=5 is assigned a value of 1...high frequency patient.
>
> How would I write syntax to say, assign a value of 1/0 based on the
> number of the same patient IDs in the data?
>
> Thank you,
>
> Yerik
>
> --
> Yerik Kaslow
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/