Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: limitations of "generate" with missing data
From
"Fernando Rios Avila" <[email protected]>
To
<[email protected]>
Subject
RE: st: RE: limitations of "generate" with missing data
Date
Mon, 11 Apr 2011 18:19:17 -0400
Apologies, it was a typo; the condition should refer to r, not r1:
> set obs 1000
> gen r=runiform()
> replace r=. if runiform()>.5
> gen r2=r>0.7 if r!=.
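
For readers of the archive, a minimal sketch of why the original command misbehaves and why the if qualifier fixes it. Stata treats a numeric missing value (.) as larger than any number, so an expression such as x>.79 evaluates to 1 for missing x. The toy data and the names score, naive and fixed below are made up for illustration and do not come from the thread; missing() is used in place of !=. so that extended missing values (.a to .z) are excluded as well.

```stata
* Toy data (hypothetical): four observed scores and one missing value
clear
input float score
0
.2
.8
1
.
end

* Without a qualifier, the missing score counts as "greater than .79"
gen byte naive = (score > .79)

* With the qualifier, observations with a missing score are left missing
gen byte fixed = (score > .79) if !missing(score)

list score naive fixed
```

Applied to the original question, the same one-line pattern would use gread_comp_score_pcnt in place of score.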
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, April 11, 2011 6:18 PM
To: [email protected]
Subject: Re: st: RE: limitations of "generate" with missing data
What is r1?
On Mon, Apr 11, 2011 at 11:09 PM, Fernando Rios Avila <[email protected]>
wrote:
> Hi Michael,
> The limitation is not with generate, but rather with the way you are
> creating your dummy variable. I think this should do the trick:
>
> set obs 1000
> gen r=runiform()
> replace r=. if runiform()>.5
> gen r2=r>0.7 if r1!=.
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael
> Costello
> Sent: Monday, April 11, 2011 6:01 PM
> To: statalist
> Subject: st: limitations of "generate" with missing data
>
> Statalisters,
>
> I recently ran into a problem with the following dataset:
>
> . tab gread_comp_score_pcnt, m
> gread_comp_ |
>  score_pcnt |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           0 |        150        7.50        7.50
>          .2 |         85        4.25       11.75
>          .4 |         97        4.85       16.60
>          .6 |         82        4.10       20.70
>          .8 |         72        3.60       24.30
>           1 |         15        0.75       25.05
>           . |      1,499       74.95      100.00
> ------------+-----------------------------------
>       Total |      2,000      100.00
>
> The high number of "missing" is by design, a by-product of a
> horizontally structured dataset that I have yet to rectify.
>
> When I run the command:
> . gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79)
> I am left with
>
> . tab gread_comp_score_pcnt80, m
> gread_comp_ |
> score_pcnt8 |
>           0 |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           0 |        414       20.70       20.70
>           1 |      1,586       79.30      100.00
> ------------+-----------------------------------
>       Total |      2,000      100.00
>
> As you can see, the 87 values above .79 were set to 1, but so were all
> the missing values!! I have toyed with the code a bit, trying
> variations such as
> . gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79 & gread_comp_score_pcnt!=.)
> but that converts all the missing to 0's, which is only marginally better.
>
> So the question is, is there some way to use a single, precise line of
> code to create eighty-seven 1's, four hundred fourteen 0's and 1499
> Missing values in one dummy variable? I know I can do it with several
> lines of code, but I'm looking for something more concise, as it needs
> to run many hundreds of times.
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/