Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: limitations of "generate" with missing data


From   Michael Costello <[email protected]>
To   statalist <[email protected]>
Subject   st: limitations of "generate" with missing data
Date   Mon, 11 Apr 2011 18:01:14 -0400

Statalisters,

I recently ran into a problem with the following dataset:

. tab  gread_comp_score_pcnt, m
gread_comp_ |
 score_pcnt |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        150        7.50        7.50
         .2 |         85        4.25       11.75
         .4 |         97        4.85       16.60
         .6 |         82        4.10       20.70
         .8 |         72        3.60       24.30
          1 |         15        0.75       25.05
          . |      1,499       74.95      100.00
------------+-----------------------------------
      Total |      2,000      100.00

The high number of "missing" is by design, a by-product of a
horizontally structured dataset that I have yet to rectify.

When I run the command:
gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79)
I am left with

. tab  gread_comp_score_pcnt80, m
gread_comp_ |
score_pcnt8 |
          0 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        414       20.70       20.70
          1 |      1,586       79.30      100.00
------------+-----------------------------------
      Total |      2,000      100.00

As you can see, the 87 values above .79 were set to 1, but so were all
1,499 missing values! I have toyed with the code a bit, trying
variations such as
. gen gread_comp_score_pcnt80= (gread_comp_score_pcnt>.79 & gread_comp_score_pcnt!=.)
but that converts all the missing values to 0, which is only marginally better.

So the question is: is there a way to use a single, concise line of
code to create 87 ones, 414 zeros, and 1,499 missing values in one
dummy variable? I know I can do it with several lines of code, but I'm
looking for something more concise, as it needs to run many hundreds
of times.
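
For reference, one possible approach, sketched here on a toy dataset rather than on the data above. Stata treats numeric missing values as larger than any number, which is why (x > .79) evaluates to 1 for them. Adding an -if- qualifier to -generate- leaves those rows missing instead. The variable names score and score80 are made up for illustration.

```stata
* Toy data: five rows standing in for the real variable (names are hypothetical)
clear
input float score
0
.2
.8
1
.
end

* The logical expression yields 0 or 1; the -if- qualifier skips rows where
* score is missing, so the new variable stays missing on those rows
generate byte score80 = (score > .79) if !missing(score)

* Show the result, including the missing category
tabulate score80, missing
```

On these five toy rows, score80 should come out as 0, 0, 1, 1 and missing. The same pattern applied to the real variable would be gen gread_comp_score_pcnt80 = (gread_comp_score_pcnt > .79) if !missing(gread_comp_score_pcnt), which has not been run against the dataset above.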

Thanks for your help,
-Michael
--
Michael Costello
MS Candidate, Statistics 2011
202-246-1627