Re: st: Ommit missing observations from sum, det?

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Ommit missing observations from sum, det?
Date	Mon, 9 Jul 2012 16:40:25 +0100

r(p99) can't be said to define a quartile.

That aside, Stata's fault here is that it is doing precisely what you asked.

Missing values (not observations; an observation in Stata is the
entire case, record, or row of your data) count as greater than any
non-missing value and so satisfy your last inequality,
initial_length > r(p99). This is very well documented, e.g.

FAQ     . . . . . . . . . . . . . . . . Logical expressions and missing values
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
        2/03    Why is x > 1000 true when x contains missing value?
                http://www.stata.com/support/faqs/data/values.html
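To see the rule at work, here is a minimal sketch for a do-file. The auto data, the variable mpg, and the cutoff 1000 are my choices for illustration, not anything from your data; the largest mpg in that dataset is well below 1000, so only missings can satisfy the first condition.

```stata
sysuse auto, clear

* Set five values to missing purely for illustration
replace mpg = . in 1/5

* Missing counts as larger than any number, so this counts the 5 missings
count if mpg > 1000

* Adding "< ." excludes missings, so nothing qualifies
count if mpg > 1000 & mpg < .

* An equivalent and perhaps more readable way to exclude missings
count if mpg > 1000 & !missing(mpg)
```

The first -count- reports 5 and the other two report 0.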


So either you need to add an extra condition to exclude missings

.... & initial_length < .

or (easier) just use -xtile-, which automatically ignores missings.

. sysuse auto, clear
(1978 Automobile Data)

. xtile mpg_q = mpg, n(4)

. tab mpg_q

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         27       36.49       36.49
          2 |         11       14.86       51.35
          3 |         22       29.73       81.08
          4 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00

. replace mpg = . in 1/5
(5 real changes made, 5 to missing)

. xtile mpg_q2 = mpg, n(4)

. tab mpg_q2

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       36.23       36.23
          2 |         10       14.49       50.72
          3 |         20       28.99       79.71
          4 |         14       20.29      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab mpg_q2, missing

4 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         25       33.78       33.78
          2 |         10       13.51       47.30
          3 |         20       27.03       74.32
          4 |         14       18.92       93.24
          . |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00
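For comparison, the first option (the extra condition) can be spelled out along these lines. This is only a sketch under my own assumptions: it uses mpg from the auto data rather than your initial_length, it makes four groups rather than five (so no r(p99)), and the local macro names are arbitrary. Only the last condition needs "< .", because a missing value can never satisfy a "<=" comparison with a number.

```stata
sysuse auto, clear

* Set five values to missing purely for illustration
replace mpg = . in 1/5

summarize mpg, detail

* generate and replace leave r() alone, but copying the results to
* locals guards against any r-class command run in between
local p25 = r(p25)
local p50 = r(p50)
local p75 = r(p75)

generate byte mpg_q = 1 if mpg <= `p25'
replace mpg_q = 2 if mpg > `p25' & mpg <= `p50'
replace mpg_q = 3 if mpg > `p50' & mpg <= `p75'

* "< ." keeps missing values out of the top group
replace mpg_q = 4 if mpg > `p75' & mpg < .

tabulate mpg_q, missing
```

The five observations with missing mpg end up with missing mpg_q rather than in group 4.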


On Mon, Jul 9, 2012 at 4:27 PM, Benedikt Achatz
<[email protected]> wrote:
> I am trying to separate my data into quartiles, doing it with this code:
>
> sum initial_length, det
> gen initial_length_q=1 if initial_length <=r(p25)
> replace initial_length_q=2 if initial_length >r(p25) & initial_length <= r(p50)
> replace initial_length_q=3 if initial_length >r(p50) & initial_length <= r(p75)
> replace initial_length_q=4 if initial_length >r(p75) & initial_length <= r(p99)
> replace initial_length_q=5 if initial_length >r(p99)
>
> The problem that reveals itself to me is that if there are missing
> observations, those get put in the 99% quartile. Is there any specific
> reason behind it, and does anyone know how I could work around that?