Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Omit missing observations from sum, det?
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Omit missing observations from sum, det?
Date: Mon, 9 Jul 2012 16:40:25 +0100
r(p99) can't be said to define a quartile.
That aside, Stata's fault here is that it is doing precisely what you asked.
Missing values (not observations; an observation in Stata is the
entire case, record, or row of your data) count as greater than any
non-missing value, so they satisfy your final inequality,
initial_length > r(p99). This is very well documented, e.g. in:
FAQ: Logical expressions and missing values (W. Gould, 2/03)
Why is x > 1000 true when x contains missing value?
http://www.stata.com/support/faqs/data/values.html
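The ordering described in that FAQ can be checked directly at the command line. This is a minimal sketch using only -display- and the built-in -missing()- function; no dataset is needed.

```stata
* System missing (.) compares as larger than any non-missing number,
* and the extended missing values .a through .z are larger still.
display (. > 1000)      // 1: true, so a missing x satisfies x > 1000
display (.a > .)        // 1: extended missings sort above .
display (. < .)         // 0: false, which is why "x < ." excludes missings
display missing(.a)     // 1: missing() catches . and .a to .z alike
```

Because every missing value is at least as large as ".", the test "x < ." is false for all of them, which is what makes it a compact way to exclude missings.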
So either you need to add an extra condition to exclude missings,

... & initial_length < .

or (easier) just use -xtile-, which automatically ignores missings.
. sysuse auto, clear
(1978 Automobile Data)
. xtile mpg_q = mpg, n(4)
. tab mpg_q
4 quantiles |
of mpg | Freq. Percent Cum.
------------+-----------------------------------
1 | 27 36.49 36.49
2 | 11 14.86 51.35
3 | 22 29.73 81.08
4 | 14 18.92 100.00
------------+-----------------------------------
Total | 74 100.00
. replace mpg = . in 1/5
(5 real changes made, 5 to missing)
. xtile mpg_q2 = mpg, n(4)
. tab mpg_q2
4 quantiles |
of mpg | Freq. Percent Cum.
------------+-----------------------------------
1 | 25 36.23 36.23
2 | 10 14.49 50.72
3 | 20 28.99 79.71
4 | 14 20.29 100.00
------------+-----------------------------------
Total | 69 100.00
. tab mpg_q2, missing
4 quantiles |
of mpg | Freq. Percent Cum.
------------+-----------------------------------
1 | 25 33.78 33.78
2 | 10 13.51 47.30
3 | 20 27.03 74.32
4 | 14 18.92 93.24
. | 5 6.76 100.00
------------+-----------------------------------
Total | 74 100.00
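For comparison, the first route (an explicit condition excluding missings) might look like this on the same auto data. This is a sketch under stated assumptions: mpg stands in for the original poster's initial_length, and the top group is cut at r(p75) with "< ." rather than at r(p99), so that it yields four groups. The cut points come from -summarize, detail- as in the original code, and they are not guaranteed to match -xtile-'s exactly.

```stata
* Sketch only: mpg stands in for initial_length from the question.
sysuse auto, clear
replace mpg = . in 1/5                  // create five missing values

summarize mpg, detail
* generate and replace are not r-class, so r(p25) etc. remain available
generate mpg_q = 1 if mpg <= r(p25)
replace mpg_q = 2 if mpg > r(p25) & mpg <= r(p50)
replace mpg_q = 3 if mpg > r(p50) & mpg <= r(p75)
* "< ." is false for . and for .a to .z, so missing mpg stays missing
replace mpg_q = 4 if mpg > r(p75) & mpg < .

tabulate mpg_q, missing
```

The four conditions together cover every non-missing value of mpg, so only the five observations made missing above are left without a group.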
On Mon, Jul 9, 2012 at 4:27 PM, Benedikt Achatz
<[email protected]> wrote:
> I am trying to separate my data into quartiles, doing it with this code:
>
> sum initial_length, det
> gen initial_length_q=1 if initial_length <=r(p25)
> replace initial_length_q=2 if initial_length >r(p25) & initial_length <= r(p50)
> replace initial_length_q=3 if initial_length >r(p50) & initial_length <= r(p75)
> replace initial_length_q=4 if initial_length >r(p75) & initial_length <= r(p99)
> replace initial_length_q=5 if initial_length >r(p99)
>
> The problem that reveals itself to me is that if there are missing
> observations, those get put in the 99% quartile. Is there any specific
> reason behind it, and does anyone know how I could work around that?