Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: how to force cutpoint in xtile
From: Nick Cox <[email protected]>
To: [email protected]
Date: Tue, 26 Jun 2012 00:36:21 +0100
The Stata version you are using is immaterial here.
The over-arching problem (for you) is that -xtile- will not split tied
values: all observations with the same value go into the same class,
and a boundary is declared only once the relevant cumulative percent
(here 20(20)80%) has been reached or passed. With these data, that
produces very unequal class frequencies.
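To see that rule at work, the cutpoints behind -xtile- can be inspected with -_pctile-. What follows is a minimal sketch, assuming the frequency table from the original question is entered as two variables named value and freq (names chosen for illustration); frequency weights stand in for expanding the data.

```stata
* assumption: value and freq hold the table from the original question
clear
input value freq
11 11
12 4
13 17
14 37
15 7
16 27
17 13
18 5
19 14
20 11
21 23
27 16
end

* cutpoints at the 20th, 40th, 60th and 80th percentiles
_pctile value [fweight=freq], nq(5)
return list

* -xtile- puts an observation in class k when its value is
* above cutpoint k-1 and at or below cutpoint k
```

With these data the four cutpoints should come out as 14, 15, 17 and 21. All 37 observations equal to 14 therefore join the first class, which takes it to 37.30% rather than 17.30%.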
What you can do, given that algorithm, is negate the variable and apply
-xtile- going the other way:
. input value freq
value freq
1. 11 11
2. 12 4
3. 13 17
4. 14 37
5. 15 7
6. 16 27
7. 17 13
8. 18 5
9. 19 14
10. 20 11
11. 21 23
12. 27 16
13. end
. expand freq
(173 observations created)
. gen negvalue = -value
. xtile nQ5 = negvalue, nq(5)
. tab nQ5
5 quantiles |
of negvalue |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         39       21.08       21.08
          2 |         43       23.24       44.32
          3 |         34       18.38       62.70
          4 |         37       20.00       82.70
          5 |         32       17.30      100.00
------------+-----------------------------------
      Total |        185      100.00
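Continuing the session above (so value and nQ5 are assumed to be in memory), the classes can be renumbered so that 1 is again the lowest group, and then compared with the -cumul- approach described in the original question. The names Q5asc, cum and Q5cum are illustrative only, and this is a sketch rather than a recommendation.

```stata
* assumption: value and nQ5 exist as created in the session above

* renumber so that class 1 holds the smallest values
gen Q5asc = 6 - nQ5

* cumulative proportion; -equal- gives tied values the same cumulative
cumul value, gen(cum) equal
gen Q5cum = ceil(5 * cum)

* for these data the two classifications coincide
assert Q5asc == Q5cum
tab Q5asc
```

Without the equal option, -cumul- assigns different cumulatives to tied observations, so ceil(5 * cum) could split a single observed value across two classes.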
. xtile Q5 = value, nq(5)
. tab Q5
5 quantiles |
   of value |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         69       37.30       37.30
          2 |          7        3.78       41.08
          3 |         40       21.62       62.70
          4 |         53       28.65       91.35
          5 |         16        8.65      100.00
------------+-----------------------------------
      Total |        185      100.00
But why are you doing this? The data already take on only a small
number of discrete values. Quintiles force 21 and 27 together, which
underlines that you are throwing away important detail.
Nick
On Mon, Jun 25, 2012 at 10:21 PM, Skiles, Martha Priedeman
<[email protected]> wrote:
> I've used -xtile- in Stata 11 successfully, but am having difficulty with it in Stata 12. I have the following variable "S0D0_links" which I'd like to quintile (5 groups), but the -xtile- function is not creating groups where I would expect. Per below, I expected the first quintile to break at 17.3 cumulative percent rather than 37.3. Can I force the cutpoint to be either closest to my 20/40/60/80/100 quintiles or always <20/<40/<60/etc?
> I am able to force it by using "cumul" to generate a cumulative percent, and then write code using "ceil(5*cumpercent)" but I hope there's a better option. My preference is to have the cutpoint create quintiles as close to 20/40/60/etc as possible.
>
> Thank you,
> Martha Skiles
>
> LOG:
> tab S0D0_links
>
>  S0D0_links |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          11 |         11        5.95        5.95
>          12 |          4        2.16        8.11
>          13 |         17        9.19       17.30
>          14 |         37       20.00       37.30
>          15 |          7        3.78       41.08
>          16 |         27       14.59       55.68
>          17 |         13        7.03       62.70
>          18 |          5        2.70       65.41
>          19 |         14        7.57       72.97
>          20 |         11        5.95       78.92
>          21 |         23       12.43       91.35
>          27 |         16        8.65      100.00
> ------------+-----------------------------------
>       Total |        185      100.00
>
> . xtile Q5=S0D0_links, nq(5)
>
> . tab Q5
>
> 5 quantiles |
>          of |
>  S0D0_links |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |         69       37.30       37.30
>           2 |          7        3.78       41.08
>           3 |         40       21.62       62.70
>           4 |         53       28.65       91.35
>           5 |         16        8.65      100.00
> ------------+-----------------------------------
>       Total |        185      100.00
>