Rod Hunter
> I still don't appear to be out of the woods.
>
> 1.
> I applied Nick's procedure to my example:
> . centile var1 , c(75)
> --------------------------------------------------------------------
> -- Binom. Interp. --
> Variable Obs Percentile Centile [95% Conf. Interval]
>
> var1 4 75 9.75 7.45 10*
>
> * Lower (upper) confidence limit held at minimum
> (maximum) of sample
> -----------------------------------------------------------
> . ret li
> ------------------------------------------------
> scalars:
> r(n_cent) = 1
> r(N) = 4
> r(ub_1) = 10
> r(lb_1) = 7.450000000008672
> r(c_1) = 9.75
> macros:
> r(centiles) : "75"
> -------------------------------------------------
> . list var1 if var1 == `r(c_1)'
> -----------------------------------------------
> var1
> .
> -----------------------------------------------
> This does not give me the identity nor value of the largest
> observation
> within the percentage of the observations included by the
> 75th percentile.
>
> 2.
> I tried Ronnie's approach:
>
> . sum var1, d
> -------------------------------------------
> var1
> Percentiles Smallest
> 1% 7 7
> 5% 7 8
> 10% 7 9 Obs 4
> 25% 7.5 10 Sum of Wgt. 4
>
> 50% 8.5 Mean 8.5
> Largest Std. Dev. 1.290994
> 75% 9.5 7
> 90% 10 8 Variance 1.666667
> 95% 10 9 Skewness 0
> 99% 10 10 Kurtosis 1.64
> ---------------------------------------------------
> . local macro*
> . list var1 if var1 == `p75'
> -------------------------------------------------
> var1
> -------------------------------------------------
>
> 3.
> This is what I have been using:
>
> . _pctile var1,p(75)
> . gen var1_p75=r(r1)
> . tabstat var1 if var1==var1_p75, stat(max)
> -------------------------------------
> no observations
> -------------------------------------
> alternatively:
> . list var1 if var1==var1_p75
> ---------------------------------------
> var1
> ---------------------------------------
Thinking about this further, there are two possible issues here:
1. if a percentile is ever calculated by interpolation
between neighbouring order statistics, then no value is ever equal to
it.
2. the precision problem whereby binary <-> decimal
operations need care.
Try this. It seems a bit complicated for the problem;
someone may have a simpler solution.
program def nearpctile, rclass
version 7
syntax varname [if] [in], Percentiles(numlist)
marksample touse
_pctile `varlist' if `touse', p(`percentiles')
tempvar id diff
tempname whichobs
gen long `id' = _n
qui {
gen double `diff' = /*
*/ abs(`varlist' - r(r1)) if `touse'
su `diff', meanonly
tab `id' if `diff' == r(min), matrow(`whichobs')
local n = r(r)
forval i = 1 / `n' {
local ob = `whichobs'[`i',1]
local obs "`obs'`ob' "
}
}
di as txt "`obs'"
return local obs "`obs'"
end
Example:
. _pctile mpg, p(75)
. ret li
scalars:
r(r1) = 25
. l mpg if mpg == 25
+-----+
| mpg |
|-----|
44. | 25 |
55. | 25 |
61. | 25 |
72. | 25 |
73. | 25 |
+-----+
. nearpctile mpg, p(75)
44 55 61 72 73
. ret li
macros:
r(obs) : "44 55 61 72 73 "
This also seems to work in messier problems,
e.g. 75th percentile of gear_ratio.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/