Patrick Joly <[email protected]> points out what I think is a
problem in -pctile-. He talks about both -pctile- and -xtile-
and indicates the problem is with -xtile-. However, -xtile-
calls through to -pctile- and so I believe addressing the problem
in -pctile- will resolve the problems he mentions.
I am looking into a solution for the problem. Those of you not
interested in the details can stop reading.
The following code and listing as prompted by Patrick's email,
illustrates the problem.
. set obs 100
. gen nn = _n
. pctile x_p = nn, nq(100) genp(p)
. list
+-----------------+
| nn x_p p |
|-----------------|
1. | 1 1.5 1 |
2. | 2 2.5 2 |
3. | 3 3.5 3 |
4. | 4 4.5 4 |
5. | 5 5.5 5 |
|-----------------|
6. | 6 6.5 6 |
7. | 7 8 7 |
8. | 8 8.5 8 |
9. | 9 9.5 9 |
10. | 10 10.5 10 |
<cut>
52. | 52 52.5 52 |
53. | 53 53.5 53 |
54. | 54 54.5 54 |
55. | 55 56 55 |
|-----------------|
56. | 56 57 56 |
57. | 57 57 57 |
58. | 58 58 58 |
59. | 59 59.5 59 |
60. | 60 60.5 60 |
<cut>
I looked at all 100 lines of output, and it appears that -pctile-
is not doing what it should for observations 7, 55, 56, 57, and
58. The others were okay.
-pctile-, which is implemented in ado code, calls to -_pctile-,
which is internal code, to do the final work. -_pctile- has a
limit of 20 percentiles allowed in any one call. -pctile- makes
multiple calls into -_pctile- as needed to get all the
percentiles of interest.
Here is what -_pctile- produces for these problem observations
(and some of those around it).
. _pctile nn , p(5 6 7 54 55 56 57 58 59)
. ret list
scalars:
r(r1) = 5.5
r(r2) = 6.5
r(r3) = 7.5
r(r4) = 54.5
r(r5) = 55.5
r(r6) = 56.5
r(r7) = 57.5
r(r8) = 58.5
r(r9) = 59.5
Notice that -_pctile- gets what we expect. The problem is not
with -_pctile-.
I looked at pctile.ado and I know what the problem is. The lines
local pct = 100*`i'/`nquanti'
if "`plist'"=="" { local plist "`pct'" }
else local plist "`plist',`pct'"
are building up a list of up to 20 percentiles that will then be
passed into the -percentiles()- option of -_pctile-.
The local macro that is holding the list does not contain double
precision numbers. Instead of calling something like
_pctile nn , percentiles(7)
pctile ends up calling something like
_pctile nn , percentiles(7.000000000000001)
My first couple of attempts at a patch for this problem have not
been successful. I tried
tempname prct
...
scalar `prct' = 100*`i'/`nquanti'
if "`plist'"=="" { local plist "`prct'" }
else local plist "`plist',`prct'"
and also
tempname prct
...
scalar `prct' = 100*`i'/`nquanti'
local pct : di %22.20f `prct'
if "`plist'"=="" { local plist "`pct'" }
else local plist "`plist',`pct'"
etc., but it did not solve the problem. I will investigate other
solutions. When I find one, I will apply it in an update to
Stata.
Ken Higbee [email protected]
StataCorp 1-800-STATAPC
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/