Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: CDF plot with normal probability axis
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: CDF plot with normal probability axis
Date: Thu, 14 Nov 2013 12:54:53 +0000

Quite so. Choice of plotting position is discussed at some length in
the FAQ I mentioned, which links to the literature, and in
publications on my -qplot- and its predecessor -quantil2- in the SJ
and STB respectively.

Naturally I am all in favour of using a better method rather than a
weaker one, but I remain unconvinced that minute differences in
plotting position either are discernible on a graph or have impact on
decisions about data, so long as you are not working with very small
samples. If you are, nothing much helps.
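
One rough way to judge this for a particular sample is to compute normal scores under two rules and look at both the differences and the graph. The sketch below is an illustration only: it assumes the auto data and the two rules i / (n + 1) and (i - 0.5) / n. The variable names z_qnorm, z_quantile and absdiff are made up for the example, and ties in mpg are simply broken arbitrarily by -sort-.

```stata
sysuse auto, clear
sort mpg
gen long i = _n                          // rank; ties broken arbitrarily

* normal scores under two plotting-position rules
gen double z_qnorm    = invnormal(i / (_N + 1))
gen double z_quantile = invnormal((i - 0.5) / _N)

* how far apart are the two sets of scores, in standard normal units?
gen double absdiff = abs(z_qnorm - z_quantile)
summarize absdiff

* overlay both versions to see whether the difference is visible
scatter mpg z_qnorm, ms(Oh) || scatter mpg z_quantile, ms(X) ///
    xtitle(standard normal deviate) ytitle("`: var label mpg'") ///
    legend(order(1 "i / (n + 1)" 2 "(i - 0.5) / n"))
```

The two rules differ most for the extreme order statistics, so that is where to look on the graph.
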
Your bigger argument is with StataCorp, who wire i / (n + 1) into
-qnorm- and (i - 0.5) / n into -quantile- and don't allow variations.
My quantile plotting programs have always allowed user choice of a in
(i - a) / (n - 2a + 1).
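
As a small check on that family, here is a sketch (not code from -qplot- itself) showing that a = 0, a = 0.5 and a = 1/3 reproduce the three rules mentioned in this thread, set beside the exact medians of the uniform order statistics. It assumes an arbitrary n = 10 and uses the fact that the i-th order statistic of n uniforms is Beta(i, n - i + 1), so its median is -invibeta(i, n - i + 1, 0.5)-. The variable names are invented for the example.

```stata
clear
set obs 10
gen long i = _n

* plotting positions (i - a) / (n - 2a + 1) for three choices of a
gen double p_a0     = (i - 0)   / (_N - 2*0     + 1)
gen double p_ahalf  = (i - 0.5) / (_N - 2*0.5   + 1)
gen double p_athird = (i - 1/3) / (_N - 2*(1/3) + 1)

* each choice reduces to a rule named in the thread
assert reldif(p_a0,     i / (_N + 1))         < 1e-12
assert reldif(p_ahalf,  (i - 0.5) / _N)       < 1e-12
assert reldif(p_athird, (i - 1/3) / (_N + 1/3)) < 1e-12

* exact median of the i-th order statistic from uniform(0,1)
gen double p_median = invibeta(i, _N - i + 1, 0.5)

list i p_a0 p_ahalf p_athird p_median, noobs
```

Changing the number in -set obs- shows how quickly the columns converge as n grows.
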
Nick
[email protected]
On 14 November 2013 12:17, David Hoaglin <[email protected]> wrote:
> Nick,
>
> For plotting positions, I prefer (i - (1/3))/(n + (1/3)). John Tukey
> introduced these after analyzing the sampling distributions of the
> order statistics in a sample of n from the uniform distribution on
> (0,1). The expression above is a good approximation for the median of
> the sampling distribution of the i-th order statistic in such a sample
> (a slight modification improves the approximation when i = 1 and i =
> n). In a Q-Q plot against a distribution with c.d.f. F, the plotting
> positions (from any definition) are transformed by F-inverse. Since
> monotonic transformations preserve medians, the transformed plotting
> positions are good approximations for the medians of the sampling
> distributions of the order statistics of a sample from the chosen
> distribution.
>
> David Hoaglin
>
> On Thu, Nov 14, 2013 at 4:21 AM, Nick Cox <[email protected]> wrote:
>> -distplot- (SJ), -cdfplot- (STB originally, SSC now): as always,
>> please explain the origin of the user-written commands you refer to.
>>
>> -qplot- (SJ) can do this, roughly.
>>
>> . sysuse auto
>> (1978 Automobile Data)
>>
>> . qplot turn trunk, trscale(invnormal(@))
>>
>> . qplot turn trunk, trscale(invnormal(@)) xtitle(standard normal deviate) xla(-2/2)
>>
>> The axes are the other way round from what you ask; I'd argue that is
>> better practice, or at least consistent with -qnorm-. (-ysc(log)- is
>> also possible.)
>>
>> Note that you should not expect cumulative distribution plots to do
>> this by default as they usually plot cumulative probabilities as 1/n,
>> ..., n/n and -invnormal(n/n)- is -invnormal(1)- and as such
>> indeterminate.
>>
>> But it is as easy to do this pretty much from first principles. See e.g.
>>
>> http://www.stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/index.html
>>
>> http://www.stata-journal.com/sjpdf.html?articlenum=gr0027
>>
>> http://www.stata-journal.com/sjpdf.html?articlenum=gr0032
>>
>> I will cheat slightly and use -mylabels- (SSC).
>>
>> Here is some code. Any number of possible small variations should be evident.
>>
>> sysuse auto, clear
>>
>> * rescale price to thousands of dollars
>> replace price = price/1000
>>
>> * normal scores from plotting positions (rank - 0.5) / n
>> foreach v in price mpg {
>>     egen y`v' = rank(`v')
>>     su `v', meanonly
>>     replace y`v' = invnormal((y`v' - 0.5) / r(N))
>>     label var y`v' "`: var label `v''"
>> }
>>
>> * axis labels in percent, placed at the matching normal deviates
>> mylabels 1 5 10(10)90 95 99, myscale(invnormal(@/100)) local(labels)
>>
>> twoway connect yprice price, ms(Dh) sort || ///
>>     connect ympg mpg, sort ms(Th) xsc(log) yla(`labels', ang(h)) xla(5 10 20 40) ///
>>     ytitle(Cumulative percent)
>>