Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: FW: st: uniform distribution

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: FW: st: uniform distribution
Date	Sat, 9 Nov 2013 15:44:27 +0000

I think this was suggested by Nikos Kakouros before he read my
comments. Either way, it was not suggested by me (Nick Cox, a
different contributor to the list) and I don't endorse it. In case
it's not clear, I consider this approach to be incorrect for the
reasons I identified earlier today.
Nick
[email protected]


On 9 November 2013 15:37, PAPANIKOLAOU P. <[email protected]> wrote:
> Dear All,
> Thank you all so much for your interesting views on
> checking whether the data follow the uniform distribution.
> Following the discussion, I noticed that Nick has put
> forward a script along these lines, which I have modified for my case
> and present below.
>
> sum mpg
> * transform the variable into a normal, AND what does r stand for?
> gen mpg_s=(mpg-r(min)) / (r(max)-r(min))
> * CREATE the variable that Nick suggests: the data should be weighted by
> * rank-0.5 to ensure that they will not cause indeterminate values at
> * zero and one in the inverse normal
> gen nick_recipe = (rank-0.5) / N
> * weight the data by the variable suggested by Nick
> gen rank_mpg_s = mpg_s / nick_recipe
> * take the inverse normal of this adjusted variable and use this
> * VARIABLE for testing the normality assumption below
> gen n_mpg_s = invnormal(rank_mpg_s)
> * WHAT does HTH, which Nick wrote, stand for?
> sktest n_mpg_s HTH
>
> Through this script, would sktest provide valid statistical evidence
> in favour of independence of observations?
> In my case, I have two variables. If the script is correct, how would
> running the above test ensure that it covers the independence
> assumption between the TWO variables?
>
> I would appreciate your input.
> Many thanks
> Panos
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nikos
> Kakouros
> Sent: 09 November 2013 14:15
> To: [email protected]
> Subject: Re: st: uniform distribution
>
> David,
>
> Thanks! That is a very neat property.
> Of course, I had to see it in action...  ;-)
>
> set obs 50000
> gen nnorm=rnormal(0,1)
> gen n_nnorm=normal(nnorm)
> histogram n_nnorm
>
> n_nnorm looks pretty uniform ;-)
>
> So if it starts non-uniform, it will end up not quite so normal going
> the other way. I wonder, however, whether a test for a departure from
> normality of Finv(U) can really test accurately for U's departure
> from uniformity. Will the p-values be accurate?
>
> Nick Cox has, of course, in the meantime questioned the entire
> applicability of uniform distribution testing given the nature of the
> originally presented data (time series).
>
> Many thanks for explaining this nice property!
>
> Nikos
>
> On Sat, Nov 9, 2013 at 8:43 AM, David Hoaglin <[email protected]>
> wrote:
>> Nikos,
>>
>> No approximation to the binomial distribution is involved.
>>
>> The approach uses a basic property of (continuous) probability
>> distributions.  If X is an observation from a distribution whose
>> cumulative distribution function (c.d.f.) is F, then U = F(X) has a
>> uniform(0,1) distribution.  That is, I am transforming X by using the
>> c.d.f. of its own distribution.  This holds for any continuous
>> distribution, not just the normal distribution.
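
To illustrate that the property is not specific to the normal distribution, here is a minimal Stata sketch that uses a chi-squared distribution with 3 degrees of freedom instead. The choice of distribution, the seed, the sample size, and the variable names x and u are assumptions made for illustration; none of them come from the thread.

```stata
* Sketch only: draw X from chi-squared(3), then transform by its own c.d.f.
clear
set seed 2013
set obs 50000
generate double x = rchi2(3)        // X from a non-normal continuous distribution
generate double u = chi2(3, x)      // U = F(X), where F is the chi-squared(3) c.d.f.
* For a uniform(0,1) variable the mean is 1/2 and the sd is sqrt(1/12), about 0.2887
summarize u
histogram u
```

If the property holds, the histogram of u should look roughly flat, much like the n_nnorm example elsewhere in this thread.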
>>
>> The reverse of the above process starts with an observation U from
>> uniform(0,1) and transforms it by the inverse of the c.d.f. of the
>> particular distribution (call it Finv).  Then X = Finv(U) is an
>> observation from the particular distribution.  This is what Fernando
>> suggested.  Of course, he did not assume that, when compressed onto
>> the interval [0,1], mpg would have a uniform distribution.  The idea
>> is that a departure from uniformity will show up as a departure from
>> normality after transforming the uniformized data by invnorm.  A
>> little problem may arise at the ends of the interval, though:
>> theoretically, invnorm(0) = minus infinity and invnorm(1) = infinity.
>>
>> People often make "probability plots" and handle that problem by using
>> "plotting positions" that do not go quite as low as 0 or as high as 1.
>>  In making a probability plot (or "quantile-quantile plot") for a
>> sample of n observations vs. the uniform distribution, I would do the
>> following:
>> 1. Sort the observations from smallest to largest, index those with i
>> = 1 through i = n, and denote them by x(1), ..., x(n).
>> 2. Calculate the corresponding plotting positions from the formula
>> pp(i) = (i - (1/3))/(n + (1/3)).
>> 3. Make a scatterplot of the points (pp(i), x(i)).
>> 4. Assess departures from uniformity by comparing the pattern in that
>> plot against a straight line.
>> 5. To get a feel for how such plots look when the data are actually
>> uniform, simulate a number of samples of n from the uniform(0,1)
>> distribution and make that plot for each sample.
>> (Quantile-quantile plots for non-uniform distributions use the same
>> approach.  They use Finv(pp(i)) as horizontal coordinate of the plot.)
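
The five steps above can be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: it uses mpg from the auto dataset shipped with Stata, the variable names pp, u, and pp_u are made up, and the least-squares line from lfit is just one possible visual reference for the "straight line" in step 4.

```stata
* Sketch only: uniform probability plot using pp(i) = (i - 1/3)/(n + 1/3)
sysuse auto, clear

* Steps 1-2: sort, so that _n is the index i and _N is n (mpg has no missing values)
sort mpg
generate double pp = (_n - 1/3) / (_N + 1/3)

* Steps 3-4: plot x(i) against pp(i) and compare with a straight line
* (lfit is an assumed choice of reference line, not part of the recipe)
twoway (scatter mpg pp) (lfit mpg pp), ///
    xtitle("plotting position") ytitle("mpg") legend(off)

* Step 5: the same plot for one simulated uniform(0,1) sample of the same size
set seed 12345
generate double u = runiform()
sort u
generate double pp_u = (_n - 1/3) / (_N + 1/3)
scatter u pp_u, xtitle("plotting position") ytitle("simulated uniform")
```

Repeating the last part with different seeds gives a feel for how much such plots wander away from a straight line when the data really are uniform.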
>>
>> David Hoaglin
>>
>> On Sat, Nov 9, 2013 at 7:58 AM, Nikos Kakouros <[email protected]> wrote:
>>> Fernando,
>>>
>>> That seems to work pretty well (did a run below).
>>> I'm not entirely sure why it should work though.
>>>
>>> Is it because the normal distribution in this case works as an
>>> approximation to the binomial distribution?
>>>
>>> Nikos
>>>
>>>
>>>
>>> set obs 50000
>>> gen test=runiform()
>>> sort test
>>> histogram test
>>> gen n_test=invnormal(test)
>>> histogram  n_test, normal
>>> swilk  n_test
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: FW: st: uniform distribution
  - From: Nikos Kakouros <[email protected]>

References:
- st: uniform distribution
  - From: "PAPANIKOLAOU P." <[email protected]>
- Re: st: uniform distribution
  - From: Nick Cox <[email protected]>
- Re: st: uniform distribution
  - From: Fernando Rios Avila <[email protected]>
- Re: st: uniform distribution
  - From: Nikos Kakouros <[email protected]>
- Re: st: uniform distribution
  - From: David Hoaglin <[email protected]>
- Re: st: uniform distribution
  - From: Nikos Kakouros <[email protected]>
- RE: st: uniform distribution
  - From: "PAPANIKOLAOU P." <[email protected]>
- FW: st: uniform distribution
  - From: "PAPANIKOLAOU P." <[email protected]>
