Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: uniform distribution
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: uniform distribution
Date
Sat, 9 Nov 2013 13:31:14 +0000
Let's take this more slowly. It looks like a case of answering a
poster's question when the real problem is otherwise.
1. I would be interested to learn of examples to the contrary, but the
hypothesis of a uniform distribution (unqualified) does not seem to arise
naturally. In contrast, the hypothesis that a variable is uniform on
some interval [a, b] does arise and in that case a, b are known
constants that follow from the nature of the variable.
2. Panos wants to scale values by (value - min) / (max - min) to [0,1]
which amounts to arguing that the uniform being tested for has known
extremes, namely the sample extremes. That needs a story.
3. Panos wants to plug the scaled values into -invnormal()-. However,
-invnormal(0)- and -invnormal(1)- are indeterminate. Usually when
people plug probabilities into -invnormal()- they ensure that the
arguments belong to (0,1), e.g. by using a recipe such as (rank - 0.5)
/ sample size.
4. Panos's examples are time series
MONTH  MS_COHO  UK_MS
Apri   396      62986
Aug    330      67503
Dec    342      65218
Feb    348      59491.83
Jan    379      65502.33
Jul    377      68214.5
Jun    368      65511.33
Mar    419      65112.17
May    423      66152.34
Nov    328      65107.67
Oct    347      68344.16
Sep    356      67597.34
What these variables are is not made clear, but my guess is that the
problem is not about testing uniformity of distribution at all, but
about testing for seasonality, which is quite a different problem.
Ignoring the serial order is pointless in that case; it is a vital
part of the information.
5. Regardless of whether that guess about the real problem is correct,
Panos can't assume _independence_ of observations willy-nilly; that is
an assumption that has to be justified.
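Points 1 to 3 can be sketched in Stata as follows. This is only an illustration under stated assumptions: the variable x, the bounds 0 and 10, and the sample size are all hypothetical, and the data are simulated rather than taken from the thread. It tests against a uniform on a known interval with -ksmirnov-, shows what -invnormal()- returns at 0 and 1, and applies the (rank - 0.5) / sample size recipe.

```stata
clear
set seed 2013
set obs 200

* hypothetical known bounds of the interval [a, b]
scalar lower = 0
scalar upper = 10

* simulated data, uniform on [lower, upper] by construction
gen x = scalar(lower) + (scalar(upper) - scalar(lower)) * runiform()

* point 1: Kolmogorov-Smirnov test against the uniform on [a, b],
* whose distribution function is (x - a) / (b - a)
ksmirnov x = (x - scalar(lower)) / (scalar(upper) - scalar(lower))

* point 3: at the endpoints invnormal() returns missing
display invnormal(0)
display invnormal(1)

* plotting positions (rank - 0.5) / n lie strictly inside (0, 1)
egen rank = rank(x)
egen n = count(x)
gen p = (rank - 0.5) / n
gen z = invnormal(p)
summarize p z
```

The bounds here are fixed before looking at the data. Replacing them with r(min) and r(max) from -summarize- would be the sample-extremes version that point 2 says needs a story.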
Whatever the answer to (4), a P-value from e.g. Shapiro-Wilk can't be
taken very seriously here because of the fudges involved in
translating the original problem to a quite different one.
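If the guess in (4) is right, a first step is simply to look at the values in calendar order. The sketch below enters the listing as posted and plots each variable against month of year. The numeric variable mnum and the dummy year 2000 are hypothetical helpers, and treating the twelve values as a single annual cycle is an assumption, since the thread does not say what period they cover.

```stata
clear
input str4 MONTH MS_COHO double UK_MS
Apri 396 62986
Aug  330 67503
Dec  342 65218
Feb  348 59491.83
Jan  379 65502.33
Jul  377 68214.5
Jun  368 65511.33
Mar  419 65112.17
May  423 66152.34
Nov  328 65107.67
Oct  347 68344.16
Sep  356 67597.34
end

* month number from the first three letters of the name;
* the year 2000 is a dummy value needed only to build a date
gen mnum = month(date("1 " + substr(MONTH, 1, 3) + " 2000", "DMY"))
sort mnum

* values in serial order, one graph per variable
twoway connected MS_COHO mnum, xlabel(1(1)12) xtitle("month of year")
twoway connected UK_MS mnum, xlabel(1(1)12) xtitle("month of year")
```

A graph like this is descriptive only. It does not address the independence assumption raised in (5).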
Nick
[email protected]
On 9 November 2013 12:58, Nikos Kakouros <[email protected]> wrote:
> Fernando,
>
> That seems to work pretty well (did a run below).
> I'm not entirely sure why it should work though.
>
> Is it because the normal distribution in this case works as an
> approximation to the binomial distribution?
>
> Nikos
>
>
>
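> * start from an empty dataset so that -set obs- cannot fail
> clear
> set obs 50000
> * uniform draws on (0,1)
> gen test=runiform()
> sort test
> histogram test
> * normal quantile transform of the uniform draws
> gen n_test=invnormal(test)
> histogram n_test, normal
> swilk n_test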
>
>
>
> On Fri, Nov 8, 2013 at 3:58 PM, Fernando Rios Avila <[email protected]> wrote:
>> What about standardizing the variable to an index from 0 to 1?
>> Say:
>> sum mpg
>> gen mpg_s=(mpg-r(min))/(r(max)-r(min))
>> Transform it into a normal
>> gen n_mpg_s=invnormal(mpg_s)
>> and then make a normality test of this variable
>> sktest n_mpg_s
>> HTH
>> Fernando
>>
>> On Fri, Nov 8, 2013 at 3:53 PM, Nick Cox <[email protected]> wrote:
>>> -egen, count()- on a variable just puts a constant in a variable,
>>> namely the number of non-missing values, which is useless for your
>>> purpose.
>>>
>>> The best test of uniformity is graphical: -quantile- by accident if
>>> not design yields the appropriate graph. Otherwise think of
>>> chi-square, Kolmogorov-Smirnov, etc.
>>>
>>> For "STATA" read "Stata".
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 8 November 2013 18:09, PAPANIKOLAOU P. <[email protected]> wrote:
>>>
>>>> I am a fairly new user to STATA. I have got to check whether each of
>>>> these two variables (column 2: MS_COHO; column 3: UK_MS) follow the
>>>> uniform distribution.
>>>> For each of them, I used the following code, properly adjusted:
>>>>
>>>> egen n = count (mpg) // use MS_COHO and UK_MS each time ... drop n i
>>>> surprisingly, the results were identical in both attempts, though the
>>>> script was applied to two different variables.
>>>> MONTH  MS_COHO  UK_MS
>>>> Apri   396      62986
>>>> Aug    330      67503
>>>> Dec    342      65218
>>>> Feb    348      59491.83
>>>> Jan    379      65502.33
>>>> Jul    377      68214.5
>>>> Jun    368      65511.33
>>>> Mar    419      65112.17
>>>> May    423      66152.34
>>>> Nov    328      65107.67
>>>> Oct    347      68344.16
>>>> Sep    356      67597.34
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/