Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: uniform distribution

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: uniform distribution
Date	Sat, 9 Nov 2013 13:31:14 +0000

Let's take this more slowly. It looks like a case of answering a
poster's question when the real problem is otherwise.

1. I would be interested to learn of examples to the contrary, but the
hypothesis of a uniform distribution (unqualified) does not seem to arise
naturally. In contrast, the hypothesis that a variable is uniform on
some interval [a, b] does arise, and in that case a and b are known
constants that follow from the nature of the variable.
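
As a minimal sketch of point 1, assuming limits a and b that are known in advance, a one-sample Kolmogorov-Smirnov test can be run against the distribution function of a uniform on [a, b]. The variable x, the limits 10 and 50, and the simulated data are made up for illustration and do not come from the thread.

```stata
* Sketch only: x, a and b are hypothetical names and values
clear
set seed 2013
set obs 200
scalar a = 10                          // known lower limit (assumed)
scalar b = 50                          // known upper limit (assumed)
generate x = scalar(a) + (scalar(b) - scalar(a)) * runiform()

* One-sample Kolmogorov-Smirnov test: the expression after = is the
* hypothesized distribution function, here F(x) = (x - a) / (b - a)
ksmirnov x = (x - scalar(a)) / (scalar(b) - scalar(a))

* Graphical check: -quantile- plots ordered values against plotting
* positions, with a reference line that uniform data should follow
quantile x
```

The point of the sketch is that a and b are fixed before looking at the data, not estimated from the sample extremes.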

2. Panos wants to scale values by (value - min) / (max - min) to [0,1],
which amounts to arguing that the uniform being tested for has known
extremes, namely the sample extremes. That needs a story.

3. Panos wants to plug the scaled values into -invnormal()-. However,
-invnormal(0)- and -invnormal(1)- are not finite, and Stata returns
missing for both. Usually when people plug probabilities into
-invnormal()- they ensure that the arguments belong to (0,1), e.g. by
using a recipe such as (rank - 0.5) / sample size.
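
A short sketch of the difference, assuming the auto data and the variable mpg used in the quoted suggestion. The names mpg_s, z_s, rank, p and z are illustrative only.

```stata
* Sketch only: auto data as in the quoted suggestion; new variable
* names are hypothetical
sysuse auto, clear

* Min-max scaling sends the sample minimum to 0 and the maximum to 1
summarize mpg, meanonly
generate mpg_s = (mpg - r(min)) / (r(max) - r(min))
generate z_s = invnormal(mpg_s)
count if missing(z_s)          // the sample extremes are lost as missing

* Plotting positions (rank - 0.5) / n stay strictly inside (0, 1)
egen rank = rank(mpg)          // ties get average ranks
count if !missing(mpg)
generate p = (rank - 0.5) / r(N)
generate z = invnormal(p)
count if missing(z)            // no observation is lost
```

The second recipe keeps every observation, but note that it transforms ranks, so it says nothing by itself about whether mpg is uniform.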

4. Panos's examples are time series:

MONTH  MS_COHO     UK_MS
Apri        396      62986
Aug        330      67503
Dec        342      65218
Feb        348   59491.83
Jan        379   65502.33
Jul        377    68214.5
Jun        368   65511.33
Mar        419   65112.17
May        423   66152.34
Nov        328   65107.67
Oct        347   68344.16
Sep        356   67597.34

What these variables are is not made clear, but my guess is that the
problem is not about testing uniformity of distribution at all, but
about testing for seasonality, which is a quite different problem.
Ignoring the serial order is pointless in that case; it is a vital
part of the information.
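
A sketch of what keeping the serial order might look like, assuming the months run January to December within a single year (the thread does not say which year, or whether that is so). The MS_COHO values are re-entered from the listing in calendar order; m is a hypothetical month-of-year variable.

```stata
* Sketch only: MS_COHO values from the listing, in calendar order;
* m is month of year (1 = Jan, ..., 12 = Dec), an assumed coding
clear
input byte m ms_coho
 1 379
 2 348
 3 419
 4 396
 5 423
 6 368
 7 377
 8 330
 9 356
10 347
11 328
12 342
end

* Plot against month so that the serial order stays visible
line ms_coho m, xlabel(1(1)12) xtitle("Month of year")
```

Sorting the listing alphabetically by month name, as in the original post, hides exactly the pattern such a plot is meant to show.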

5. Regardless of whether that guess about the real problem is correct,
Panos can't assume _independence_ of observations willy-nilly; that is
an assumption that has to be justified.

Whatever the answer to (4), a P-value from e.g. Shapiro-Wilk can't be
taken very seriously here because of the fudges involved in
translating the original problem to a quite different one.

Nick
[email protected]


On 9 November 2013 12:58, Nikos Kakouros <[email protected]> wrote:
> Fernando,
>
> That seems to work pretty well (did a run below).
> I'm not entirely sure why it should work though.
>
> Is it because the normal distribution in this case works as an
> approximation to the binomial distribution?
>
> Nikos
>
>
>
> set obs 50000
> gen test=runiform()
> sort test
> histogram test
> gen n_test=invnormal(test)
> histogram  n_test, normal
> swilk  n_test
>
>
>
> On Fri, Nov 8, 2013 at 3:58 PM, Fernando Rios Avila <[email protected]> wrote:
>> What about standardizing the variable toward an index from 0 to 1.
>> say:
>> sum mpg
>> gen mpg_s=(mpg-r(min))/(r(max)-r(min))
>> Transform it into a normal
>> gen n_mpg_s=invnormal(mpg_s)
>> and then make a normality test of this variable
>> sktest n_mpg_s
>> HTH
>> Fernando
>>
>> On Fri, Nov 8, 2013 at 3:53 PM, Nick Cox <[email protected]> wrote:
>>> -egen, count()- on a variable just puts a constant in a variable,
>>> namely the number of non-missing values, which is useless for your
>>> purpose.
>>>
>>> The best test of uniformity is graphical: -quantile- by accident if
>>> not design yields the appropriate graph. Otherwise think of
>>> chi-square, Kolmogorov-Smirnov, etc.
>>>
>>> For "STATA" read "Stata".
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 8 November 2013 18:09, PAPANIKOLAOU P. <[email protected]> wrote:
>>>
>>>> I am a fairly new user to STATA. I have got to check whether each of
>>>> these two variables (column  2: MS_COHO; column 3: UK_MS) follow the
>>>> uniform distribution.
>>>> For each for them, I used the following code, properly adjusted:
>>>>
>>>> egen n = count (mpg)  // use MS_COHO and UK_MS each time ... drop n i
>>>> surprisingly, the results were identical in both attempts, though the
>>>> script was applied to two different variables.
>>>> MONTH  MS_COHO     UK_MS
>>>> Apri        396      62986
>>>> Aug        330      67503
>>>> Dec        342      65218
>>>> Feb        348   59491.83
>>>> Jan        379   65502.33
>>>> Jul        377    68214.5
>>>> Jun        368   65511.33
>>>> Mar        419   65112.17
>>>> May        423   66152.34
>>>> Nov        328   65107.67
>>>> Oct        347   68344.16
>>>> Sep        356   67597.34
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
