Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: ladder question for right-skewed variable
From: Gabriel Nelson <[email protected]>
To: statalist <[email protected]>
Subject: Re: st: ladder question for right-skewed variable
Date: Fri, 26 Apr 2013 13:57:16 -0700
Thanks very much for your suggestions, Nick. It makes sense that the
problem might lie within -sktest-. I won't worry any more about this
problem and just proceed with the qnorm command, as you suggested.
Thanks again.
Gabriel
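
A minimal sketch of the do-it-yourself -qnorm- comparison Nick describes in the message quoted below. The shipped auto dataset and its price variable are stand-ins (assumptions for illustration, not the displacement data from this thread), and -graph combine- is an optional extra, not part of Nick's suggestion, for putting the four plots in one window rather than arranging them by hand.

```stata
* Illustrative only: price from the auto data stands in for the real variable
sysuse auto, clear

* the three candidate transformations: root, logarithm, reciprocal
gen rtprice  = sqrt(price)
gen logprice = log(price)
gen recprice = 1/price

* one named normal quantile plot per version of the variable
qnorm price,    name(g_raw,  replace) title("Raw")
qnorm rtprice,  name(g_root, replace) title("Square root")
qnorm logprice, name(g_log,  replace) title("Log")
qnorm recprice, name(g_rec,  replace) title("Reciprocal")

* optional: show all four plots side by side in a single graph
graph combine g_raw g_root g_log g_rec, name(g_all, replace)
```

The replace suboption only lets the block be rerun without first dropping the named graphs. The transformation whose points lie closest to the reference line is the one that brings the variable nearest to normal.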
On Fri, Apr 26, 2013 at 11:45 AM, Nick Cox <[email protected]> wrote:
> Three assertions based on a mix of experience and prejudice:
>
> 1. The best way to check for normality is with -qnorm-. Even if
> normality is not your reference case, asymmetry will show up clearly
> on a -qnorm- graph.
>
> 2. 90% of the time, choosing transformations boils down to whether
> three possible transformations are any use: root, logarithm or
> reciprocal.
>
> 3. So, do-it-yourself is easy:
>
> gen rtmyvar = sqrt(myvar)
> gen logmyvar = log(myvar)
> gen recmyvar = 1/myvar
>
> qnorm myvar, name(a)
> qnorm rtmyvar, name(b)
> qnorm logmyvar, name(c)
> qnorm recmyvar, name(d)
>
> Not universally known fact: Giving a name to a graph means that it
> sticks around until _you_ close it. So, you have four graphs on your
> monitor. Arrange them with your mouse so you can compare. Usually it's
> easy to pick what works best, without any formal machinery.
>
> (Yes, I know about -gladder-, but this is simpler in practice.)
>
>
> Nick
> [email protected]
>
>
> On 26 April 2013 19:20, Nick Cox <[email protected]> wrote:
>> Just to underline that kurtosis in your variable was calculated by
>> -summarize- as 108. That's BIG. No wonder -sktest- can't cope.
>> Nick
>> [email protected]
>>
>>
>> On 26 April 2013 19:17, Nick Cox <[email protected]> wrote:
>>> That's not quite "no transformations appeared in the output" as
>>> -ladder- is signalling P-values for some cases.
>>>
>>> But I readily agree that -ladder- is not doing a good job here at all.
>>>
>>> In fact, I am now reminded of evident -ladder- problems shown in a
>>> recent thread starting at
>>> http://www.stata.com/statalist/archive/2013-02/msg00862.html
>>>
>>> I can't find a public email, even though I thought I posted on this,
>>> but my impression from looking at the code is that -ladder- is
>>> essentially fragile. The real problem here is within -sktest-. It can
>>> break down, it seems, for large sample sizes and/or large deviations
>>> from Gaussianity. Then it bounces back missings.
>>>
>>> I think you just need to abandon -ladder-. It's not essential. You
>>> don't need _any_ test to tell you that some transformation will help
>>> if the goal is to reduce asymmetry, and there are only a few credible
>>> alternatives.
>>>
>>> As David and I pointed out, log transformation should work quite well
>>> for your data,
>>>
>>> but but but: (my suggestion; David may not agree) why transform at
>>> all? Your solutions start with -poisson- (or, for consenting adults,
>>> -nbreg-).
>>>
>>> BTW, -ladder- is a command, not a function, and in Stata ne'er the
>>> twain shall meet.
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 26 April 2013 18:55, Gabriel Nelson <[email protected]> wrote:
>>>> Thanks Nick, yes exactly, my question is why the ladder function fails
>>>> to provide any chi-square values here. I'll attach the Stata output
>>>> here:
>>>>
>>>> . ladder disp_2000
>>>>
>>>> Transformation         formula               chi2(2)       P(chi2)
>>>> ------------------------------------------------------------------
>>>> cubic                  dis~2000^3                  .             .
>>>> square                 dis~2000^2                  .             .
>>>> identity               dis~2000                    .             .
>>>> square root            sqrt(dis~2000)              .         0.000
>>>> log                    log(dis~2000)               .         0.000
>>>> 1/(square root)        1/sqrt(dis~2000)            .         0.000
>>>> inverse                1/dis~2000                  .         0.000
>>>> 1/square               1/(dis~2000^2)              .         0.000
>>>> 1/cubic                1/(dis~2000^3)              .         0.000
>>>>
>>>> . sum disp_2000, detail
>>>>
>>>>       Number displaced 2000 (if data unavailable go up
>>>>                            to 2003
>>>> -------------------------------------------------------------
>>>>       Percentiles      Smallest
>>>>  1%            1              1
>>>>  5%            2              1
>>>> 10%            3              1       Obs                1010
>>>> 25%            6              1       Sum of Wgt.        1010
>>>>
>>>> 50%         15.5                      Mean           281.5297
>>>>                         Largest       Std. Dev.      1217.168
>>>> 75%           82           9421
>>>> 90%        436.5           9505       Variance        1481497
>>>> 95%         1251          16255       Skewness       9.012044
>>>> 99%         5953          19569       Kurtosis       108.8061
>>>>
>>>> On Fri, Apr 26, 2013 at 10:47 AM, Nick Cox <[email protected]> wrote:
>>>>> Please see my answers too. You have still not given the exact -ladder-
>>>>> command you used or its output, so it is really difficult to know what
>>>>> is going on.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
Gabriel Nelson
Doctoral Candidate
Dept. of Sociology
University of California- Los Angeles
http://www.soc.ucla.edu/people/graduate-student?lid=4344
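
Nick's alternative to transforming, quoted above, is to model the counts directly with -poisson- or -nbreg-. A rough sketch on simulated data follows; the variables x and y, the seed and every parameter value are hypothetical, chosen only to produce overdispersed counts, and say nothing about the displacement data in this thread.

```stata
* Simulated data only: nothing here comes from the thread's dataset
clear
set seed 2013
set obs 1000

* hypothetical predictor
gen x = rnormal()

* overdispersed counts: Poisson mean multiplied by gamma noise with mean 1
* (shape 2, scale 0.5)
gen mu = exp(1 + 0.5*x) * rgamma(2, 0.5)
gen y  = rpoisson(mu)

* how skewed and heavy-tailed are the raw counts?
summarize y, detail

* Poisson regression, with robust standard errors as a hedge
* against overdispersion
poisson y x, vce(robust)

* negative binomial regression estimates the overdispersion explicitly
nbreg y x
```

Neither command needs the response to be transformed first, which is the point of the suggestion: the skewness is handled by the model rather than by reshaping the variable.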