RE: st: Right skewed (positive) dependent variable
From: "Lachenbruch, Peter" <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 11:35:38 -0700
Agreed. The point of the email was that outliers can affect the -ladder- routine as well as -boxcox-. If I were really concerned about something like this, I'd consider a logit transformation or something similar.
One doesn't want to use a bulldozer to plant a daisy...
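
For a score bounded by 0 and 100, a logit transformation could be sketched roughly as follows. This is only an illustration, not something run in the thread: the variable names prop, logit_score and score_back are hypothetical, and it assumes every score lies strictly between 0 and 100, since logit() returns missing at the bounds.

```stata
* Hypothetical sketch: rescale the 0-100 score to a proportion in (0, 1)
generate double prop = totalscore/100

* logit(x) is ln(x/(1 - x)); it is missing when x is 0 or 1
generate double logit_score = logit(prop)

* Check the transformed variable against normality
sktest logit_score

* Back-transform to the original 0-100 scale
generate double score_back = 100*invlogit(logit_score)
```

Scores of exactly 0 or 100 would need separate handling before any of this is usable.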
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, June 10, 2010 9:15 AM
To: [email protected]
Subject: RE: st: Right skewed (positive) dependent variable
In principle this variable looks bounded by 0 and 100.
In practice, that may not bite here, or perhaps even often, but it's
important to note that -ladder- and friends have no intelligence to
detect bounded variables and no scope for doing something special with
such variables.
Nick
[email protected]
Lachenbruch, Peter
There is also the issue of the effect of outliers on ladder or boxcox.
I just obtained my class grades. Here are some results:
                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101
* The 26.74 is from a student who did not take the final and is likely
an outlier.
. ladder totalscore
Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3               2.13        0.345
square                 totals~e^2               0.02        0.992
identity               totals~e                 5.68        0.058
square root            sqrt(totals~e)          12.30        0.002
log                    log(totals~e)           21.22        0.000
1/(square root)        1/sqrt(totals~e)        31.54        0.000
inverse                1/totals~e              42.27        0.000
1/square               1/(totals~e^2)          61.95        0.000
1/cubic                1/(totals~e^3)              .        0.000
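
As far as I understand -ladder-, each row of the table is a skewness and kurtosis test of normality (as in -sktest-) applied to the transformed variable. A minimal sketch of checking two rows by hand, assuming that reading is right; the variable name score_sq is hypothetical and the statistics may not match the table exactly:

```stata
* Sketch only: build the squared score and test it alongside the raw score
generate double score_sq = totalscore^2

* One joint skewness/kurtosis test per variable listed
sktest totalscore score_sq
```

A large P-value only says that normality is not rejected for that transformation; it does not say the transformation is sensible for the problem at hand.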
* This suggests that the best transformation is to square totalscore.
I don't regard this as a happy situation, so I exclude the low score.
. ladder totalscore if totalscore>30
Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3               2.96        0.228
square                 totals~e^2               0.70        0.705
identity               totals~e                 0.77        0.681
square root            sqrt(totals~e)           2.95        0.228
log                    log(totals~e)            6.54        0.038
1/(square root)        1/sqrt(totals~e)        11.18        0.004
inverse                1/totals~e              16.76        0.000
1/square               1/(totals~e^2)          29.23        0.000
1/cubic                1/(totals~e^3)          41.40        0.000
* Now the square and identity are about the same - I'd go with the
identity. For grading purposes, the centile command would give me a
simple way of finding cutoffs - in fact, I had gone through the grades
manually and came up with a set of letter grades that seemed to match
the centiles pretty well. In my experience, students sort themselves
into natural groups.
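
A rough sketch of how -centile- might be used for cutoffs. The percentiles requested here are illustrative assumptions, not the ones used for the class, and the local macro name cut25 is hypothetical:

```stata
* Sketch: request a few centiles of the total score
centile totalscore, centile(10 25 50 75 90)

* -centile- leaves results in r(c_1), r(c_2), ... in the order requested
local cut25 = r(c_2)
display "25th centile = " `cut25'
```

The stored values could then be compared with the manually chosen grade boundaries.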
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*