RE: st: Right skewed (positive) dependent variable
From: "Lachenbruch, Peter" <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 09:05:20 -0700
There is also the issue of the effect of outliers on ladder or boxcox. I have just finished computing my class grades. Here are some results:
                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101
* The 26.74 is from a student who did not take the final and is likely an outlier.
. ladder totalscore
Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3               2.13         0.345
square                 totals~e^2               0.02         0.992
identity               totals~e                 5.68         0.058
square root            sqrt(totals~e)          12.30         0.002
log                    log(totals~e)           21.22         0.000
1/(square root)        1/sqrt(totals~e)        31.54         0.000
inverse                1/totals~e              42.27         0.000
1/square               1/(totals~e^2)          61.95         0.000
1/cubic                1/(totals~e^3)              .         0.000
* This suggests that the best transformation of totalscore is the square. I don't regard this as a happy situation, so I exclude the low score.
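
The way a single low score can tilt the choice toward the square can be sketched outside Stata. The following is a minimal Python illustration, not what ladder computes: it uses plain moment skewness rather than the chi-squared normality test reported by ladder, and the scores are made up for the example (they are not the class data).

```python
import math

def skewness(xs):
    """Moment-based skewness, m3 / m2**1.5 (not the test statistic from ladder)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical scores, symmetric about 60 (NOT the class data in this post).
scores = [50, 54, 56, 58, 60, 60, 62, 64, 66, 70]
# The same scores plus one hypothetical very low score.
with_outlier = scores + [26]

# A few rungs of the ladder of powers.
ladder = {
    "square": lambda x: x ** 2,
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
}

for name, f in ladder.items():
    without = skewness([f(x) for x in scores])
    with_low = skewness([f(x) for x in with_outlier])
    print(f"{name:9s} without low score: {without:7.3f}   with low score: {with_low:7.3f}")
```

With these made-up numbers the identity has zero skewness when the low score is left out, while adding the low score makes every rung negatively skewed and leaves the square as the least skewed of the four. That is the same pattern as in the two ladder runs here, produced by a single observation.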
. ladder totalscore if totalscore>30
Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3               2.96         0.228
square                 totals~e^2               0.70         0.705
identity               totals~e                 0.77         0.681
square root            sqrt(totals~e)           2.95         0.228
log                    log(totals~e)            6.54         0.038
1/(square root)        1/sqrt(totals~e)        11.18         0.004
inverse                1/totals~e              16.76         0.000
1/square               1/(totals~e^2)          29.23         0.000
1/cubic                1/(totals~e^3)          41.40         0.000
* Now the square and identity are about the same, so I'd go with the identity. For grading purposes, the centile command would give me a simple way of finding cutoffs. In fact, I had gone through the grades manually and came up with a set of letter grades that seemed to match the centiles pretty well. In my experience, students sort themselves into natural groups.
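
The idea of reading grade cutoffs off centiles can be sketched as follows. This is a minimal Python illustration under stated assumptions: the scores, the choice of the 25th, 50th and 75th centiles, and the letter labels are all hypothetical, and the interpolation rule at position (n + 1) * p / 100 is what I understand centile to use by default, which should be checked against the Stata manual.

```python
def centile(values, p):
    """p-th centile by linear interpolation at position (n + 1) * p / 100."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p / 100
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lower = int(pos)        # 1-based rank just below the position
    frac = pos - lower      # fractional distance to the next rank
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

# Hypothetical total scores (NOT the class data in this post).
scores = [42, 50, 55, 58, 61, 64, 68, 72, 79]

# Hypothetical rule: cut at the 75th, 50th and 25th centiles.
cutoffs = [("A", centile(scores, 75)),
           ("B", centile(scores, 50)),
           ("C", centile(scores, 25))]

def letter(score):
    """Return the first letter whose cutoff the score reaches, else 'D'."""
    for grade, cut in cutoffs:
        if score >= cut:
            return grade
    return "D"

for s in scores:
    print(s, letter(s))
```

In practice the centiles would only be a starting point; the cutoffs can then be nudged toward the gaps between the natural groups of students.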
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Maarten buis
Sent: Thursday, June 10, 2010 8:51 AM
To: [email protected]
Subject: Re: st: Right skewed (positive) dependent variable
--- On Thu, 10/6/10, SURYADIPTA ROY wrote:
> However, as I look at my program now, I discover
> the source of the anomaly- my transformation
> was newvar=ln(1+oldvar).. that explains.
Are there 0s in your dependent variable (oldvar)?
If there are, then you really have no choice other
than to go the -glm- route. There are ways of getting
a meaningful interpretation out of a log transformed
dependent variable, but no such way exists for the
transformation log(oldvar + some constant), and
leaving the constant out is no solution either, as
that means that the 0s will be recoded to missing
values. This may also explain your non-normality:
is there a spike at 0? If that is the case, then
there can be no transformation that will lead to
a normal distribution. In that case you could
consider modeling the zeros separately using -zip-.
It is usually used for counts, but can also be
used for continuous variables in a quasi-likelihood
kind of way, by specifying the -robust- option.
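
The point about interpretation can be illustrated with a small Python sketch (an illustration only, not part of any Stata workflow; the function names are made up for the example). On the log scale, doubling the outcome always adds ln(2), which is what makes coefficients readable as ratios. With log(1 + y) the shift depends on the level of y, and with the plain log a zero has no value at all.

```python
import math

def log_shift(y, factor=2.0):
    """Change in log(y) when y is multiplied by factor."""
    return math.log(factor * y) - math.log(y)

def log1p_shift(y, factor=2.0):
    """Change in log(1 + y) when y is multiplied by factor."""
    return math.log1p(factor * y) - math.log1p(y)

def safe_log(y):
    """log(y), or None where the log is undefined (treated as missing)."""
    try:
        return math.log(y)
    except ValueError:
        return None

for y in (0.1, 1.0, 10.0, 1000.0):
    print(f"y = {y:7.1f}   log shift = {log_shift(y):.4f}   "
          f"log(1 + y) shift = {log1p_shift(y):.4f}")

# A zero is lost under the plain log but kept, with a distorted scale,
# under log(1 + y).
print(safe_log(0.0), math.log1p(0.0))
```

The log shift is the same constant at every level of y, while the log(1 + y) shift is close to zero for small y and only approaches ln(2) for large y, so a coefficient on that scale has no single ratio interpretation.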
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/