Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Right skewed (positive) dependent variable

From	"Lachenbruch, Peter" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: Right skewed (positive) dependent variable
Date	Thu, 10 Jun 2010 11:35:38 -0700

Agreed.  The point of the email was that outliers can affect the ladder routine as well as boxcox.  If I were really concerned about something like this, I'd consider a logit transformation or some such.

One doesn't want to use a bulldozer to plant a daisy...

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, June 10, 2010 9:15 AM
To: [email protected]
Subject: RE: st: Right skewed (positive) dependent variable

In principle this variable looks bounded by 0 and 100. 

In practice, that may not bite here, or perhaps even often, but it's
important to note that -ladder- and friends have no intelligence to
detect bounded variables and no scope for doing something special with
such variables. 
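
For a score bounded by 0 and 100, the logit transformation mentioned in this thread could be sketched as follows. This is only an illustration: the six values are copied from the -summarize- output quoted below (they are not the full 66 observations), and the variable names prop, logit_score and back are made up here. Note that logit() returns missing at exactly 0 or 100, so any scores on the boundary would need separate handling.

```stata
clear
input double totalscore
26.74
41.942
53.424
61.756
68.222
80.4
end

* Rescale the 0-100 score to a proportion
generate double prop = totalscore / 100

* logit(x) = ln(x/(1-x)); it is missing when x is 0 or 1
generate double logit_score = logit(prop)

* invlogit() maps back to the proportion scale, then rescale to 0-100
generate double back = 100 * invlogit(logit_score)

list totalscore logit_score back
```

Whether the logit scale is actually preferable here is a judgment call; with scores this far from 0 and 100 the bounds may not bite, as noted above.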

Nick 
[email protected] 

Lachenbruch, Peter

There is also the issue of the effect of outliers on ladder or boxcox.
I just obtained my class grades.  Here are some results:

                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101

* The 26.74 is from a student who did not take the final and is likely
an outlier.

. ladder totalscore

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.13        0.345
square                 totals~e^2              0.02        0.992
identity               totals~e                5.68        0.058
square root            sqrt(totals~e)         12.30        0.002
log                    log(totals~e)          21.22        0.000
1/(square root)        1/sqrt(totals~e)       31.54        0.000
inverse                1/totals~e             42.27        0.000
1/square               1/(totals~e^2)         61.95        0.000
1/cubic                1/(totals~e^3)             .        0.000

* This suggests that the best transformation is to square totalscore
(smallest chi2, largest p-value).  I don't regard this as a happy
situation.  So I exclude the low score.

. ladder totalscore if totalscore>30

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.96        0.228
square                 totals~e^2              0.70        0.705
identity               totals~e                0.77        0.681
square root            sqrt(totals~e)          2.95        0.228
log                    log(totals~e)           6.54        0.038
1/(square root)        1/sqrt(totals~e)       11.18        0.004
inverse                1/totals~e             16.76        0.000
1/square               1/(totals~e^2)         29.23        0.000
1/cubic                1/(totals~e^3)         41.40        0.000

* Now the square and identity are about the same - I'd go with the
identity.  For grading purposes, the centile command would give me a
simple way of finding cutoffs - in fact, I had gone through the grades
manually and came up with a set of letter grades that seemed to match
the centiles pretty well.   In my experience, students sort themselves
into natural groups.
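
The two steps described in this message, checking how a single low score moves the -ladder- results and then reading candidate cutoffs off -centile-, might be sketched like this on simulated data. Everything here is an assumption for illustration: the scores are random draws rather than the class data, the seed is arbitrary, and the centiles 10, 35, 65 and 90 are hypothetical cutoffs, not the ones used for the actual grades.

```stata
clear
set seed 12345
set obs 66

* Simulated scores, NOT the class data: roughly normal around 61
generate double totalscore = rnormal(61, 9)

* Plant one low value to mimic the student who missed the final
replace totalscore = 26.74 in 1

* Normality tests across the ladder of powers, with and without the low score
ladder totalscore
ladder totalscore if totalscore > 30

* Centiles as candidate grade cutoffs (these percentages are hypothetical)
centile totalscore if totalscore > 30, centile(10 35 65 90)

* centile leaves the requested values in r(c_1), r(c_2), ...
display "lowest cutoff = " r(c_1)
```

Comparing the two -ladder- tables side by side shows how much of the apparent need for a transformation rests on one observation; the results on simulated data will not match the tables above.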

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


