Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: correcting skewness of an indep variables
From: Maarten Buis <[email protected]>
To: stata list <[email protected]>
Subject: RE: st: correcting skewness of an indep variables
Date: Mon, 15 Mar 2010 16:27:39 +0000 (GMT)
--- Fabio Zona asked:
> When does one need to correct the skewness of an independent
> variable? I have a logit regression and my independent variable is
> strongly skewed; do I need to correct this (by using lnskew0)?
Never. The only thing you need to take care of is whether
a linear relationship between your dependent and
independent variable (in a logit model: linear in the log
odds) is a reasonable summary of the effect.
A very skewed independent variable is sometimes a sign that
the effect of that variable might be non-linear. Consider
the following dataset:
*------------- begin example ---------------
use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
* draw a spike at each observed value of articles, with height equal to its frequency
spikeplot articles
*------------- end example ------------------
Do we think that moving from 0 to 1 published articles has
the same effect on someone's academic career as moving from
60 to 61 articles? I don't believe so; there are probably
"decreasing returns to publications". So here I would
probably log transform articles, so that a percentage
increase in the number of published articles has a
constant effect. Because articles can be 0, in practice you
would log a shifted version such as ln(articles + 1), for
which this holds approximately. The skew here gave a hint
(actually, the range of that variable was the first thing
that triggered my suspicion about this variable), but the
argument I used to justify the transformation has to do with
the relationship between the dependent and independent
variable.
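To make the comparison concrete, here is a minimal sketch on simulated data, so it does not depend on the example dataset above. The variable names x, lnx and y and all parameter values are hypothetical. It generates a skewed count x that includes zeros, builds a binary y whose log odds are linear in ln(x + 1) (that is, with decreasing returns to x), and then fits the logit with the untransformed and the logged variable so the fits can be compared with estat ic. Adding 1 before logging is one common way to handle the zeros, not the only one.

*------------- begin example ---------------
* hypothetical simulated data: x is a skewed count with zeros (think: articles)
clear
set seed 12345
set obs 1000
generate x = rpoisson(exp(rnormal(1, 1)))

* log transform; adding 1 keeps the zeros (ln(0) is missing in Stata)
generate double lnx = ln(x + 1)

* hypothetical outcome with decreasing returns: log odds linear in ln(x + 1)
generate byte y = runiform() < invlogit(-1 + .8*lnx)

* fit both specifications and compare information criteria
logit y x
estat ic
logit y lnx
estat ic
*------------- end example ------------------

Because y was built from ln(x + 1), the logged model is the correct specification in this simulation. With real data the choice should rest mainly on a substantive argument like the one above, with such comparisons as a check.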
Another reason for skewness is the presence of a spike,
that is, a single value that is very common. In that case
you could consider adding the variable linearly together
with a dummy indicating whether or not an observation
belongs to the spike group. We would do that if we think
that value is in some sense special (this is often the case
when the spike value is 0). The coefficient of the dummy
then captures how much the spike group differs from what
the linear term alone would predict at that value. Say we
have data on the proportion of the total family income
earned by the woman. In more traditional countries like
Germany we might expect a spike at zero. In this case
adding both the proportion and the dummy could make sense.
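A minimal sketch of the proportion-plus-dummy specification, again on simulated data. The names p, p0 and y, the 40% share of zeros, the beta distribution for the non-zero shares and the coefficients are all hypothetical, not estimates for Germany or any other country. The coefficient on p0 is the extra effect of being in the spike group over and above what the linear term in p predicts at p = 0.

*------------- begin example ---------------
* hypothetical simulated data: p = woman's share of family income, spike at 0
clear
set seed 2010
set obs 1000
generate double p = cond(runiform() < .4, 0, rbeta(2, 3))

* dummy for belonging to the spike group
generate byte p0 = (p == 0)

* hypothetical outcome in which the zero group is special
generate byte y = runiform() < invlogit(-.5 + 1*p0 + .5*p)

* add the proportion linearly together with the spike dummy
logit y p p0
*------------- end example ------------------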
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------