Re: st: Right skewed (positive) dependent variable
From: SURYADIPTA ROY <[email protected]>
To: [email protected]
Subject: Re: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 16:02:30 -0400
Maarten,
You are right: there is a huge amount of clustering of data very close
to 0 (but not equal to 0). Across the different dependent variables, the
skewness ranges from 4.20 to 5.42. I believe that one motivation for my
initially using the log(1+oldvar) transformation was to avoid any
missing-value problem. Incidentally, for regressions with dependent
variables transformed after -ladder-, the rvfplots were very nice, with
the residuals scattered evenly around the 0-line. I believe that is
what we are really after, and not the normality of the dependent
variable? Anyway, I am studying -glm- very carefully for
implementation.
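
For anyone following the thread, here is a rough sketch of the comparison under discussion: a regression on ln(1+y) checked with -rvfplot-, next to a log-link -glm- on the untransformed outcome. It is only an illustration under stated assumptions. The auto dataset and the covariates mpg and weight are stand-ins, not the data from this thread, and the gamma family is just one plausible choice for a strictly positive, right-skewed outcome.

```stata
* Sketch only: auto.dta and the covariates are stand-ins, not the thread's data.
sysuse auto, clear

* Check for exact zeros first; ln(price) would turn them into missing values.
count if price == 0

* Skewness of the raw outcome, as stored by -summarize, detail-.
summarize price, detail
display "skewness = " r(skewness)

* Transform-then-regress: the coefficients describe E[ln(1 + y) | x].
generate double ln1p_price = ln(1 + price)
regress ln1p_price mpg weight
rvfplot, yline(0)

* Log-link GLM: models ln(E[y | x]) on the original scale, so exp(b) reads
* as a multiplicative effect on the mean. The gamma family assumes y > 0.
glm price mpg weight, family(gamma) link(log) vce(robust)

* The same mean model fitted by Poisson quasi-likelihood; this version
* also tolerates exact zeros in y.
poisson price mpg weight, vce(robust)
```

The -regress- and -glm- coefficients are not directly comparable: the first set describes the mean of the transformed variable, the second the log of the mean of the original variable.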
Thanks to all of you for very helpful discussions and suggestions!
Suryadipta.
On Thu, Jun 10, 2010 at 11:51 AM, Maarten buis <[email protected]> wrote:
> --- On Thu, 10/6/10, SURYADIPTA ROY wrote:
>> However, as I look at my program now, I discover
>> the source of the anomaly: my transformation
>> was newvar=ln(1+oldvar), which explains it.
>
> Are there 0s in your dependent variable (oldvar)?
> If there are, then you really have no choice other
> than to go the -glm- route. There are ways of getting
> a meaningful interpretation out of a log-transformed
> dependent variable, but no such way exists for the
> transformation log(oldvar + some constant), and
> leaving the constant out is no solution either, as
> that means that the 0s will be recoded to missing
> values. This may also explain your non-normality:
> is there a spike at 0? If that is the case, then
> there can be no transformation that will lead to
> a normal distribution. In that case you could
> consider modeling the zeros separately using -zip-.
> It is usually used for counts, but can also be
> used for continuous variables in a quasi-likelihood
> kind of way, by specifying the -robust- option.
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>