Hi,
I have a dependent variable where the bulk of the data
varies between -0.5984 and 0.5748. Just plotting the
data between these values gives me a reasonably normal
curve. However, I also have about 100 observations
whose absolute values on the dependent variable lie
between 0.6 and 25. When these observations
are included, I get a very skewed distribution with
very high peaks at 0 and 1, and then very few
observations at all other values. As a result, if I
run an OLS regression on the complete data, the
regression diagnostics show a very skewed error
distribution and about 100 outliers.
For theoretical reasons, I would rather not convert
the dependent variable into a 0/1 indicator and use
logistic regression. Is there any other way to deal with this
data? Log, inverse, and square root transformations
don't work because of the zeros and negative values,
and also because they pull the extreme values in
further, which makes the peaks in the distribution
even higher.
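To illustrate the first problem, here is a minimal sketch
on four made-up values: the two bounds quoted above, a
zero, and one extreme value of 25. The variable name y
and the new variable names are placeholders, not my
actual data.

```stata
* Hypothetical toy data, not the real dependent variable
clear
input double y
-0.5984
0
0.5748
25
end

* Each transformation returns missing (.) where it is undefined
generate double log_y  = log(y)
generate double sqrt_y = sqrt(y)
generate double inv_y  = 1/y

list
```

The log is missing for the zero and the negative value,
the square root is missing for the negative value, and
the inverse is missing at zero.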
Is there a way of transforming the data so that it
stretches in the middle and pulls in values at the
extremes? Do any of you have suggestions about any
other ways of dealing with this kind of dependent
variable? It would be great to have techniques that
are relatively easy to implement in Stata.
Thanks,
dalhia