Jeph Herrin wrote:
>>This must have been addressed here before, but I can't
find it.
I have a dataset of 1500 observations, each with an
identifier and a -y- value. -y- is highly skewed, and
nothing I've tried seems to normalize it.<<
Many distributions cannot readily be transformed to normality. For a
simple example, it's not possible to transform the exponential
distribution to normality, at least not in a way most people would be
comfortable with. A tail can also simply be too long for "sensible"
transformations to pull it down far enough (think of income in Bill
Gates' neighborhood).
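To make the exponential example concrete, here is a minimal Python sketch. Python is used only for illustration (this is not Stata code), and the seed and sample size are arbitrary choices. It draws exponential values and compares the sample skewness of the raw values, their log, their square root, and Φ⁻¹(F(y)) with F(y) = 1 − e^(−y). That last transform is exactly normal for any continuous distribution, but it requires knowing F, which is exactly what you lack with real data.

```python
import math
import random
from statistics import NormalDist, fmean


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


rng = random.Random(12345)
y = [rng.expovariate(1.0) for _ in range(20000)]  # Exponential(rate=1) draws

std_normal = NormalDist()
eps = 1e-15  # keep probabilities strictly inside (0, 1) for inv_cdf


def exact_normalizer(v):
    # Phi^{-1}(F(v)) where F(v) = 1 - exp(-v) is the Exponential(1) CDF;
    # expm1 keeps precision for very small v.
    p = -math.expm1(-v)
    return std_normal.inv_cdf(min(max(p, eps), 1 - eps))


results = {
    "raw": skewness(y),
    "log": skewness([math.log(v) for v in y]),
    "sqrt": skewness([math.sqrt(v) for v in y]),
    "Phi^-1(F)": skewness([exact_normalizer(v) for v in y]),
}
for name, g1 in results.items():
    print(f"{name:>10}: skewness = {g1:+.2f}")
```

In theory the log overshoots into left skew (about −1.14) while the square root still leaves right skew (about +0.63); only the CDF-based transform comes out symmetric.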
>>I'd like to simulate the distribution of -y-. Is there
a reasonable way to do this if I can't find a transform
of it that looks like a standard distribution?<<
Well, Maarten noted that you could simply resample from the dataset.
However, if you are looking for a parametric family, you might check
whether any of the distributions offered by the parametric survival
analysis command (-streg-) fit your data. You'll have to "fake" -stset-,
but that's not hard: just create a few variables that are all constant.
-streg- also has some very nice graphical facilities for checking
whether the model fits.
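Both routes can be sketched in a few lines. The following is a minimal Python illustration, not -streg- itself: the data are HYPOTHETICAL (Pareto draws standing in for the 1,500 observed y values), and the lognormal is just one of the families -streg- offers (the others include exponential, Weibull, Gompertz, log-logistic and generalized gamma). Resampling draws with replacement from the observed values; the parametric route fits a lognormal by maximum likelihood (mean and n-divisor SD of log y) and simulates from the fit.

```python
import math
import random
from statistics import fmean, pstdev, quantiles

rng = random.Random(2024)

# HYPOTHETICAL data: a stand-in for the 1,500 observed y values.
# Pareto(alpha=1.5) draws are positive with a very long right tail.
y_obs = [rng.paretovariate(1.5) for _ in range(1500)]


def simulate_by_resampling(data, n, rng):
    """Nonparametric: draw n values with replacement from the observed data."""
    return rng.choices(data, k=n)


def fit_lognormal(data):
    """Maximum-likelihood lognormal fit: mean and (n-divisor) SD of log(y)."""
    logs = [math.log(v) for v in data]
    return fmean(logs), pstdev(logs)


def simulate_lognormal(mu, sigma, n, rng):
    """Parametric: draw n values from the fitted lognormal distribution."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


boot = simulate_by_resampling(y_obs, 5000, rng)
mu_hat, sigma_hat = fit_lognormal(y_obs)
para = simulate_lognormal(mu_hat, sigma_hat, 5000, rng)

# Compare a few percentiles; a poor match in the tail suggests the
# parametric family does not fit these data.
for label, sample in [("observed", y_obs), ("resampled", boot), ("lognormal", para)]:
    cuts = quantiles(sample, n=100)  # 99 cut points: cuts[49] = median
    print(f"{label:>10}: median={cuts[49]:.2f}  p99={cuts[98]:.2f}")
```

Resampling can never produce a value that was not observed, which is its main limitation for tail behaviour; the parametric fit can extrapolate, but only as well as the chosen family matches the data.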
Another possibility is to use some kind of nonparametric or
semiparametric smoothing. I don't know whether any have been ported to
Stata, but R has some nice smooth density estimators that allow
simulation, e.g., the Kooperberg-Stone logspline density estimator.
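The logspline estimator is not reproduced here. As a simpler stand-in for the same idea (estimate a smooth density, then simulate from it), this Python sketch fits a Gaussian kernel density estimate to log(y) and samples from it, sometimes called a smoothed bootstrap. The data are again HYPOTHETICAL Pareto draws, and the 1.06 · SD · n^(−1/5) bandwidth is a common normal-reference rule of thumb, not a tuned choice.

```python
import math
import random
from statistics import pstdev


def make_log_kde_sampler(data, rng):
    """Fit a Gaussian kernel density estimate to log(data) and return a sampler.

    Working on the log scale keeps every simulated value positive.
    """
    logs = [math.log(v) for v in data]
    # Normal-reference rule-of-thumb bandwidth (1.06 * SD * n^(-1/5)).
    h = 1.06 * pstdev(logs) * len(logs) ** (-1 / 5)

    def draw(k):
        # Sampling from a Gaussian KDE: pick an observed point at random,
        # add N(0, h^2) noise, then exponentiate back to the y scale.
        return [math.exp(rng.choice(logs) + rng.gauss(0.0, h)) for _ in range(k)]

    return draw, h


rng = random.Random(7)
# HYPOTHETICAL data standing in for the observed y values.
y_obs = [rng.paretovariate(1.5) for _ in range(1500)]

draw, bandwidth = make_log_kde_sampler(y_obs, rng)
sim = draw(5000)
print(f"bandwidth on log scale: {bandwidth:.3f}")
print(f"simulated min/max: {min(sim):.3f} / {max(sim):.1f}")
```

Unlike plain resampling, the draws fall between and beyond the observed values. The flip side is that smoothing can put mass outside the true support (here, below the Pareto minimum of 1), so check whether that matters for your application.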
Jay