I am not sure that being affine is to the point;
linearity is the pertinent property, I think.
Otherwise I agree with Austin. Also, several
excellent points made by Maarten Buis earlier are
still, it seems, being overlooked here.
In general, log(x +/- constant) is not a transformation
I would ever use with percents. Rarely, if ever, does the constant
come with a story attached.
To restate various points more emphatically:
1. Having a skewness of 0 and being symmetric are not in
general identical, as any measure of skewness could be near 0
in some asymmetric distribution, surprised though one might
be by stark asymmetry when the chosen measure is 0.
For example, (0, 0, 1, 1, 1, 1, 3) satisfies mean = median = 1
and thus (mean - median) / spread = 0, yet it is plainly asymmetric.
2. Being symmetric and being normal (Gaussian) are also
not in general identical, as many symmetric distributions
are not Gaussian.
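A minimal sketch of points 1 and 2 in Stata. The seven-value sample, the seed and the sample size are illustrative assumptions, not anything from the poster's data.

```stata
* Point 1: mean = median, yet the moment-based skewness is not 0
clear
input y
0
0
1
1
1
1
3
end
summarize y, detail
display "mean = " r(mean) ", median = " r(p50) ", skewness = " r(skewness)

* Point 2: symmetric but not Gaussian (uniform on [0, 1))
clear
set seed 12345
set obs 10000
generate u = runiform()
summarize u, detail
display "skewness = " r(skewness) ", kurtosis = " r(kurtosis)
```

In the first sample the mean and the median are both 1, so a (mean - median) measure is 0, but -summarize, detail- reports positive moment-based skewness. In the second, skewness should be close to 0 while kurtosis is close to 1.8, against 3 for a Gaussian.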
Broadly speaking, -lnskew0- is better thought of as a way
of fitting three-parameter lognormals, not a very general
way of symmetrising (let alone normalising) data. Unless
there are grounds for thinking that the data should be
lognormal, there are essentially no guarantees here.
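As a rough illustration only, here is what -lnskew0- does, using the auto data shipped with Stata rather than the poster's percents. The variable name lnmpgk is arbitrary.

```stata
sysuse auto, clear
* choose k so that ln(mpg - k) has skewness 0; k is reported in the output
lnskew0 lnmpgk = mpg
* the skewness is now near 0 by construction ...
summarize lnmpgk, detail
* ... but only a graph shows how symmetric, or how normal, the result is
qnorm lnmpgk
```

A skewness of 0 here says only that the moment-based measure has been driven to 0. Whether a three-parameter lognormal is a sensible model for the variable is a separate question.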
Nick
Austin Nichols
My suggestion was a bit tongue in cheek--it is not a z-score, as that
is an affine transformation (-ssc inst center- to use -center- to make
z-scores using the s option), whereas "sort mpg // g
z=invnorm(_n/(_N+1))" is nonlinear and makes a variable look very
normal indeed...
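The contrast can be sketched as follows, assuming the auto data and using -egen, std()- in place of -center- so that nothing needs installing. The names zmpg and nmpg are arbitrary, and -invnormal()- is the current name of -invnorm()-.

```stata
sysuse auto, clear
* z-score: an affine rescaling, so the shape (skewness, kurtosis) is unchanged
egen zmpg = std(mpg)
* normal scores from ranks: a nonlinear transformation
* note: tied values of mpg get different scores, in arbitrary order
sort mpg
generate nmpg = invnormal(_n/(_N + 1))
tabstat mpg zmpg nmpg, statistics(mean sd skewness kurtosis)
```

The first two columns should agree on skewness and kurtosis, while the normal scores have skewness 0 whatever the original variable looked like.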
You should probably not transform your variable at all.
Why aren't you using -ice- and -mim- from SSC? -ssc inst ice- and
-ssc inst mim- put both at your disposal.
tdavis7
>
> Thank you for your response. But I have one last question. The latter
> transformation appears to be a simple z score? Is that correct? I ask
> because this changes my regression results a bit and I want to make
> sure that I haven't performed some obscure transformation that I am
> unable to explain.
>
> I am concerned about normality because I am creating multiple imputed
> data sets using Amelia with the data that I currently have. One of the
> assumptions of multiple imputation is normality (univariate at the
> least but multivariate ideally). I plan on using Stata to estimate
> OLS regression coefficients with the imputed data, but I also plan to do
> some SEM and HLM with the imputed data. Can I still use the "g
> z=invnorm(_n/(_N+1))" transformation or should I stick with lnskew0
> even though the histogram appears skewed despite the actual skew
> statistic?