In addition to Steve Samuels' various comments, I add a prejudice here against ln(X - k) as a transform unless k is specifiable in advance and on independent scientific or practical grounds.
This prejudice has several bases, varying from solid to ectoplasmic:
1. k is a lower limit to X, and limits are always difficult to estimate from data. This is especially well documented for maximum likelihood, which is one method of choice.
2. In effect, you are saying once you entertain k != 0: the two-parameter lognormal is not good enough; let's consider a three-parameter lognormal. I'd rather try other two-parameter distributions first, or equivalently other transformations. For example, fitting a two-parameter gamma is roughly equivalent in some senses to working on a cube root scale.
3. If you play with this approach, you get all sorts of ad hoc constants floating round in your analysis. It then gets rather difficult to discuss, to compare with other studies, etc.
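To make points 1 and 3 concrete, here is a rough simulation sketch in Python rather than Stata. It draws several samples from one three-parameter lognormal with a known lower limit and, for each sample, searches for the k that makes the skewness of ln(X - k) zero, which is the general idea behind a zero-skewness log transform. All names and settings (`simulate`, `zero_skewness_shift`, a true k of 10, n = 200) are assumptions made for illustration, and the bisection is not meant to reproduce the algorithm of any Stata command.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based skewness (no small-sample correction)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_shift_skewness(x, k):
    """Skewness of ln(x - k); requires k < min(x)."""
    return skewness([math.log(v - k) for v in x])


def zero_skewness_shift(x, iterations=100):
    """Bisect for a k below min(x) at which ln(x - k) has skewness near zero."""
    spread = max(x) - min(x)
    low = min(x) - 1000.0 * spread   # far below the data: ln is almost linear
    high = min(x) - 1e-9 * spread    # just below the minimum: long left tail
    if log_shift_skewness(x, low) <= 0 or log_shift_skewness(x, high) >= 0:
        raise ValueError("no sign change in skewness; bisection does not apply")
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if log_shift_skewness(x, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def simulate(seed, n=200, k_true=10.0, mu=0.0, sigma=1.0):
    """Draw n values from a three-parameter lognormal with lower limit k_true."""
    rng = random.Random(seed)
    return [k_true + math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    # Same distribution every time; only the random sample changes.
    for seed in range(5):
        x = simulate(seed)
        k_hat = zero_skewness_shift(x)
        print(f"seed={seed}  min(x)={min(x):.3f}  k_hat={k_hat:.3f}  (true k = 10)")
```

Running it prints one value of k per sample. The thing to look at is how much those values move between samples drawn from the same distribution, since each one would end up as a different constant in the analysis.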
Nick
Christian Weiss
Thank you very much for your elaboration on this topic! Although it was
very interesting for me, my actual question has still not been answered.
So let me rephrase: If a variable is lognormally distributed
(according to swilk, lnnormal), why is it not "normally" distributed
after transforming it via ln / lnskew0 / bcskew0 (according to swilk)?
On Mon, Jun 8, 2009 at 12:18 PM, Maarten Buis wrote:
>
> --- On Mon, 8/6/09, Christian Weiss wrote:
>> testing my dependent var via swilk or sfrancia rejects the
>> Null Hypothesis of Normality.
>
> This is problematic for a number of reasons:
>
> 1) Regression never assumes that the dependent variable is
> normally distributed, except when you have no explanatory
> variables. It only assumes that the residuals are normally
> distributed.
>
> 2) Testing for the normality of the residuals should only
> be done once you are convinced that the other assumptions
> have been met, as violations of the other assumptions are
> likely to lead to residuals that look non-normal.
>
> 3) The normality of the residuals is probably the least
> important of the regression assumptions, as regression
> is reasonably robust to violations of it.
>
> 4) Tests are probably not the best way to assess whether
> the errors are normally distributed. Graphical inspection
> is usually more informative and powerful, see:
> -help diagnostic plots- and -ssc d hangroot- for tools
> to help with that.
>
> For a more general set of tools to perform post-estimation
> checks of regression assumptions see:
> -help regress postestimation-.
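
Point 1 in the quoted message can be checked with a small simulation. The sketch below is in Python rather than Stata and uses invented settings (a binary x, a slope of 5, n = 2000, and the helper names `excess_kurtosis`, `ols_residuals` and `simulate`), so treat it as an illustration under those assumptions. It compares the excess kurtosis of y with that of the least-squares residuals. With these settings y is bimodal and so clearly non-normal, while the errors are normal by construction and the residuals should look close to normal.

```python
import random
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis; roughly 0 for normal data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def simulate(seed=1, n=2000, slope=5.0):
    """Binary x, normal errors: y is bimodal although the errors are normal."""
    rng = random.Random(seed)
    x = [float(rng.random() < 0.5) for _ in range(n)]
    y = [1.0 + slope * a + rng.gauss(0.0, 1.0) for a in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    residuals = ols_residuals(x, y)
    print("excess kurtosis of y:        ", round(excess_kurtosis(y), 2))
    print("excess kurtosis of residuals:", round(excess_kurtosis(residuals), 2))
```

A normality test applied to y in a setup like this would reject, even though the regression assumption about the errors holds exactly.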