Phil Schumm's already answered this, I think, but
let me add further comments.
The idea of a proper adjustment is chimerical
here. Proper from what perspective? If you take
either model to be correct, then the other is
incorrect.
Otherwise put, different models are involved,
so you should not expect identical predictions.
Naturally, if they happen to be close, so
much the better.
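To make that concrete, here is a minimal sketch in Stata,
using the auto data purely for illustration, of how the two
fits produce different predictions:

        sysuse auto, clear
        gen lprice = ln(price)

        * model 1: E[log(price)] = XB, back-transformed naively
        regress lprice mpg weight
        predict double xb1, xb
        gen double phat_ols = exp(xb1)

        * model 2: log(E[price]) = XG
        glm price mpg weight, family(gaussian) link(log)
        predict double phat_glm, mu

        * different models, so the predictions differ in general
        summarize phat_ols phat_glm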
The splendid name apart, I think Box-Cox has been rather
oversold. (Very strangely, their references, as I recall,
do not include the earlier Tukey _Annals of Mathematical
Statistics_ 1957 paper, which was a key forerunner.)
The idea of using maximum likelihood to choose from
a family of transformations is indeed a big deal, but
not a very big deal. In their own paper, the authors
end up using logs in one example and reciprocals in
the other, which is just what good data analysts would
have done. If Box-Cox tells you the power should be 0.1,
most statistically-minded people I know take that
as a signal to use logarithms. So, the main idea
to me is, in Tukeyish terms, that of a ladder of
transformations, and the ML machinery is secondary.
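In Stata terms, that might look like this (a sketch on the
auto data; the point is the rounding, not the machinery):

        sysuse auto, clear
        boxcox price mpg weight, model(lhsonly)
        * if /theta comes out near 0, read that as "use logs":
        * round to the nearest rung of the ladder
        * (1, 1/2, 1/3, 0 = log, -1, ...)
        gen lprice = ln(price)
        regress lprice mpg weight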
Box-Cox mostly seems to appeal to those terrified of
appearing "subjective" or "arbitrary" to advisors and reviewers,
and frightened of using their judgement based on experience
and theory. I guess that if you have no experience
or theory to call upon the appeal will be substantial.
It's the same kind of issue as that facing those who will not
make the tiniest step without the sanction of a P-value.
(P = permission to proceed?)
Nick
[email protected]
Daniel Schneider
> Thanks for all the useful comments.
>
> Just to clarify the issue: For example, the predictions based on
> log(E[price]) = XG with GLM should be identical to the predictions
> generated from E[log(price)] = XB (fit by -regress-, generating
> B_hat), when the latter are adjusted properly?
>
> What would you suggest for predictions based on a Box-Cox (left-hand
> side) transformation? A two-step procedure, first estimating the
> Box-Cox transformation parameter and then using that parameter in a
> GLM to generate predicted values?
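On the two-step idea: mechanically you can do it, but note
that -boxcox- transforms price itself, while -glm- with a
power link models a power of E[price], so the two steps
refer to different models (the same issue as above). A
sketch of the mechanics, plugging the estimate in by hand:

        sysuse auto, clear
        boxcox price mpg weight, model(lhsonly)
        * suppose /theta is reported as about 0.5; then
        glm price mpg weight, family(gaussian) link(power .5)
        predict double phat, mu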
Phil Schumm
> > To expand on Nick's suggestion, one of the primary features of the
> > GLM approach (as opposed to modeling a transformed variable) is to
> > obtain predictions on the raw (i.e., untransformed) scale. So GLM
> > is absolutely an important alternative to consider if this is a
> > requirement.
> >
> > The reason your results are different is that you've fit two
> > different models. They are:
> >
> > E[log(price)] = XB (fit by -regress-, generating B_hat)
> >
> > and
> >
> > log(E[price]) = XG (fit by -glm-)
> >
> > One can show that under certain conditions, you can consistently
> > estimate G by B_hat (except for the intercept), but if those
> > conditions aren't met, B_hat will be estimating something
> > different. Naively assuming that B_hat estimates G is a common
> > mistake people make when interpreting the results of a regression
> > on a transformed variable.
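The classic case of those "certain conditions" is
lognormality: if log(price) = XB + e with e normal,
homoskedastic, and independent of X, then
E[price | X] = exp(XB + sigma^2/2), so the slopes agree and
only the intercept shifts. A sketch of that adjustment,
valid only under those assumptions:

        sysuse auto, clear
        gen lprice = ln(price)
        regress lprice mpg weight
        predict double xb, xb
        * add back half the residual variance
        gen double phat_adj = exp(xb + e(rmse)^2/2)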
> >
> > The documentation on -glm- in [R] is a good start, but if you're
> > using this for anything important, I'd strongly suggest picking up
> > a copy of Generalized Linear Models (by McCullagh and Nelder), in
> > particular the chapters "An outline of generalized linear models"
> > and "Models with constant coefficient of variation".