There's a bundle of issues here. To avoid
some of them, I'll focus on the idea of R-sq as
corr(response, predicted response)^2
or
corr(log response, predicted log response)^2
I presume that's what you're calling
"proportion of explained variance",
a term many dislike, for often-rehearsed reasons.
Even with that focus, this answer is only partial.
1 Numerical
===========
It's possible that these two measures are
fairly close. Presumably that's most likely
if corr(y, log y) is very near 1 -- in which
case there is little point in a log transformation.
The implication is that y is measured over so
small a relative range that the curvature of the log
function can be neglected.
I would never count on them being close. But
typically it is very easy to compute both measures
and compare them.
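As a rough illustration of that comparison, here is a minimal Python
sketch. Everything in it is an assumption made for illustration: the
data are made up, the helper names (corr, ols_fit, compare_r2) are
hypothetical, the model is a straight-line fit of log y on x, and the
"predicted response" is taken to be the naive back-transform
exp(predicted log y). It contrasts a response spanning a wide relative
range with one spanning a very narrow relative range.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def ols_fit(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    return my - slope * mx, slope

def compare_r2(x, y):
    """Return (R-sq on the log scale, R-sq on the raw scale).

    Assumption: the model is fitted to log y, and the raw-scale
    prediction is the naive back-transform exp(predicted log y).
    """
    log_y = [math.log(v) for v in y]
    a, b = ols_fit(x, log_y)
    pred_log = [a + b * u for u in x]
    pred_raw = [math.exp(p) for p in pred_log]
    return corr(log_y, pred_log) ** 2, corr(y, pred_raw) ** 2

# Made-up data: same x, two responses with very different relative ranges.
x = list(range(20))
y_wide = [math.exp(0.3 * u + math.sin(3 * u)) for u in x]
y_narrow = [100 * math.exp(0.001 * u + 0.002 * math.sin(3 * u)) for u in x]

r2_log_wide, r2_raw_wide = compare_r2(x, y_wide)
r2_log_narrow, r2_raw_narrow = compare_r2(x, y_narrow)

print("wide range:   log scale", r2_log_wide, " raw scale", r2_raw_wide)
print("narrow range: log scale", r2_log_narrow, " raw scale", r2_raw_narrow)
```

In the narrow-range case corr(y, log y) is essentially 1, so the two
measures nearly coincide; nothing of the kind is guaranteed for the
wide-range case.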
2 Scientific
============
In perhaps a minority of situations, one decides that
a logarithmic scale is as convenient as the raw scale, or more
so -- even more natural -- in which case scientifically
(practically, sociologically, whatever)
one is as happy working on a logarithmic scale
as on the original. Hackneyed but genuine examples
are pH and decibels. If the statistics also says "logarithmic
scales are better, because then model assumptions are more
nearly correct", then everything marches together.
This is not an absolute distinction. It seems that
economists can flip quite easily between thinking
about income and thinking about log income, especially
with practice. Both scales make enormous sense.
I don't know if "log systolic blood pressure"
ever seems quite natural in the same way, even if
the log transformation appeared sensible on statistical
grounds.
Clearly, there can be some tension between the
scientific ego and the statistical id (or is it
the other way round?) if the scientist (or the scientific
part of the researcher) wants to think in terms of
the original scale (and presumably measurement must have been lousy
if the measured scale is thought dispensable).
3 Statistical
=============
As often mentioned on this list, one signal
merit of generalised linear models is that
they purport to give you the best of both
worlds: you do the calculations
on a transformed scale -- by courtesy of
a link function -- but get results on the response scale.
Perhaps that's something to check out.
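To make the link-function idea concrete, here is a minimal Python
sketch, not a recommendation for any particular model. It assumes a
single predictor, made-up data, and a Poisson-type variance function
(chosen only because it keeps the arithmetic simple), and fits
E[y] = exp(a + b*x) by iteratively reweighted least squares. The
function names are hypothetical. The fitted means are already on the
response scale, so corr(y, fitted mean)^2 can be computed there
directly.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def fit_log_link(x, y, max_iter=100, tol=1e-12):
    """Fit E[y] = exp(a + b*x) by iteratively reweighted least squares.

    Assumptions: log link, Poisson-type variance (variance proportional
    to the mean), one predictor, and all y > 0 so that log(y) can be
    used as the starting linear predictor.
    """
    eta = [math.log(v) for v in y]
    a = b = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link.
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        w = mu
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
        b_new = sxz / sxx
        a_new = mz - b_new * mx
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        eta = [a + b * xi for xi in x]
        if converged:
            break
    return a, b

# Made-up data for illustration only.
x = list(range(20))
y = [math.exp(0.5 + 0.2 * u + 0.3 * math.sin(3 * u)) for u in x]

a_hat, b_hat = fit_log_link(x, y)
mu_hat = [math.exp(a_hat + b_hat * u) for u in x]  # response-scale predictions
r2_response = corr(y, mu_hat) ** 2

print("intercept", a_hat, "slope", b_hat)
print("corr(y, predicted y)^2 on the response scale:", r2_response)
```

The calculations run on the log scale through the link, but no
back-transformation of the response is needed to get predictions in the
original units.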
Nick
[email protected]
Buzz Burhans wrote:
> If one estimates the proportion of explained variance for a model
> using a log-transformed variable, is that proportion of explained
> variance approximately applicable to the untransformed variable? In
> other words, if I derive the proportion of explained variance of a
> dependent variable in a log-transformed model associated with a
> predictor variable, does that variable also explain a similar
> proportion of the variance (not necessarily exactly the same) in the
> untransformed raw metric? I appreciate that the variance itself in
> the two metrics is different, but is the proportion of explained
> variance similar?
>
> I have a model in which a treatment effect is significant, but
> explains little of the total variance. The model is run on
> transformed variables (log-transformed outcome, and a fractional
> polynomial dependent time variable). Interpretation in practical
> terms should speak to this issue of minimal albeit significant
> treatment effect relative to contribution to the total variance, but
> I am not sure how to express this, or even if I can make any
> statement about it relative to the original raw metric since I
> modeled in the transformed metric.