I've always been suspicious of the formulation 'percentage of variance
explained' in particular.
Behind it lies the idea that 100% of the variance in the predicted
variable can, in principle, be 'explained' (I prefer 'associated with
variation in the predictor variables'). This isn't the case.
You would imagine that if you had weight in kilos and weight in pounds,
then 100% of the variation in one would be associated with variation in
the other. In a real-life case, we found that this was not so. Doctors
who had weighed their patients had used either a metric or an imperial
weighing scale, and had converted the result so as to be able to record
the weight in both pounds and kilos. The errors and inaccuracies of the
different methods they had used meant that the correlation between the
two was less than perfect.
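If you want to see the effect for yourself, a quick simulation
reproduces it. A minimal sketch in Stata (the variable names and error
sizes are mine, purely illustrative): each scale measures the same true
weight with its own independent error, and the regression R^2 falls
well short of 1.

    * Two noisy measurements of the same true weight, one in
    * kilos and one in pounds, each with its own scale error
    clear
    set seed 12345
    set obs 1000
    gen true_kg = rnormal(75, 12)            // true weight in kilos
    gen kg = true_kg + rnormal(0, 1.5)       // metric scale, with error
    gen lb = 2.2046*true_kg + rnormal(0, 3)  // imperial scale, with error
    regress lb kg
    display "R-squared = " e(r2)             // noticeably below 1

Shrink the two error standard deviations towards zero and R^2 climbs
back towards 1, which is the point: the shortfall is measurement, not
model.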
Errors in measurement will always place a ceiling on the amount of
variation shared between variables. Unless you know what this ceiling
is, the idea that R^2 can reach the magic figure of 100% is a
will-o'-the-wisp, leading you astray.
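Classical test theory makes the ceiling explicit. Spearman's
attenuation formula says that if reliability_x and reliability_y are
the reliabilities of the two measures, then

    r_observed = r_true * sqrt(reliability_x * reliability_y)

so even when the true correlation is 1, the observed r^2 cannot exceed
reliability_x * reliability_y. Two measures with reliabilities of 0.9
each, for instance, cap the observed r^2 at 0.81.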
Ironically, there is a lot of attention paid to R^2 in psychology, a
discipline in which imperfect measurement abounds.
But fundamentally, I agree with Nick Cox: R^2 tells you nothing about
the utility of the model, from either a theoretical or a practical
standpoint. As a sole criterion for model selection, it should be used
only when there is no one in the office capable of formulating a theory
(and the cleaners have gone home).
Ronan M Conroy ([email protected])
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2764)
--------------------
Join the big noise
www.maketradefair.org