Richard,
Thanks for your full reply to my thread. It's difficult to disagree with
most of what you say, but what I was attempting to demonstrate was what
happens to R^2 when the correlation between two or more statistically
significant X-variables of interest is most certainly *not* zero (say, a
correlation of 0.6). When this happens, R^2 is inflated, because the
variation in Y is explained not only by the unique contributions made to
it by X1 and X2, but also partly by the *overlap* (for want of a more
precise expression!) between them.
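To make the point concrete, here is a quick simulation of my own (a
sketch in Python with NumPy, not anything from the thread or from
Richard's handout): two predictors with identical true coefficients,
fitted once with zero correlation between them and once with a
correlation of 0.6. The function name and all parameter values are my
own choices for illustration.

```python
import numpy as np

def r_squared(rho, b1=1.0, b2=1.0, n=100_000, seed=42):
    """R^2 from regressing Y on X1, X2, where corr(X1, X2) = rho
    and Y = b1*X1 + b2*X2 + e, with unit-variance X's and error."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = b1 * x[:, 0] + b2 * x[:, 1] + rng.standard_normal(n)
    # OLS fit with an intercept
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Same coefficients (b1 = b2 = 1) in both designs:
# rho = 0.0: explained variance 2 out of 3, so R^2 is about 2/3
# rho = 0.6: explained variance 3.2 out of 4.2, so R^2 is about 0.76
print(r_squared(0.0))
print(r_squared(0.6))
```

The extra explained variance in the second case (2*b1*b2*rho = 1.2) is
precisely the "overlap" term: it belongs to neither X uniquely, which is
why the unique (semipartial) contributions no longer sum to R^2.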
As I said towards the end of my last post, one of the desired aims is to
build a model of explanatory variables which demonstrate *total
independence* of each other. But, since we as social scientists attempt
to model the determinants of human behaviour, that's little more than a
pious hope, since there will inevitably be some inter-correlation between
explanatory variables. The example I put forward demonstrates this, and
also undermines the numerous futile attempts made by social scientists to
claim that X1 on its own contributed a certain proportion of the R^2 out
of all the significant X's.
C.
> At 05:09 AM 10/29/2003 +0000, Clive Nicholas wrote:
>>unlikely to vote Labour and vice versa. Because this overlap is carried
>>forward to the computation of R^2, R^2 has been upwardly biased.
>
> Thanks, but I'm afraid I still don't follow. If the beta coefficients
> were
> all zero, R^2 would be zero. Further, while the intercorrelations of the
> Xs may affect how large R^2 is, I don't see how that causes R^2 to be
> "upwardly biased", i.e. just because something causes R^2 to be bigger
> doesn't mean that it becomes biased towards a larger value. I'm aware of
> various consequences of multicollinearity, e.g. large standard errors,
> large confidence intervals, increased likelihood of saying a coefficient
> does not differ from zero when it really does. But, I don't remember ever
> hearing "upwardly biased R^2" as a problem. But that doesn't mean I
> couldn't have missed it! But multicollinearity does not cause regression
> coefficients to be biased (wildly variable from one sample to the next,
> maybe, but not biased) so I am not sure why it would cause R^2 to be
> biased.
>
> What I might say instead is, suppose you have two populations. In both
> populations, the effects of the Xs on Y are identical. But, in one
> population, the Xs are much more highly correlated with each other than
> they are in the other population. This will likely cause the R^2 to
> differ
> between the 2 populations. If you just compared R^2 between the two
> populations and not the actual coefficients, you could get a very
> misleading idea of the differences between the two populations. These
> kinds of ideas are discussed in my "Evils of R^2" handout at
> http://www.nd.edu/~rwilliam/xsoc593/lectures/l16.pdf.
>
> -------------------------------------------
> Richard Williams, Associate Professor
> OFFICE: (574)631-6668, (574)631-6463
> FAX: (574)288-4373
> HOME: (574)289-5227
> EMAIL: [email protected]
> WWW (personal): http://www.nd.edu/~rwilliam
> WWW (department): http://www.nd.edu/~soc
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
Yours,
CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.