
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: multicollinearity with survey data


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: multicollinearity with survey data
Date   Wed, 23 Feb 2011 17:20:55 -0500

Rachel, 

Your advice about collinearity is incorrect.

1. A test for zero correlation among predictors has no place in a study of collinearity. Natural correlation among predictors is expected. 
 
Perfectly collinear variables are those with a multiple correlation R-square of 1.0 when regressed on the others; these are the ones that are tossed out by regression programs.  Rather than "test" for multicollinearity (and I shouldn't have used that phrase), the proper approach is to evaluate how bad it is.  The measures for doing so are the variance inflation factor (VIF) for each predictor or, equivalently, the multiple R for predicting that variable from the others.

2. Contrary to your belief, adding collinear variables can improve a model. Indeed if the goal is simply to get the best possible prediction of Y, then collinearity might be more or less irrelevant. 

The real problem caused by high multicollinearity is that it makes it difficult to interpret individual regression coefficients.  For a treatment see any text on multiple regression. It is impossible to give blanket advice about what to do if high collinearity is found. Certainly dropping the most collinear variable is one option; but what if that is a predictor of interest?  There is a large literature on this topic.
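
For reference, the link between the VIF and the multiple R-square mentioned in point 1 can be written out as below. This is the standard definition for ordinary linear regression; the notation R_j^2 (the R-square from regressing predictor j on all the other predictors) is introduced here for illustration and is not from the original post. With a nonlinear model and survey weights it should be read as a rough diagnostic only.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
% Worked example: R_j^2 = 0.90 gives VIF_j = 1 / (1 - 0.90) = 10,
% so the standard error of that coefficient is about \sqrt{10} \approx 3.16
% times what it would be if predictor j were uncorrelated with the others.
```

As R_j^2 approaches 1.0 the VIF grows without bound, which is the perfectly collinear case that regression programs handle by dropping a variable.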

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783 
[email protected]







On Feb 23, 2011, at 5:30 AM, rachel grant wrote:

I am not an expert on this so correct me if I am wrong Stata Listers!
In my models (negative binomial regression) Stata automatically checks
for multicollinearity and omits collinear variables and then tells you
it has done so. Multicollinearity just means that variables are highly
correlated with each other so if you want to test for it, run a simple
correlation test. Including collinear variables adds no new info to the
model. If you have several variables that are highly correlated with
each other, you only need use one of these in the model.
Rachel

Rachel Grant
Dept. Life Sciences
Open University
UK

On 23 February 2011 05:03, Christine Gourin <[email protected]> wrote:
> thank you!
> how do you test for collinearity with survey data, however?
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of Steven Samuels [[email protected]]
> Sent: Tuesday, February 22, 2011 1:27 PM
> To: [email protected]
> Subject: Re: st: multicollinearity with survey data
> 
>> On Feb 22, 2011, at 11:55 AM, Christine Gourin wrote:
>> 
>> I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
>> http://www.stata.com/support/faqs/res/statalist.html#toask
>> 
>> I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
>> I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
>> 
>> I am not sure how to check for multicollinearity in the full model, which is
>> 
>> 
>> xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity
>> 
>> 
>> 
>> When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.
>> 
> 
> This message has nothing to do with multicollinearity.  Multicollinearity concerns the correlations of predictors with each other. This message refers to the association of the outcome and one predictor.  Tabulating HVH against HOSP_TEACH should show you the problem.
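> 
> A minimal sketch of that tabulation, using the variable names from the model above. It assumes both variables are coded 0/1 and that the data have already been svyset; neither is stated in the original question.
> 
> ```stata
> * Unweighted cell counts: look for an empty cell
> tabulate HVH HOSP_TEACH
> * Design-based version, showing weighted counts
> svy: tabulate HVH HOSP_TEACH, count
> ```
> 
> If every high-volume hospital is a teaching hospital, the cell with HVH equal to 1 and HOSP_TEACH equal to 0 is empty, so HOSP_TEACH equal to 0 occurs only with HVH equal to 0.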
> 
> 
> Steve
> 
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax:   206-202-4783
> [email protected]
> 
> 
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 



-- 
regards, Rachel





© Copyright 1996–2018 StataCorp LLC