Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: multicollinearity with survey data
From: Christine Gourin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: multicollinearity with survey data
Date: Tue, 22 Feb 2011 11:55:41 -0500
I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
http://www.stata.com/support/faqs/res/statalist.html#toask
I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
I am not sure how to check for multicollinearity in the full model, which is
xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.
But when I check the VIF per the link above, HOSP_TEACH does not appear to be collinear.
I have checked this several ways:
1) Regressing one independent variable on different combinations of the others, for example:
xi: svy: regress HOSP_TEACH elective
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
This gives the output:
tolerance = .99708964 VIF = 1.0029189
2) Regressing the dependent variable on individual independent variables:
xi: svy: regress HVH HOSP_TEACH
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
This gives the output:
------------------------------------------------------------------------------
             |             Linearized
         HVH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  HOSP_TEACH |   .2701522   .0414694     6.51   0.000      .188855    .3514494
       _cons |   1.52e-14          .        .       .            .           .
------------------------------------------------------------------------------
and also tolerance = .90653199 VIF = 1.103105
3) Regressing each independent variable in turn on all of the other variables, for example:
xi: svy: regress HOSP_TEACH i.RACE i.comorbidity HVH elective age65 flap neckdissection i.procedure i.payor radiation
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
I get tolerance = .95517604 VIF = 1.0469274
4) Finally, running the full model as a linear regression and displaying the tolerance:
xi: svy: regress HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
HOSP_TEACH is not dropped, and I get tolerance = .87624609 VIF = 1.1412319
Does this suggest I should leave all variables in?
********************************
None of these steps suggests that HOSP_TEACH is collinear, though I am unclear which of these four approaches is the correct one to use.
When I run my final model as a logistic regression:
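For reference, the variance inflation factor is conventionally defined among the predictors only: each independent variable is regressed on the remaining independent variables, and the outcome (HVH) is left out of that auxiliary regression. A minimal sketch of that loop for the binary predictors follows. It assumes the data have already been svyset, and the local macro names (binvars, catvars, others) are illustrative only.

```stata
* Sketch only: assumes the data have already been svyset.
* binvars, catvars and others are illustrative macro names.
local binvars elective flap neckdissection radiation HOSP_TEACH
local catvars i.agecat i.procedure i.payor i.RACE i.comorbidity

foreach v of local binvars {
    * all binary predictors except the one being checked
    local others : list binvars - v
    * auxiliary regression: one predictor on the rest, outcome excluded
    quietly xi: svy: regress `v' `others' `catvars'
    display "`v': tolerance = " 1-e(r2) "  VIF = " 1/(1-e(r2))
}
```

A large VIF from this loop would point to collinearity among the predictors. It would not say anything about whether a predictor perfectly predicts the outcome.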
xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
svylogitgof
HOSP_TEACH is dropped.
Which is the right way to test for multicollinearity?
And am I confusing collinearity with perfect prediction? Should I drop HOSP_TEACH from my final model (which would give me more power in terms of population size)?
Many thanks in advance.
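Perfect prediction concerns the relationship between a predictor and the outcome rather than the relationships among predictors, so a cross-tabulation might be a more direct check than a VIF. A minimal sketch, again assuming the data are svyset: an empty cell in the table (for example, no HVH hospitals among the non-teaching hospitals) would be the kind of pattern that leads logistic to drop HOSP_TEACH.

```stata
* Unweighted cell counts: look for a zero cell
tabulate HOSP_TEACH HVH

* Survey-weighted version, showing weighted counts per cell
svy: tabulate HOSP_TEACH HVH, count
```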