Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "DeMichele, Matthew" <mdemichele@rti.org> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: multicollinearity stcox |
Date | Fri, 7 Mar 2014 06:07:04 -0500 |
Thanks for the direction, Steve. I'll consult these sources.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
Sent: Thursday, March 06, 2014 10:37 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: multicollinearity stcox

Matthew:

Unfortunately, the VIFs computed with -collin- assume that the technique is OLS. Also, -collin- can't handle time-varying covariates. I believe that instability indices should be based on the partial-likelihood information matrix; see, e.g., Hill and Adkins (2003) and Lee and Weissfeld (1996, which I've not read). John Hendrickx's -perturb- (SSC) can empirically assess ill-conditioning in the Cox model.

To handle collinear variables, I'd suggest a variable reduction technique like the LASSO (Tibshirani, 2011). Unfortunately Stata has no LASSO for the Cox model, but you can find one in R's package penalized, "L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model" (Goeman, 2010). Frank Harrell (2001) also has a section on variable selection and reduction.

References:

Goeman, J. J. 2010. L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal 52(1): 70-84.
  R package penalized: http://cran.r-project.org/web/packages/penalized/index.html

Harrell, Frank E. 2001. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York: Springer.

Hill, R. Carter, and Lee C. Adkins. 2003. Chapter 12: Collinearity. In A Companion to Theoretical Econometrics, ed. B. H. Baltagi, 256-278. Oxford: Blackwell Publishing.

Lee, K., and L. Weissfeld. 1996. A multicollinearity diagnostic for the Cox model with time dependent covariates. Communications in Statistics: Simulation and Computation 25(1): 41-60.

Tibshirani, Robert. 2011. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(3): 273-282.

Steve Samuels
sjsamuels@gmail.com

> On Mar 6, 2014, at 3:46 PM, "DeMichele, Matthew" <mdemichele@rti.org> wrote:
>
> Dear Statalist:
>
> Looking for a bit of guidance regarding multicollinearity protocols
> following stcox. I am running analyses on three split samples of
> 10,002 (50%, 25%, and 25%) and don't see major problems with t-tests
> and F-tests within the overall models. But I do have variables that
> have correlations in the region of .69 to .73. I've seen some
> suggestion that up to r=.8 is acceptable. This seems a bit high to me.
> So, following the stcox, I've run collin with the IVs. Here, I am using
> cutoffs of VIF less than 10 (which they're nowhere close to) and
> tolerance of <.4 (of which there are some in the .35 region). The
> condition numbers are in the 12-19 range (with the 19 being a bit high).
>
> I have two questions (assuming this is enough information for people
> to answer):
>
> 1. Do my assumptions mentioned above sound reasonable for identifying
>    multicollinearity? And are there any suggestions about alternative
>    cutoffs?
> 2. I've also calculated a correlation matrix of the coefficients
>    (vce, corr). Are there suggested cutoffs?

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
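
Editorial illustration of the cutoffs raised in this thread. The sketch below shows how a pairwise correlation translates into the OLS-style quantities Matthew mentions (VIF, tolerance, condition number) in the simplest case of just two standardized predictors. It is an assumption-laden toy, not anything posted by either author: the function name `pairwise_diagnostics` is hypothetical, the two-predictor setup is a simplification, and, as Steve points out, these OLS-based figures are not the partial-likelihood diagnostics one would want for a Cox model.

```python
import math

def pairwise_diagnostics(r):
    """OLS-style collinearity diagnostics for two standardized predictors.

    r is the correlation between the two predictors (hypothetical input).
    Returns (vif, tolerance, condition_number).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # With only two predictors, R^2 from regressing one on the other is r^2,
    # so tolerance = 1 - r^2 and VIF = 1 / tolerance.
    tolerance = 1.0 - r ** 2
    vif = 1.0 / tolerance
    # The 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r|
    # and 1 - |r|; the condition number is the square root of their ratio.
    condition_number = math.sqrt((1.0 + abs(r)) / (1.0 - abs(r)))
    return vif, tolerance, condition_number

# Correlations in the range reported in the question, plus the r = .8 rule of thumb.
for r in (0.69, 0.73, 0.80):
    vif, tol, cond = pairwise_diagnostics(r)
    print(f"r={r:.2f}  VIF={vif:.2f}  tolerance={tol:.2f}  condition number={cond:.2f}")
```

In this two-predictor case, r = .8 corresponds to a tolerance of 1 - .64 = .36, so a tolerance near .35 and a correlation near .8 describe roughly the same degree of overlap. With more than two predictors the tolerance depends on the multiple R-squared of each variable on all the others, so pairwise correlations alone can understate the problem. The condition number here comes from the correlation matrix and need not match what a given package reports, since scaling and intercept conventions differ.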