Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: multicollinearity stcox
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: multicollinearity stcox
Date: Thu, 6 Mar 2014 22:36:45 -0500
Matthew:
Unfortunately, the VIFs computed by -collin- assume that the model is
OLS regression. Also, -collin- can't handle time-varying covariates. I
believe that instability indices should be based on the partial-likelihood
information matrix; see, e.g., Hill and Adkins (2003) and Lee and
Weissfeld (1996, which I've not read).
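For illustration only, here is a minimal Python/NumPy sketch of the difference (an illustrative example, not code from -collin- or from the references). An OLS-style VIF uses nothing but the covariates: VIF_j is the jth diagonal element of the inverse of their correlation matrix R, and tolerance_j = 1/VIF_j, so it cannot reflect anything specific to the Cox fit. An information-matrix diagnostic instead starts from the estimated information, taken here as the inverse of the coefficient covariance matrix (what Stata stores in e(V) after -stcox- with the default VCE). Scaling that matrix to unit diagonal before taking eigenvalues is an assumed Belsley-style convention, and both matrices below are made-up numbers, not results from anyone's data.

```python
import numpy as np


def ols_vif(R):
    """OLS-style VIFs from a covariate correlation matrix R: VIF_j = [R^-1]_jj."""
    return np.diag(np.linalg.inv(np.asarray(R, dtype=float)))


def condition_indices(info):
    """Condition indices of an information matrix after scaling it to unit diagonal.

    The largest index is the condition number. The unit-diagonal scaling is an
    assumed (Belsley-style) convention, not a documented Stata computation.
    """
    info = np.asarray(info, dtype=float)
    d = 1.0 / np.sqrt(np.diag(info))
    scaled = info * np.outer(d, d)          # D^{-1/2} I D^{-1/2}
    eig = np.linalg.eigvalsh(scaled)        # real eigenvalues of a symmetric matrix
    return np.sort(np.sqrt(eig.max() / eig))


# Hypothetical correlation matrix of three covariates (r around .7)
R = np.array([[1.00, 0.71, 0.69],
              [0.71, 1.00, 0.73],
              [0.69, 0.73, 1.00]])
vif = ols_vif(R)
tolerance = 1.0 / vif

# Hypothetical coefficient covariance matrix, standing in for e(V) after -stcox-
V = np.array([[0.040, -0.018, -0.012],
              [-0.018, 0.050, -0.020],
              [-0.012, -0.020, 0.045]])
info = np.linalg.inv(V)                     # estimated information matrix

print("VIF:", vif.round(2), "tolerance:", tolerance.round(2))
print("condition indices:", condition_indices(info).round(2))
```

Because tolerance is just 1/VIF, the two cutoffs in the question are not independent: a tolerance of .35 corresponds to a VIF of about 2.86, while a VIF cutoff of 10 corresponds to a tolerance of .1.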
John Hendrickx's -perturb- (SSC) can empirically assess ill-conditioning
in the Cox model.
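The idea behind perturbation analysis can be sketched in a few lines of Python/NumPy: add small random noise to the covariates, refit the Cox model many times, and see how far the coefficients move. Large swings for some coefficients, relative to the size of the noise, point to ill-conditioning. This is a rough analogue written for illustration, not a port of -perturb- (whose options and summaries differ); the Newton-Raphson Cox fitter assumes no tied event times, the noise level (10% of each covariate's SD) is an arbitrary choice, and the function names and simulated data are hypothetical.

```python
import numpy as np


def fit_cox(X, time, event, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of the Cox partial likelihood (assumes no tied times)."""
    order = np.argsort(-time)               # descending time: risk set of row k is rows 0..k
    X, event = X[order], event[order].astype(float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())         # common rescaling cancels in the ratios
        s0 = np.cumsum(w)
        s1 = np.cumsum(w[:, None] * X, axis=0)
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = s1 / s0[:, None]             # weighted covariate means over each risk set
        score = (event[:, None] * (X - xbar)).sum(axis=0)
        info = (event[:, None, None]
                * (s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def perturb_cox(X, time, event, noise_sd, n_reps=50, seed=0):
    """Refit after adding N(0, noise_sd^2) noise to X; return mean and SD of the estimates."""
    rng = np.random.default_rng(seed)
    draws = np.array([
        fit_cox(X + rng.normal(0.0, noise_sd, size=X.shape), time, event)
        for _ in range(n_reps)
    ])
    return draws.mean(axis=0), draws.std(axis=0)


# Hypothetical simulated data: x2 is nearly collinear with x1, x3 is unrelated
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.15 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
true_time = rng.exponential(1.0 / np.exp(0.5 * x1 + 0.5 * x3))  # hazard exp(0.5*x1 + 0.5*x3)
cens_time = rng.exponential(2.0, size=n)
time = np.minimum(true_time, cens_time)
event = (true_time <= cens_time).astype(int)

mean_b, sd_b = perturb_cox(X, time, event, noise_sd=0.1 * X.std(axis=0))
print("mean:", mean_b.round(3), "SD across perturbations:", sd_b.round(3))
```

With x1 and x2 nearly collinear, their coefficients should vary far more across replications than the coefficient of the unrelated x3.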
To handle collinear variables, I'd suggest a variable reduction technique
like the LASSO (Tibshirani, 2011). Unfortunately, Stata has no LASSO for
the Cox model, but you can find one in the R package penalized, "L1 (lasso
and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the
Cox model" (Goeman, 2010). Frank Harrell (2001) also has a section on
variable selection and reduction.
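To show what the LASSO does in the Cox setting, here is a minimal Python/NumPy sketch: it minimizes the negative log partial likelihood divided by n, plus lam times the L1 norm of the coefficients, using proximal gradient descent (ISTA) with soft-thresholding. This is an illustration only, not the algorithm implemented in the penalized package, and its penalty scaling may differ from that package's conventions. It assumes no tied event times and standardized covariates, uses a conservative fixed step size, and the function names and simulated data are hypothetical choices made for this sketch.

```python
import numpy as np


def cox_score(beta, X, event):
    """Score of the Cox log partial likelihood; rows sorted by descending time, no ties."""
    eta = X @ beta
    w = np.exp(eta - eta.max())             # common rescaling cancels in the ratio
    s0 = np.cumsum(w)
    s1 = np.cumsum(w[:, None] * X, axis=0)
    return (event[:, None] * (X - s1 / s0[:, None])).sum(axis=0)


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lambda_max(X, time, event):
    """Smallest penalty at which every coefficient is zero."""
    order = np.argsort(-time)
    score0 = cox_score(np.zeros(X.shape[1]), X[order], event[order].astype(float))
    return np.max(np.abs(score0)) / X.shape[0]


def cox_lasso(X, time, event, lam, n_iter=5000, tol=1e-9):
    """Minimize -loglik(beta)/n + lam * ||beta||_1 by proximal gradient descent (ISTA)."""
    order = np.argsort(-time)
    X, event = X[order], event[order].astype(float)
    n = X.shape[0]
    # Conservative Lipschitz bound for the gradient of -loglik/n:
    # (number of events / n) * largest squared row norm of X
    lipschitz = event.sum() / n * np.max(np.sum(X ** 2, axis=1))
    step = 1.0 / lipschitz
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        new = soft_threshold(beta + step * cox_score(beta, X, event) / n, step * lam)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


# Hypothetical simulated data: x2 is nearly collinear with x1, x3 is unrelated
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.15 * rng.normal(size=n)
x3 = rng.normal(size=n)
true_time = rng.exponential(1.0 / np.exp(0.5 * x1 + 0.5 * x3))
cens_time = rng.exponential(2.0, size=n)
time = np.minimum(true_time, cens_time)
event = (true_time <= cens_time).astype(int)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # the L1 penalty is not scale-invariant

lmax = lambda_max(X, time, event)
for frac in (1.01, 0.5, 0.1):
    print(f"lam = {frac} * lambda_max:", cox_lasso(X, time, event, frac * lmax).round(3))
```

lambda_max() gives the smallest penalty at which all coefficients are zero; lowering lam from there lets variables enter the model as the penalty shrinks, which is how the LASSO performs variable reduction.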
References:
Goeman, J. J. 2010. L1 penalized estimation in the Cox proportional
hazards model. Biometrical Journal 52(1): 70–84.
Goeman, J. J. R package penalized.
http://cran.r-project.org/web/packages/penalized/index.html
Harrell, Frank E. 2001. Regression Modeling Strategies: With
Applications to Linear Models, Logistic Regression, and Survival
Analysis. New York: Springer.
Hill, R. Carter, and Lee C. Adkins. 2003. Collinearity. Chapter 12 in A
Companion to Theoretical Econometrics, ed. B. H. Baltagi, 256–278.
Oxford: Blackwell Publishing.
Lee, K., and L. Weissfeld. 1996. A multicollinearity diagnostic for the
Cox model with time dependent covariates. Communications in Statistics:
Simulation and Computation 25(1): 41–60.
Tibshirani, Robert. 2011. Regression shrinkage and selection via the
lasso: a retrospective. Journal of the Royal Statistical Society: Series
B (Statistical Methodology) 73(3): 273–282.
Steve Samuels [email protected]
> On Mar 6, 2014, at 3:46 PM, "DeMichele, Matthew" <[email protected]> wrote:
>
> Dear Statalist:
> Looking for a bit of guidance regarding multicollinearity protocols
> following stcox. I am running analyses on three split samples of 10,002
> (50%, 25%, and 25%) and don't see major problems with t-tests and
> F-tests within the overall models. But I do have variables with
> correlations in the region of .69 to .73. I've seen some suggestion that
> up to r = .8 is acceptable, which seems a bit high to me. So, after
> -stcox-, I've run -collin- on the IVs. Here I am using a VIF cutoff of 10
> (which they're nowhere close to) and a tolerance cutoff of .4 (some are
> in the .35 region). The condition numbers are in the 12-19 range
> (with the 19 being a bit high).
>
> I have two questions (assuming this is enough information for people to
> answer):
> 1. Do my assumptions above sound reasonable for identifying
> multicollinearity? And are there any suggestions about alternative
> cutoffs?
> 2. I've also calculated a correlation matrix of the coefficients
> (vce, corr). Are there suggested cutoffs?
>