Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: multicollinearity stcox
From
"DeMichele, Matthew" <[email protected]>
To
<[email protected]>
Subject
RE: st: multicollinearity stcox
Date
Fri, 7 Mar 2014 06:07:04 -0500
Thanks for the direction Steve. I'll consult these sources.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Steve Samuels
Sent: Thursday, March 06, 2014 10:37 PM
To: [email protected]
Subject: Re: st: multicollinearity stcox
Matthew:
Unfortunately, the VIFs computed with -collin- assume that the technique
is OLS. Also, -collin- can't handle time-varying covariates. I believe
that instability indices should be based on the partial-likelihood
information matrix; see, e.g., Hill and Adkins (2003) and Lee and
Weissfeld (1996, which I've not read).
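As a rough illustration of what an index based on the information matrix could look like, here is a minimal Python sketch. It is one possible reading, not the Lee and Weissfeld diagnostic: it scales a symmetric information matrix to unit diagonal and reports Belsley-style condition indices, sqrt(largest eigenvalue / each eigenvalue). The function names and the example matrix are hypothetical. In Stata, a matrix of this kind could presumably be obtained as invsym(e(V)) after -stcox-, but that step is an assumption and is not shown.

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        # Sum of squared off-diagonal entries; stop once it is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Column update (A <- A J), then row update (A <- J^T A).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def condition_indices(info):
    """Condition indices of an information matrix scaled to unit diagonal."""
    n = len(info)
    d = [math.sqrt(info[i][i]) for i in range(n)]
    scaled = [[info[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    eig = jacobi_eigenvalues(scaled)
    return [math.sqrt(eig[0] / lam) for lam in eig]

# Hypothetical 3x3 information matrix, for illustration only.
example_info = [
    [4.0, 2.8, 1.0],
    [2.8, 4.0, 0.6],
    [1.0, 0.6, 1.0],
]
print(condition_indices(example_info))
```

The largest index plays the role of the condition number. The usual rules of thumb for it come from the OLS literature, so whether they carry over to the Cox partial likelihood is an open question here.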
John Hendrickx's -perturb- (SSC) can empirically assess ill-conditioning
in the Cox model.
To handle collinear variables, I'd suggest a variable reduction
technique like the LASSO (Tibshirani, 2011). Unfortunately, Stata has no
LASSO for the Cox model, but you can find one in the R package penalized:
"L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs
and in the Cox model" (Goeman, 2010). Frank Harrell (2001) also has a
section on variable selection and reduction.
References:
Goeman, J. J. 2010. L1 penalized estimation in the Cox proportional
hazards model. Biometrical Journal 52(1), 70-84.
Goeman: http://cran.r-project.org/web/packages/penalized/index.html
Harrell, Frank E. 2001. Regression modeling strategies : with
applications to linear models, logistic regression, and survival
analysis. New York: Springer.
Hill, R Carter, and Lee C Adkins. 2003. Chapter 12: Collinearity. In A
Companion to Theoretical Econometrics, ed. BH Baltagi, 256-278. Oxford:
Blackwell Publishing.
Lee, K., and L. Weissfeld. 1996. A multicollinearity diagnostic for the
Cox model with time dependent covariates. Communications in Statistics -
Simulation and Computation 25(1): 41-60.
Tibshirani, Robert. 2011. Regression shrinkage and selection via the
lasso: a retrospective. Journal of the Royal Statistical Society: Series
B (Statistical Methodology) 73, no. 3: 273-282.
Steve Samuels [email protected]
> On Mar 6, 2014, at 3:46 PM, "DeMichele, Matthew" <[email protected]> wrote:
>
> Dear Statalist:
> Looking for a bit of guidance regarding multicollinearity protocols
> following stcox. I am running analyses on three split samples of
> 10,002 (50%, 25%, and 25%) and don't see major problems with t-tests
> and f-tests within the overall models. But, I do have variables that
> have correlations in the region of .69 to .73. I've seen some
> suggestion that up to r=.8 is acceptable. This seems a bit high to me.
> So, following the stcox, I've run collin with the IVs. Here, I am using
> a VIF cutoff of less than 10 (mine are nowhere close) and a tolerance
> cutoff of <.4 (some are in the .35 region). The condition numbers are
> in the 12-19 range (with the 19 being a bit high).
>
> I have two questions (assuming this is enough information for people
> to answer): 1. Do my assumptions mentioned above sound reasonable related
> to identifying multicollinearity? And, are there any suggestions about
> alternative cutoffs? 2. I've also calculated a correlation matrix of
> the coefficients (vce, corr). Are there suggested cutoffs?
>
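For readers comparing the cutoffs quoted above, the OLS quantities are tied together by simple arithmetic: tolerance is 1/VIF, so a tolerance of .35 corresponds to a VIF of about 2.9. The sketch below is illustrative only. It assumes the simplest case, where a predictor's only collinearity is one pairwise correlation r with a single other predictor, so that VIF = 1/(1 - r^2). With more predictors the VIF depends on the multiple R-squared and can be higher. The function names are hypothetical.

```python
def vif_from_r(r):
    """OLS VIF when the only collinearity is a pairwise correlation r."""
    return 1.0 / (1.0 - r * r)

def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif

# Correlations mentioned in the question, plus the r = .8 rule of thumb.
for r in (0.69, 0.73, 0.80):
    v = vif_from_r(r)
    print(f"r = {r:.2f}  VIF = {v:.2f}  tolerance = {tolerance_from_vif(v):.2f}")
```

Under this two-predictor simplification, pairwise correlations of .69 to .73 alone give VIFs of roughly 2, so a tolerance near .35 suggests that more than one other predictor is involved.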
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/