Thank you Maarten,
When I run your program the histogram of mean deviance residuals is indeed
centered on the vertical line (i.e., the mean is ~0.97). So in that sense
the fitted Cox model behaves well on the bootstrap samples. Still, my
concern is that I am capturing a lot of random fluctuations with only 52
cases and six parameters. Maybe I am wrong.
Andre
On 11/13/07 3:54 PM, "Maarten buis" <[email protected]> wrote:
> --- Andrzej Niemierko <[email protected]> wrote:
>> Is there a test for overfitting when using Cox proportional hazards
>> model? Specifically, I have a small data set with only 52 cases (39
>> events) and 12 independent variables. Six of those independent
>> variables are jointly significant (all at p<0.001) in the Cox model.
>> Although in my case it makes perfect sense that those six variables
>> are associated with the outcome I have a feeling that I am
>> overfitting the data at hand.
>
> The problem with overfitting is that you are fitting such a flexible
> model that what you are picking up are the random fluctions due to
> sampling and not the systematic part of the fluctuations. This is
> problematic because the findings will now be uninformative about the
> population or the next sample.
>
> One way to get a feel for this is to 1) estimate your model in your
> data, 2) create a bootstrap sample, and 3) see how the model with the
> estimates from the real data would fit in the bootstrap sample. If you
> are overfitting, than your estimates in the real data would have little
> relevance for the bootstrap data and the fit would be bad. If you
> haven't overfitted the data than the fit would cluster around the fit
> in your real data.
>
> Usually I am not such a fan of fit statistics, so I can't remember an
> appropriate one for the Cox model, so I improvised by looking at the
> mean deviance residual. You might have to change that. Otherwise, the
> example below should do what you want: If you are overfitting that the
> center of the histogram should be to the right of the vertical line I
> added with the -xline()- option (assuming that "more is bad" in your
> choice of fit statistic).
>
> *--------------- begin example -------------------
> sysuse cancer, clear
> stset studytim, failure(died)
> xi: stcox age i.drug, mgale(mgale)
>
> tempname memhold
> tempfile results
> postfile `memhold' msd using `results'
> forvalues i = 1/1000 {
> preserve
> bsample
> predict dev, deviance
> gen dev2 = dev^2
> sum dev2, meanonly
> post `memhold' (`r(mean)')
> restore
> }
> postclose `memhold'
> predict dev, deviance
> gen dev2 = dev^2
> sum dev2, meanonly
> local observed = r(mean)
>
> use `results', clear
> hist msd, xline(`observed') /*
> */ xtitle("mean squared deviance residual")
> *---------------------- end example -------------------------
> (For more on how to use examples I sent to the Statalist, see
> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Department of Social Research Methodology
> Vrije Universiteit Amsterdam
> Boelelaan 1081
> 1081 HV Amsterdam
> The Netherlands
>
> visiting address:
> Buitenveldertselaan 3 (Metropolitan), room Z434
>
> +31 20 5986715
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
>
> ___________________________________________________________
> Want ideas for reducing your carbon footprint? Visit Yahoo! For Good
> http://uk.promotions.yahoo.com/forgood/environment.html
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/