Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Developing a Predictive Risk Equation from stcox survival analysis
From
"Roger B. Newson" <[email protected]>
To
[email protected]
Subject
Re: st: Developing a Predictive Risk Equation from stcox survival analysis
Date
Wed, 19 Sep 2012 15:36:59 +0100
I would second Steve's diagnosis. To find more about the issues with the
use of Harrell's c with predictive scores after -stcox- (or even after
-streg-), see Newson (2010).
Best wishes
Roger
References
Newson RB. Comparing the predictive power of survival models using
Harrell’s c or Somers’ D. The Stata Journal 2010; 10(3): 339–358.
Purchase from
http://www.stata-journal.com/article.html?article=st0198
or download pre-publication draft from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm#papers_in_journals
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
On 19/09/2012 15:03, Steve Samuels wrote:
Tom Robinson:
My problem is that the when I run an estat concordance on my model I get a
higher Harrel's C than I do when I run roctab on my outcome and the risks
I have calculated (using the development dataset still).
This is not surprising behavior: -roctab- is for binary outcomes; it ignores censoring and time.
Steve
On Sep 18, 2012, at 9:13 PM, Phil Clayton wrote:
After running your Cox model:
predict double xbeta, xb
predict double basesurv, basesurv
Now you need to know the baseline survivor function at 5 years. The mistake you've made is that this number is actually the same for everyone. Try this:
line basesurv _t, sort
You just need the point on the curve where _t==5. Since the baseline survivor function only goes down with time, this point is the minimum basesurv when time is less than 5 years:
sum basesurv if _t<5
scalar base5y=r(min)
Finally you can calculate each patient's risk at 5 years by adjusting the baseline risk:
gen risk5y=1 - base5y^exp(xbeta)
You can of course avoid the use of the scalar, but the above code makes it a little clearer what you're doing. Here's the abbreviated version:
sum basesurv if _t<5
gen risk5y=1 - r(min)^exp(xbeta)
A word on precision - you're raising a small number (between 0 and 1) to the exponent of another number. Therefore minor problems with precision rapidly become very big ones. This is why I have suggested using double precision for the xbeta and basesurv variables. It's also very useful to centre your covariates, so that the baseline survivor function represents an "average" patient rather than a patient with extremely or impossibly low risk. See "Making baseline reasonable" in [ST] stcox postestimation.
Phil
On 19/09/2012, at 6:23 AM, Tom Robinson wrote:
Hi,
I am using stcox to develop a predictive risk model but am unsure about how
to formulate the final equation. I am using Stata 12.1
I have independent variables that were collected by family physicians as
part of routine care e.g. blood pressure, lipids, renal function,
demographic variables, time since developing diabetes . These come from a
single review and I am using this review date as onset. The outcome is new
onset of end-stage renal failure which is collected from a range of
national datasets (in New Zealand).
I have developed a model using stcox which I'm happy with but need to turn
this into a risk prediction equation for risk at 5 years after which I can
use in a validation dataset. I have centered all the variables around their
mean.
What I have done so far is: (following Tangri N, Stevens LA, Griffith J, et
al. A predictive model for progression of chronic kidney disease to kidney
failure. JAMA. 2011;305(15):1553-1559.appendix)
- use predict *newvar*, xb to calculate each individuals overall hazard
coefficient
- confirmed for myself that this is equivalent to the sum of each
variable multiplied by its coefficient from the model
- confirmed that a dummy individual X with all the independent variables
set at 0 (in other words at the means) has a overall hazard of 0 (*newvar
*)
- used predict *newvar2*, basesurv to calculate the baseline survivals
- set individual X _t to 5 years which is the time period I'm interested
in predicting risk at. This individuals baseline survival is Y
- Used this survival in the equation gen risk5yr=1-(Y)^exp(*newvar*) to
calculate each persons risk of the event at 5 years
My problem is that the when I run an estat concordance on my model I get a
higher Harrel's C than I do when I run roctab on my outcome and the risks
I have calculated (using the development dataset still).
I have also run a calibration analysis on my calculated risks which is
wildly wrong (the predicted risks in each decile are about half of the
actual risks)
Clearly I'm doing something wrong but I can't see what. Thanks for any
advice
--
*Tom Robinson*
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/