Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: how evaluate the accuracy of parametric survival models in a resampling process?
From
Steven Samuels <[email protected]>
To
[email protected]
Subject
Re: st: how evaluate the accuracy of parametric survival models in a resampling process?
Date
Tue, 19 Apr 2011 12:23:28 -0400
Alberto-
The three distributions you studied are all special cases of the two-parameter (p,q) generalized F distribution (Kalbfleisch & Prentice, 2002, p. 74). You can discriminate among these models. In fact, the Weibull and log-normal families differ in one parameter ("q"), so their likelihoods can be compared; the Weibull and log-logistic families also differ in one parameter ("p") and can be compared in the same way. To discriminate between the log-logistic and log-normal likelihoods, you would need to fit the two-parameter model directly.
I suggest that you fit the generalized F; then you could decide for each replicate: 1) which of the three models has the highest log-likelihood (and choose that as the "best"); and 2) whether a confidence interval for (p,q) excludes all three.
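For reference, here is a sketch of the nesting in the (m_1, m_2) parameterization commonly used for the generalized F, in which the log failure time is a location-scale transform of the log of an F variate. How (m_1, m_2) map onto the "p" and "q" labels in this message is an assumption that the message itself does not settle, and the limits are understood after suitable rescaling.

```latex
\[
\log T = \mu + \sigma W, \qquad W = \log F_{2m_1,\,2m_2}
\]
\[
\begin{array}{ll}
m_1 = 1,\ m_2 = 1 & \text{log-logistic} \\
m_1 = 1,\ m_2 \to \infty & \text{Weibull} \\
m_1 \to \infty,\ m_2 \to \infty & \text{log-normal} \\
m_2 \to \infty\ (m_1 \text{ free}) & \text{generalized gamma}
\end{array}
\]
```

Read this way, the Weibull shares one index with each of the other two families, while the log-logistic and log-normal differ in both.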
But you can do more if you did bootstrap resampling (it's not clear from your message whether you did). Fit the generalized F to each bootstrap sample, collect the bootstrap estimates of "p" and "q", and also get a bivariate CI for the two.
You asked a similar question earlier this month, and I responded with a suggestion that you use Somers' D as a measure of fit (http://www.stata.com/statalist/archive/2011-04/msg00107.html).
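One simple way to turn bootstrap estimates of "p" and "q" into a joint region is sketched below in Python. It is only an illustration under stated assumptions: the rectangular Bonferroni-adjusted percentile box is a swapped-in technique (a bivariate CI could equally be an ellipse), the function names are hypothetical, the estimates are simulated placeholders rather than fitted values, and the reference point (1.0, 1.0) is a made-up stand-in for a special case of interest.

```python
import random

def percentile(sorted_values, prob):
    """Linear-interpolation percentile of an already sorted list (0 <= prob <= 1)."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def joint_percentile_box(p_estimates, q_estimates, level=0.95):
    """Rectangular joint region built from Bonferroni-adjusted percentile intervals."""
    alpha_each = (1.0 - level) / 2.0  # split the error rate across the two parameters
    tail = alpha_each / 2.0           # two-sided interval for each parameter
    box = {}
    for name, values in (("p", p_estimates), ("q", q_estimates)):
        ordered = sorted(values)
        box[name] = (percentile(ordered, tail), percentile(ordered, 1.0 - tail))
    return box

def box_excludes(box, p_value, q_value):
    """True if the point (p_value, q_value) lies outside the joint box."""
    p_low, p_high = box["p"]
    q_low, q_high = box["q"]
    inside = p_low <= p_value <= p_high and q_low <= q_value <= q_high
    return not inside

# Simulated placeholder estimates (NOT real results): in practice these would be
# the generalized F estimates of p and q from each bootstrap sample.
random.seed(1)
p_boot = [random.gauss(2.0, 0.1) for _ in range(1000)]
q_boot = [random.gauss(0.5, 0.05) for _ in range(1000)]

box = joint_percentile_box(p_boot, q_boot)
print(box)
print(box_excludes(box, 1.0, 1.0))  # hypothetical special-case point
```

The box is conservative by construction; if the two estimates are strongly correlated, a region based on their joint bootstrap distribution would be tighter.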
Ref: J Kalbfleisch & R Prentice, 2002, The Statistical Analysis of Failure Time Data, 2nd Ed, Wiley, NY.
Steve
[email protected]
On Apr 18, 2011, at 4:05 PM, Albert Navarro wrote:
I am rewriting this message. My last message was unclear... I'm sorry!
----------------
Dear all,
We are conducting a study to identify the distribution associated with the process that generates a particular phenomenon (survival data). Briefly:
1. We fitted Weibull, log-normal and log-logistic models to 1000 resamples (null models, without covariates)
2. We compared the AIC of the models in each resample. We selected the model that had the lowest AIC in the largest number of resamples.
3. Estimated parameters: for the selected model, we took the median of the estimates across the 1000 resamples.
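The three steps above can be sketched in Python as follows. This is a minimal illustration, not the poster's actual code: the function names, the data layout, and all numbers are hypothetical placeholders, and it assumes each null model has two parameters. Under that assumption the AIC penalty is the same for all three models, so ranking by AIC is the same as ranking by log-likelihood.

```python
from collections import Counter
from statistics import median

# Assumed parameter counts for the covariate-free models.
N_PARAMS = {"weibull": 2, "lognormal": 2, "loglogistic": 2}

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def select_model(fits):
    """Return the model with the lowest AIC in the most resamples, plus per-resample winners."""
    winners = []
    for fit in fits:
        scores = {name: aic(res["loglik"], N_PARAMS[name]) for name, res in fit.items()}
        winners.append(min(scores, key=scores.get))
    chosen = Counter(winners).most_common(1)[0][0]
    return chosen, winners

def median_params(fits, model):
    """Component-wise median of the parameter estimates for one model."""
    columns = zip(*(fit[model]["params"] for fit in fits))
    return tuple(median(column) for column in columns)

# Made-up fits for three resamples (placeholders, not real results).
fits = [
    {"weibull": {"loglik": -512.3, "params": (1.40, 9.8)},
     "lognormal": {"loglik": -509.8, "params": (2.05, 0.71)},
     "loglogistic": {"loglik": -510.4, "params": (2.10, 0.40)}},
    {"weibull": {"loglik": -498.1, "params": (1.35, 10.1)},
     "lognormal": {"loglik": -499.0, "params": (2.00, 0.75)},
     "loglogistic": {"loglik": -498.7, "params": (2.02, 0.42)}},
    {"weibull": {"loglik": -505.6, "params": (1.45, 9.9)},
     "lognormal": {"loglik": -503.2, "params": (2.10, 0.69)},
     "loglogistic": {"loglik": -504.9, "params": (2.12, 0.39)}},
]

chosen, winners = select_model(fits)
print(chosen, winners)
print(median_params(fits, chosen))
```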
The next step would be to get evidence on the accuracy of the model selected. The best model is not necessarily a good model ...
Because we are working with 1000 resamples, graphical methods aren’t very practical. We prefer not using a pseudo R2 (“A perfectly adequate model may have what, at face value, seems like a terrible low R2 due to a high percent of censored data” [Hosmer-Lemeshow]).
Can anyone help us, please? We have been thinking about this issue for weeks and have failed to find a good solution.
Thank you very much,
Best regards,
Albert Navarro
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/