Title | Missing values reported by stsum
Author | Mario Cleves, StataCorp
Look at the following stsum output:
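. use https://www.stata.com/support/faqs/dta/stsum.dta

. stsum, by(exp)

        Failure _d: disease
  Analysis time _t: time

         |               Incidence     Number of   |------ Survival time -----|
     exp | Time at risk       rate      subjects        25%       50%       75%
---------+---------------------------------------------------------------------
       0 |           83   .2650602            29          1         2         5
       1 |           93   .1935484            28          1         2         .
---------+---------------------------------------------------------------------
   Total |          176   .2272727            57          1         2         .
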
See the missing value for the 75th percentile of survival time for the group exp==1? That is the question: why is that value missing? In more extreme cases, you might see both the 50th and 75th percentile estimates, or even all three, missing.
In any case, the short answer is that the estimates are missing because the percentiles cannot be estimated from the data. The percentiles of survival time reported are for completed survival times, and they are obtained from the estimate of the failure function, F(t). This curve is computed as F(t) = 1 - S(t), where S(t) is the Kaplan–Meier product-limit estimate of the survival curve. Let's look at F(t) for this group:
. sts list if exp==1, failure

        Failure _d: disease
  Analysis time _t: time

Kaplan–Meier failure function

             At                     Failure      Std.
  Time     risk    Fail    Lost    function     error     [95% conf. int.]
--------------------------------------------------------------------------
     1       28       7       0      0.2500    0.0818     0.1279    0.4539
     2       21       8       0      0.5357    0.0942     0.3667    0.7244
     3       13       2       0      0.6071    0.0923     0.4349    0.7833
     4       11       1       4      0.6429    0.0906     0.4703    0.8114
     5        6       0       2      0.6429    0.0906     0.4703    0.8114
     6        4       0       2      0.6429    0.0906     0.4703    0.8114
    10        2       0       1      0.6429    0.0906     0.4703    0.8114
    12        1       0       1      0.6429    0.0906     0.4703    0.8114
--------------------------------------------------------------------------

. sts graph if exp==1, failure xlabel(1(1)15)
The failure function F(t) reports the probability of failing before or at time t. In other words, F(t) gives us the expected proportion of individuals that would fail by time t.
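The failure-function column in the listing can be checked by hand from the at-risk and fail counts. The following is a minimal sketch in Python rather than Stata, meant only to make the product-limit arithmetic explicit. The names `rows` and `km_failure` are made up for this illustration, and the counts are copied from the sts list output for exp==1.

```python
# (time, number at risk, number failed), copied from the sts list output
rows = [(1, 28, 7), (2, 21, 8), (3, 13, 2), (4, 11, 1),
        (5, 6, 0), (6, 4, 0), (10, 2, 0), (12, 1, 0)]


def km_failure(rows):
    """Return [(time, F(t))], where F(t) = 1 - S(t) and S(t) is the
    Kaplan-Meier product-limit estimate of the survival function."""
    surv = 1.0
    out = []
    for time, at_risk, failed in rows:
        # Multiply by the fraction of at-risk subjects surviving this time.
        surv *= (at_risk - failed) / at_risk
        out.append((time, 1.0 - surv))
    return out


for time, f in km_failure(rows):
    print(f"{time:>4}  {f:.4f}")
```

Times with no failures multiply S(t) by 1, so F(t) stays where it was. Censored subjects only shrink the at-risk count for later times.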
The failure function here becomes flat at F(t) = 0.6429 because no failures are observed after time 4. The subjects still under observation at that point had not yet failed when their follow-up ended; this is called right-censoring.
What is the 25th percentile of completed survival times? The 25th percentile occurs where F(t) = .25 (meaning that 25% have failed and 75% have yet to fail), and that is t=1. Look back at the stsum output, and you will see that 1 is reported.
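What is the 50th percentile of completed survival times? The 50th percentile is the first time at which F(t) reaches or exceeds .50. F(1) = 0.2500 is still below .50, and F(2) = 0.5357 is the first value at or above it, so the answer is t=2. stsum reports that number too.
What is the 75th percentile? We do not know, because fewer than 75% of the individuals in our data have failed: F(t) never rises above 0.6429, so it never reaches .75. Thus, stsum reports the 75th percentile of survival time as missing.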
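The three lookups above can be written as one small rule. The sketch below is in Python rather than Stata. It assumes the percentile is read off as the first listed time at which F(t) reaches or exceeds p/100, which agrees with the three values stsum reports for this group. The names `failure` and `percentile_time` are hypothetical, and the F(t) values are copied from the sts list output.

```python
# (time, F(t)) pairs, copied from the sts list output for exp==1
failure = [(1, 0.2500), (2, 0.5357), (3, 0.6071), (4, 0.6429),
           (5, 0.6429), (6, 0.6429), (10, 0.6429), (12, 0.6429)]


def percentile_time(failure, p):
    """Return the smallest time with F(t) >= p/100, or None when the
    failure function never gets that high."""
    for time, f in failure:
        if f >= p / 100:
            return time
    return None  # plays the role of Stata's missing value "."


for p in (25, 50, 75):
    print(p, percentile_time(failure, p))
```

For p = 75 the loop runs through every row without finding F(t) >= .75, so the function falls through and returns None. That is the same situation stsum signals with a missing value.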