st: simulating discrete-time survival data
Dear all,
I have a question about simulating discrete-time survival data. A
number of articles in the biostat literature base the simulation on
the logit of the hazard function; however, I haven't found any that
give precise details of the algorithm used. In particular, I'm not
sure what the best approach is for translating hazard rates into
survival times in the simulation. The following is my attempt at this
with 1000 cases, 10 time periods, and one covariate.
*** setting the seed, case-period data structure, and one covariate
set seed 135711
set obs 1000
gen id = _n
gen x1 = (runiform() > 0.5)
forval i = 1/10 {
gen period`i' = 1
}
reshape long period, i(id) j(time)
*** simulating hazard rate
gen logit_hzd = -2 + 0.1*time + 0.69*x1
gen hzd = invlogit(logit_hzd)
*** determining survival time
gen srv = (hzd > runiform())
The idea is that after computing a hazard rate for each case at each
time period (which in this setup has no random component), I compare
the rate with a draw from a uniform distribution on (0,1). If the hazard
rate (a conditional probability in the discrete-time setting) is greater
than the uniform draw, then I flag the case as experiencing the event in
that time period (and, with some additional lines of the program not
shown, keep only the first period in which a case experiences an event).
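For concreteness, here is a minimal sketch of what the lines not shown might look like. It is an assumed version, not the exact code: it presumes the long-format variables above (id, time, x1, srv), and the names cum_evt, srv_time, and event are made up for illustration.

```stata
* Sketch only: an assumed version of the omitted lines, not the original code.
* Running count of simulated events within each case, in time order.
bysort id (time): gen cum_evt = sum(srv)

* Keep periods up to and including the first event; drop all later periods.
drop if cum_evt > 1 | (cum_evt == 1 & srv == 0)

* Survival time is the last period at risk; event is 0 if the case is
* censored at period 10 and 1 if the event occurred.
bysort id (time): gen srv_time = time[_N]
bysort id (time): gen event = srv[_N]

* Discrete-time logit on the remaining case-period records.
logit srv time x1
```

With the records trimmed this way, each case contributes one row per period at risk, and the logit coefficients can be compared directly with the values used in the simulation (-2, 0.1, and 0.69).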
This seems to work in that when I estimate a discrete-time model I
recover the population parameters within sampling fluctuation, but I'm
not certain if this approach is theoretically justified. Does anyone
know of a better approach to the simulation or a citation that gives
precise details on simulating discrete-time survival data?
Best,
Shawn