On Mon, 2 Jun 2003 13:40:04 -0500 (Central Daylight Time) Erin Kelly
<[email protected]> wrote:
> Dear all,
>
> I am estimating discrete time hazard models using cloglog. The data are
> structured as organization-years (similar to person-months) and many of
> the covariates are time-varying, i.e. change with each organization-year.
> I know only the year in which the failure / event occurred, not the
> specific moment or date. I also have many tied events / many failures in
> the same year. These characteristics of the data have led me to
> discrete-time models.
>
> My question is whether I'm specifying the time variables correctly to
> check out different functional forms of the hazard. I want to compare a
> constant hazard, a linear increase in the hazard, and a piece-wise
> constant exponential model. I have been referring to Stephen Jenkins'
> terrific lectures and lessons and found explicit confirmation on how to
> write a piece-wise exponential model for discrete-time data, but I'd like
> to run the other specifications by you all too.
>
> ALSO a reviewer saw these models and said "but you're not really doing
> event-history analyses." Any suggestions for quick explanations of why
> discrete-time methods are legitimate and actually more appropriate for
> these data? I'd be especially happy to cite recent sociology or political
> science articles that use discrete-time analyses, so let me know if you
> have an empirical example for me to review and possibly cite.
<snip>
Let me second what Jesper Sorensen said, and add a couple of remarks.
Your survival times are interval censored rather than intrinsically
discrete. So, if you assume a continuous-time proportional hazards (PH)
model for the underlying process, then the appropriate model for your
grouped survival times is the cloglog one applied to the
organisation-year data set (the derivation is in many places, including
my Lecture Notes). That is, if log[h(t,X)] = log[h0(t)] + b'X is the
continuous-time PH model, then cloglog[p(t,X)] = g(t) + b'X is the
corresponding discrete-time model, where p(t,X) is the probability of
failing in interval t given survival to the start of that interval.
Thus you can identify the "b" of the continuous-time model from your
discrete-time cloglog model. The g(t) for each interval can be
interpreted as the log of the baseline hazard h0(t) integrated over
that interval, and so restrictions on the shape of the g(t) function
lead to models corresponding to different continuous-time models.
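A quick numerical check of that correspondence may help. The sketch below is in Python rather than Stata, and the cumulative baseline H0(t) = 0.05*t^1.5, the coefficient b = 0.4 and the covariate value x = 2 are made-up illustrative values, not anything from your data. It assumes the covariates are constant within each year, so that the interval hazard implied by the continuous-time PH model is p = 1 - exp(-exp(b'X) * [H0(j) - H0(j-1)]); taking cloglog of that p should give back exactly g(j) + b'X.

```python
import math


def cloglog(p: float) -> float:
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def H0(t: float) -> float:
    """Illustrative (assumed) cumulative baseline hazard, Weibull-shaped."""
    return 0.05 * t ** 1.5


def interval_hazard(cum_base, j: int, xb: float) -> float:
    """P(fail in year j | survived to start of year j) under the continuous-time
    PH model h(t, X) = h0(t) * exp(b'X), with X assumed constant within year j."""
    increment = cum_base(j) - cum_base(j - 1)  # integral of h0(u) over (j-1, j]
    return 1.0 - math.exp(-math.exp(xb) * increment)


def g(cum_base, j: int) -> float:
    """Interval-specific intercept: log of the baseline hazard integrated over year j."""
    return math.log(cum_base(j) - cum_base(j - 1))


b, x = 0.4, 2.0  # hypothetical coefficient and covariate value
xb = b * x

for j in range(1, 6):
    p = interval_hazard(H0, j, xb)
    # The last two columns should agree: cloglog(p) equals g(j) + b'X.
    print(j, round(p, 4), round(cloglog(p), 6), round(g(H0, j) + xb, 6))
```

Nothing here depends on the Weibull shape: any cumulative baseline gives the same identity, which is why b is identified from the cloglog model whatever g(t) turns out to be.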
Have a look at, for example, page 417 of Sueyoshi, G. T. (1995), 'A class
of binary response models for grouped duration data', Journal of
Applied Econometrics, 10, 411-431. He gives the formulae for g(t) under
which the interval-censored PH model (i.e. cloglog) corresponds to the
continuous-time Weibull and Gompertz models. [He also provides a number
of generalisations, and relates the discrete-time logistic model to
underlying continuous-time models.]
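To see roughly what those restrictions look like, here is a back-of-envelope Python sketch for one-year intervals (j-1, j]. It assumes the simple parametrisations h0(t) = exp(c + gamma*t) for Gompertz and h0(t) = lam*alpha*t^(alpha-1) for Weibull; these parametrisations and the function names are illustrative assumptions, not Sueyoshi's notation, so check his page 417 for the general forms.

```python
import math


def g_gompertz(j: int, c: float, gamma: float) -> float:
    """log of the integral of exp(c + gamma*u) over (j-1, j]; requires gamma != 0.
    Closed form: c + gamma*(j-1) + log((exp(gamma) - 1) / gamma)."""
    return c + gamma * (j - 1) + math.log((math.exp(gamma) - 1.0) / gamma)


def g_weibull(j: int, lam: float, alpha: float) -> float:
    """log of the integral of lam*alpha*u**(alpha-1) over (j-1, j].
    Closed form: log(lam) + log(j**alpha - (j-1)**alpha)."""
    return math.log(lam) + math.log(j ** alpha - (j - 1) ** alpha)


# Illustrative (hypothetical) parameter values
for j in range(1, 6):
    print(j, round(g_gompertz(j, -3.0, 0.2), 4), round(g_weibull(j, 0.05, 1.5), 4))
```

On these assumptions the Gompertz g(j) is exactly linear in j (successive differences equal gamma, and the constant log((e^gamma - 1)/gamma) is absorbed into the intercept), whereas the Weibull g(j) involves log(j^alpha - (j-1)^alpha), which a term in log(j) only approximates.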
So one strategy for responding to your referee is to develop your
discrete-time model with direct reference to an underlying
continuous-time one. [Which of them should be labelled the "event
history" model seems a semantic waste of time to me. Contrary to your
reviewer's view, sociologists typically use discrete-time models when
they do what they call event history analysis!]
The specifications of the baseline hazard that you imposed in your
models seem OK to me: piecewise constant, linear, loglinear. As you
say, they provide /analogues/ to the shapes of the continuous-time
models you mention. If you wanted to make the correspondence exact, you
would have to impose the more specific restrictions on the shape of
g(t) given by Sueyoshi ... which is relatively complicated to do.
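For concreteness, here is one way to set up the baseline-hazard terms in the organisation-year data. This is a Python sketch with a hypothetical spell layout (org id, years observed, failed or not) and illustrative band cut-points at 3 and 6 years; in Stata you would typically do the same thing with expand and a few generate statements.

```python
import math


def expand_to_org_years(spells):
    """spells: list of (org_id, n_years_observed, failed) tuples (hypothetical layout).
    Returns one row per organisation-year; y = 1 only in the final year of a failed spell."""
    rows = []
    for org_id, n_years, failed in spells:
        for t in range(1, n_years + 1):
            y = 1 if (failed and t == n_years) else 0
            rows.append({"org": org_id, "t": t, "y": y})
    return rows


def add_baseline_terms(row, cutpoints=(3, 6)):
    """Add regressors for three shapes of g(t):
    - piecewise constant: one dummy per duration band (cut-points are illustrative)
    - linear: t itself (already in the row)
    - loglinear: log(t)"""
    t = row["t"]
    bounds = (0,) + tuple(cutpoints) + (math.inf,)
    for k in range(len(bounds) - 1):
        # band k+1 covers durations in (bounds[k], bounds[k+1]]
        row[f"band{k + 1}"] = int(bounds[k] < t <= bounds[k + 1])
    row["log_t"] = math.log(t)
    return row


spells = [("A", 4, True), ("B", 7, False), ("C", 2, True)]
data = [add_baseline_terms(r) for r in expand_to_org_years(spells)]
for r in data:
    print(r)
```

The constant-hazard model then uses only an intercept, the linear model adds t, the loglinear model adds log_t, and the piecewise-constant model uses the band dummies in place of the intercept (in Stata, e.g. cloglog y band1 band2 band3 x1 x2, noconstant, where x1 and x2 stand in for your own covariates).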
good luck
Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk