Dear statalisters
I have a problem that probably stems from a basic lack of understanding of exactly how -stset- expects data to be constructed. I hope one of you is able to help.
I am looking at survival after coronary artery bypass surgery (CABG).
The input data have one record for each surgery event (CABG) and one for death. People are uniquely identified by "mast_enc". In this example, rows 15 and 16 are for a woman who had a CABG in April 1990, when she was aged 68. She died in August 2000, aged 78.
. list mast_enc sex age Date seq ageBand IncidentCABG event in 10/20
     +---------------------------------------------------------------+
     | mast_enc   sex   age        Date   ageBand   Incide~G   event |
     |---------------------------------------------------------------|
 15. | ........     F    68   03apr1990     65-74          1    CABG |
 16. | ........     F    78   19aug2000     75-84          0   Death |
     +---------------------------------------------------------------+
The dataset is complete between 1 January 1988 and 31 December 2000. IncidentCABG marks the first surgery event (people can have multiple surgery events). I therefore declare this as survival time data with
. stset Date, id(mast_enc) failure(event==2) enter(time d(1jan1988)) origin(IncidentCABG==1) exit(time d(1jan2001)) scale(365.25)
                id:  mast_enc
     failure event:  event == 2
obs. time interval:  (Date[_n-1], Date]
 enter on or after:  time d(1jan1988)
 exit on or before:  time d(1jan2001)
    t for analysis:  (time-origin)/365.25
            origin:  IncidentCABG==1
------------------------------------------------------------------------------
   355412  total obs.
   330154  ignored because never entered
        7  obs. end on or before enter()
    17971  obs. end on or before origin()
     3916  obs. begin on or after exit
------------------------------------------------------------------------------
     3364  obs. remaining, representing
     3325  subjects
     3081  failures in single failure-per-subject data
 12891.04  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  13.02396
Only records after the first CABG are marked with _st==1, i.e. as being included in the survival-time analysis. Record 16 shows a span of 10.37 years from the date of surgery to the date of death.
. list mast_enc age Date ageBand IncidentCABG event _st _d _origin _t _t0 in 10/20
     +-----------------------------------------------------------------------------------------------+
     | mast_enc   age        Date   ageBand   Incide~G   event   _st   _d   _origin         _t   _t0 |
     |-----------------------------------------------------------------------------------------------|
 15. | ........    68   03apr1990     65-74          1    CABG     0    .     11050          .     . |
 16. | ........    78   19aug2000     75-84          0   Death     1    1     11050   10.37974     0 |
     +-----------------------------------------------------------------------------------------------+
The problem is that any survival analysis then excludes people who had one or more CABG surgery events but survived until after the exit date.
I presume the problem has something to do with my using event data rather than true time-span data. However, although I have tried using -snapspan- to convert this to time-span data, I have clearly missed some important point. Any advice gratefully received!
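One possible explanation, offered as a guess rather than a confirmed diagnosis: with id(), -stset- treats each record as the end of the span (Date[_n-1], Date], so a survivor whose last record is the CABG itself has no record ending after origin() and is dropped as "end on or before origin()". A minimal sketch of one workaround follows, on toy data with made-up ids and dates. The censoring code 0 and the helper variables sdate, lastrec and censrec are assumptions for illustration, not part of the real dataset. The idea is to append one censoring record at the end of follow-up for each subject with no death record.

```stata
* Toy data in the same layout as the real file (ids and dates are made up).
* event: 1 = CABG, 2 = Death, as implied by failure(event==2)
clear
input mast_enc str9 sdate event IncidentCABG
1 "03apr1990" 1 1
1 "19aug2000" 2 0
2 "15jun1995" 1 1
end
generate Date = date(sdate, "DMY")
format Date %td
drop sdate

* Flag each subject's last record
bysort mast_enc (Date): generate byte lastrec = (_n == _N)

* Duplicate the last record of every subject who has no death record;
* censrec is 1 for the new copy and 0 for the original rows
expand 2 if lastrec & event != 2, generate(censrec)

* Turn the copy into a censoring record at the end of follow-up
* (event code 0 is an assumed, otherwise unused value)
replace Date = d(31dec2000) if censrec
replace event = 0 if censrec
replace IncidentCABG = 0 if censrec

* Same declaration as before
stset Date, id(mast_enc) failure(event==2) origin(IncidentCABG==1) ///
    enter(time d(1jan1988)) exit(time d(1jan2001)) scale(365.25)

sort mast_enc Date
list mast_enc Date event _st _d _t0 _t, sepby(mast_enc)
```

On the toy data, subject 2 (CABG only) should then contribute a censored span from 15jun1995 to 31dec2000 instead of being dropped. Whether -snapspan- would achieve the same thing more cleanly I do not know.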
==========================================
James Harris
National Centre for Epidemiology and Population Health
Building 62
The Australian National University
CANBERRA ACT 0200 Australia
CRICOS Provider #00120C