I am trying to construct a dataset to perform survival analysis on time from HIV seroconversion to
death, using a time-varying exposure. My data come from an open cohort study with yearly surveys
that ask questions about the previous 12 months.
My current data structure is such that each participant has a row of information for each yearly
interview she participated in. Some women participated in only one round, others participated in
several, but not necessarily consecutive, rounds. This division into multiple rows per participant
allows for the incorporation of time-varying exposure information. Records subsequent to the first
row will need to be treated as late entries.
My questions relate to how best to construct the beginning and end variables to denote the time span
for each record. Given the dearth of information I have been able to find on data management for
survival analysis with time-varying covariates, I would be so grateful for comments on whether I
have conceived of this data structure properly.
1. The first row a woman contributes would span from date of seroconversion to date of her 1st
interview (this may require the assumption that information on time-varying covariates was valid
more than one year prior to the interview). If she did multiple interviews, her second row would
begin with a date one year prior to the date of her 2nd interview (since interview questions relate
to the past 12 months). The end date for this second row would be the date of the interview; assuming
she was still alive at that point, the row would be counted as censored. The same pattern would apply
to any additional interviews.
2. If a woman died, the date of her death obviously occurred after the date of her last interview,
so I will need to assume that information provided at her last interview date carried forward until
the time of her death. So for a woman who died, her last row would begin one year prior to her final
interview, but would end at the date of her death (her final interview date would not appear in the
row). Does this seem correct?
3. This also means that if a woman who died contributed only one interview, the time span would be from
date of seroconversion to date of death, with the assumption that the information collected during
that one interview held throughout that entire time span.
4. I believe I would stset my data as follows:
a. stset end, id(study_id) time0(begin) origin(time seroconversion_date) failure(event==1)
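To make points 1 to 4 concrete, here is a rough sketch of how I imagine building begin, end and event before the stset. It is only an illustration under assumptions: interview_date and death_date are hypothetical variable names (death_date is assumed to be repeated on every row for a woman and missing if she is alive), all dates are assumed to be stored as Stata daily dates, and "one year" is approximated as 365 days. The step that trims overlapping records is my own addition, not something the study design dictates.

```stata
* Sketch only. interview_date and death_date are hypothetical names;
* study_id, seroconversion_date, begin, end and event are as in my stset line.
* Assumes one row per interview and Stata daily dates throughout.
sort study_id interview_date
by study_id: gen byte first = (_n == 1)
by study_id: gen byte last  = (_n == _N)

* Point 1: first row starts at seroconversion; later rows start
* 12 months (approximated as 365 days) before the interview.
gen long begin = cond(first, seroconversion_date, interview_date - 365)
gen long end   = interview_date

* Points 2 and 3: for a woman who died, the last row runs to the date of death.
replace end = death_date if last & !missing(death_date)
gen byte event = last & !missing(death_date)

* Assumption: if two interviews are less than 365 days apart, start the later
* row where the earlier one ends so that records do not overlap.
by study_id: replace begin = end[_n-1] if _n > 1 & begin < end[_n-1]

format begin end %td

* Point 4: declare the data, then inspect what stset made of it.
stset end, id(study_id) time0(begin) origin(time seroconversion_date) failure(event==1)
stdescribe
list study_id begin end event _t0 _t _d _st, sepby(study_id)
```

With id() specified, gaps between a woman's records (non-consecutive rounds) should show up in the stdescribe output as gaps rather than as errors, while overlapping records would be flagged by stset, which is why I trim them above.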
All thoughts and comments would be greatly appreciated!