Hi Chelsea,
Since your question was specifically about data management, my answer was also from a data management perspective - i.e. without regard to any analysis that you might later run on the data. Of course, discarding all the observations with a death would be a big problem for the analysis, but from a data management perspective that's irrelevant. I would, however, code the data first with the missing values left in, and save that as one dataset. If you want to assume that a covariate carries forward to the next time period, that's an imputation procedure, and you should mention it specifically in your write-up - unless it is obvious that it must carry forward (e.g. sex). Carry out the imputation, and save the result as another dataset. If you want to use a different imputation method later, you can always go back to your original dataset.
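A rough sketch of that two-dataset workflow, written in Python rather than Stata purely for illustration. The variable names, the made-up values, and the helper `carry_forward` are all hypothetical, and last-observation-carried-forward is only one possible imputation rule:

```python
import copy

# Hypothetical long-format data: one row per subject-interval.
# None marks a covariate that was not observed in that interval.
original = [
    {"id": 2, "t0": 0, "t1": 2, "outcome": 0, "x": 2},
    {"id": 2, "t0": 2, "t1": 5, "outcome": 1, "x": None},
]

def carry_forward(rows, covariate):
    """Return a copy of rows in which missing values of `covariate` are
    filled from the same subject's most recent non-missing row."""
    filled = copy.deepcopy(rows)               # never modify the original
    filled.sort(key=lambda r: (r["id"], r["t0"]))
    last_seen = {}                             # id -> last non-missing value
    for row in filled:
        if row[covariate] is None:
            # Stays None if the subject has no earlier observed value.
            row[covariate] = last_seen.get(row["id"])
        else:
            last_seen[row["id"]] = row[covariate]
    return filled

# The imputed data are a second, separate dataset.
imputed = carry_forward(original, "x")
```

Because `original` is left untouched, a different imputation rule can be applied to it later without re-coding the raw data.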
Hope that helps.
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Polis, Chelsea B.
Sent: 22 January 2009 09:27
To: [email protected]
Subject: RE: st: RE: Data management for survival analysis with time-varying covariates
Tim,
Many thanks for your response. May I ask you to elaborate on why you would keep the rows separate? It is
my understanding (based on page 39 of "An Introduction to Survival Analysis Using Stata") that there is
no difference between recording:
Id  t0  t1  outcome  x
2   0   5   1        2   (subject observed between 0 and 5)
---or recording:
Id  t0  t1  outcome  x
2   0   2   0        2   (subject observed between 0 and 2)
2   2   5   1        2   (subject observed between 2 and 5)
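One informal way to check that the two recordings carry the same information is to compare total time at risk, number of failures, and the covariate value in force at each time. The sketch below is Python, not Stata, and its helper names are hypothetical. It assumes each row covers the interval (t0, t1], and it is an illustration rather than a proof:

```python
# The two recordings of subject 2 from the example above.
one_row = [{"id": 2, "t0": 0, "t1": 5, "outcome": 1, "x": 2}]
two_rows = [
    {"id": 2, "t0": 0, "t1": 2, "outcome": 0, "x": 2},
    {"id": 2, "t0": 2, "t1": 5, "outcome": 1, "x": 2},
]

def summarize(rows):
    """Total time at risk and number of failures across all rows."""
    time_at_risk = sum(r["t1"] - r["t0"] for r in rows)
    failures = sum(r["outcome"] for r in rows)
    return time_at_risk, failures

def covariate_at(rows, t):
    """Value of x in the interval (t0, t1] containing t, or None if the
    subject is not under observation at t."""
    for r in rows:
        if r["t0"] < t <= r["t1"]:
            return r["x"]
    return None

# Both layouts give the same summary and the same x at every time 1..5.
same = summarize(one_row) == summarize(two_rows) and all(
    covariate_at(one_row, t) == covariate_at(two_rows, t) for t in range(1, 6)
)
```

The layouts stop being interchangeable only when x differs between the two rows, which is the case the split format exists to represent.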
If I use two rows at the bottom of the record of a woman who has died, as you suggest, all of the potentially
time-varying variables will remain the same, since no new information is collected after the date of the
final interview, except the death date. If the above equivalence holds, what is the advantage of keeping
the rows separate? Are you saying that the purpose would be to reduce the assumptions I am making about
covariates which are potentially time-varying? The problem is that I think this would render the entire
final row unusable, since hormonal contraceptive use (my exposure) would be one of the variables left
missing. Wouldn't that mean the rows containing the deaths are dropped, effectively discarding the women
who have died (which would really be a problem for a time-to-death analysis)?
Many thanks,
Chelsea
2. If a woman died, the date of her death obviously occurred after the date of her last interview,
so I will need to assume that information provided at her last interview date carried forward until
the time of her death. So for a woman who died, her last row would begin one year prior to her final
interview, but would end at the date of her death (her final interview date would not appear in the
row). Does this seem correct?
No, I would use a third row for the period from the last interview to the time of death. Fill in all the information that can reasonably be carried forward from the last interview and leave the rest missing.
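As an illustration of that layout, here is a minimal Python sketch (not Stata) that appends a final row running from the last interview to the death date. The function `add_death_row`, the variable names `sex` and `hc_use`, and all values are hypothetical. Deciding which covariates can "reasonably be carried forward" is a substantive judgment left to the analyst:

```python
def add_death_row(rows, death_time, carry=("sex",)):
    """Append a row from the end of the last observed interval to death.
    Only the covariates named in `carry` are copied forward; every other
    covariate is left missing (None)."""
    last = max(rows, key=lambda r: r["t1"])
    final = {key: None for key in last}        # start with everything missing
    final.update(id=last["id"], t0=last["t1"], t1=death_time, outcome=1)
    for name in carry:
        final[name] = last[name]
    return rows + [final]

# Hypothetical woman with two interview intervals (time in years since
# seroconversion) who died 0.6 years after her last interview.
interviews = [
    {"id": 7, "t0": 0, "t1": 1, "outcome": 0, "sex": "F", "hc_use": 1},
    {"id": 7, "t0": 1, "t1": 2, "outcome": 0, "sex": "F", "hc_use": 1},
]
with_death = add_death_row(interviews, death_time=2.6)
```

Here the third row spans 2 to 2.6 with `outcome` 1, keeps `sex`, and leaves `hc_use` missing. Adding `"hc_use"` to `carry` would turn it into the carried-forward (imputed) version.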
3. This also means that if a woman contributed only one interview, the time span would be from
date of seroconversion to date of death, with the assumption that the information collected during
that one interview was consistent during that entire time span.
If you follow my advice on 2, then there should be two rows.
Tim
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/