Hi all,
I am trying to stset my panel data to fit duration analysis but I am not
sure whether what I'm doing is right and how I should deal with delayed
entry. Apologies for the rather long message, but I want to be clear on what
I do. My data comes from a survey that is held with the same individuals at
5 different times and looks like this:
id   date of birth   date of interview   date of death   income   ........
 1   12 dec 1940     12 jan 1991         1 april 1998    200
 1   12 dec 1940     13 feb 1993         1 april 1998    225
 1   12 dec 1940     01 ma 1997          1 april 1998    230
 2   15 jan 1961     15 jan 1991         .               350
 3   27 feb 1955     15 jan 1991         .               100
 3   27 feb 1955     22 feb 1993         .               110
 3   27 feb 1955     30 jan 1997         .               130
 3   27 feb 1955     05 sep 2000         .               200
 3   27 feb 1955     10 dec 2004         .               180
If an individual died between waves, other household members are asked
for the exact date of death of this individual. In the above example, only
the 1st individual died, the 2nd dropped out after the first wave
(attrition), and the 3rd is observed in all waves (and is still alive in
the last wave).
I first generate a time variable that captures the age of each individual
and add an extra line for each person that died with a dummy to indicate the
death, so the data looks like this:
id   age     income   died   ........
 1   50.08   200      .
 1   52.17   225      .
 1   56.25   230      .
 1   57.60   .        1
 2   30.00   350      .
 3   35.91   100      .
 3   37.95   110      .
 3   41.92   130      .
 3   45.67   200      .
 3   49.92   180      .
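
For reference, a minimal sketch of how this step can be done. It assumes the three dates are stored as Stata daily dates; the variable names birthdate, intdate and deathdate, and the helpers last and extra, are illustrative and not the actual names in my data:

```stata
* Assumed names: birthdate, intdate, deathdate are Stata daily dates
* Age in years at each interview
generate age = (intdate - birthdate) / 365.25

* Flag the last interview of each person
bysort id (intdate): generate byte last = (_n == _N)

* Duplicate that row for people with a known date of death;
* extra is 0 for original rows and 1 for the added copies
expand 2 if last & !missing(deathdate), generate(extra)

* Turn the added row into a death record at the age of death
replace age    = (deathdate - birthdate) / 365.25 if extra
replace income = .                                 if extra
generate byte died = 1 if extra

drop last extra
sort id age
```

This leaves died missing on all interview rows and equal to 1 only on the added death row, as in the listing above.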
And thereafter I use the SNAPSPAN command - snapspan id age died, gen(time0)
replace - to transform data to:
id   time0   age     income   died   ........
 1   .       50.08   .        .
 1   50.08   52.17   200      .
 1   52.17   56.25   225      .
 1   56.25   57.60   230      1
 2   .       30.00   .        .
 3   .       35.91   .        .
 3   35.91   37.95   100      .
 3   37.95   41.92   110      .
 3   41.92   45.67   130      .
 3   45.67   49.92   200      .
But this implies that I lose the information on the last observed income for
those who did not die. So unless I am willing to treat income as a
'retrospective' variable, I lose this information, and people who are
observed only once drop out of the analysis. Is that right?
If this is correct, how should I stset my data? I tried - stset age,
id(id) failure(died) - which seems to work, but it does not take into
account that people are at risk of dying from birth, not only from
when they enter the survey. I tried using the age at first entry as the
enter() variable, but then all first observations are ignored.
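
To make the two attempts concrete, this is roughly what I ran on the snapspan-transformed data. The name entry_age is only illustrative, and the second call is my reading of how enter() should be specified, so it may well be where I go wrong:

```stata
* First attempt: age as analysis time, one subject per id
stset age, id(id) failure(died)

* Second attempt: delayed entry at the age of the first interview
* (entry_age is an assumed helper variable, constant within id)
bysort id (age): generate entry_age = age[1]
stset age, id(id) failure(died) enter(time entry_age)
```

With the second call, the first record of each person ends exactly at entry_age, which would explain why those first observations are dropped.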
Does anyone have an idea on what I am doing wrong and how I can solve this
problem?
Many thanks in advance!
Ellen