st: streg, enter and origin
Dear Statalisters,
In writing my thesis, my (lack of) knowledge of Stata's -stset-
command seems to have become a problem; the options `enter' and
`origin' in particular cause confusion. (I have consulted the ST manual
many times, but that did not help.)
My data is a stock sample of a population that is followed from
randomisation on 1 January 1992 until 1 January 2005. I have data for
date of birth (in the range from 1927 until 1969) and date of death
(for those who die).
For the survival times, I have generated a variable called `survival'
that counts the days of survival for an observation from day 0
(1.1.1992) until day 4,749 (1.1.2005). For the censoring/failure issue,
I have generated a dummy called `failure' that is equal to one for the
observations who die, and zero otherwise. Finally, the month, day, and
year of birth are stored in variables called `bm', `bd', and `by'.
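A minimal sketch of how the variables described above might be turned into Stata daily dates, assuming `bm', `bd', and `by' are numeric. The names `birthdate', `startdate', and `age_at_start' are hypothetical and do not appear in the data set as described:

```stata
* Hypothetical variable names: birthdate, startdate, age_at_start
generate birthdate = mdy(bm, bd, by)
generate startdate = mdy(1, 1, 1992)
format birthdate startdate %td

* Age in days at the start of follow-up (1.1.1992)
generate age_at_start = startdate - birthdate
summarize age_at_start
```

Formatting the variables with %td only changes how they are displayed; the stored values remain day counts.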
The analysis I ultimately want to do is a Cox or a parametric
regression with the likelihood function weighted by the survivor
function to deal with the length-biased sampling issue. For this
purpose I have -stset- my data like this:
. stset survival, failure(failure) origin(time mdy(bm,bd,by)) enter(time mdy(1,1,1992))

     failure event:  failure != 0 & failure < .
obs. time interval:  (origin, survival]
 enter on or after:  time mdy(1,1,1992)
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time mdy(bm,bd,by)

------------------------------------------------------------------------------
    92348  total obs.
        0  exclusions
------------------------------------------------------------------------------
    92348  obs. remaining, representing
     4717  failures in single record/single failure data
 4.29e+08  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =      8037
                                  last observed exit t =     28485
Above I have added a constant of 11,688 to the `survival' variable
(11,688 = mdy(1,1,1992)), because the observations born after 1960 were
excluded in an earlier version of my -stset-. Presumably this happened
because 1.1.1960 is day 0 to Stata, so observations born after 1960
ended up with negative survival times when I introduced `origin'
(because t = `survival' - origin).
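The date arithmetic behind this can be checked directly. The sketch below relies only on Stata storing daily dates as days since 1 January 1960; the example birth date (1.1.1969) and survival time (100 days) are made up for illustration:

```stata
* Daily dates count days since 1 January 1960
display mdy(1,1,1960)
* -> 0
display mdy(1,1,1992)
* -> 11688
display mdy(1,1,2005) - mdy(1,1,1992)
* -> 4749

* Illustration: born 1.1.1969, dies 100 days into follow-up.
* Without the constant, t = survival - origin is negative:
display 100 - mdy(1,1,1969)
* -> -3188
* With the constant, t is the age in days at death:
display (100 + mdy(1,1,1992)) - mdy(1,1,1969)
* -> 8500
```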
So, my question is whether the procedure above is correct and, if not,
whether there is a better way to do the -stset-.
Kind regards,
Henrik Lindegaard
Aarhus, Denmark