Sara Mottram writes,
> I am having some difficulty with -stset-. I'm almost certain that the
> fault lies with my data, as this same command has worked before in a
> similar dataset. However, I wonder if anyone could give me an idea as to
> where I might start looking to find the problem.
>
> [...]
>
> [...] I know from a tabulation of the data that there are 734
> consultations, but when I use -stset- it identifies 730 events. One
> person consults at time 0, so I think this person is being ignored - I
> understand this. However, this still leaves three events that are
> unidentified.
>
> [...]
And Sara included the following output:
-------------------------------------------------------------------
. stset cons_dt, id(surveyid) fail(kcons_post_3yr==1) origin(time
> edateass) exit(time censor_date)
id: surveyid
failure event: kcons_post_3yr == 1
obs. time interval: (cons_dt[_n-1], cons_dt]
exit on or before: time censor_date
t for analysis: (time-origin)
origin: time edateass
----------------------------------------------------------
16704 total obs.
28 obs. end on or before enter()
----------------------------------------------------------
16676 obs. remaining, representing
742 subjects
730 failures in multiple failure-per-subject data
703420 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1096
-------------------------------------------------------------------
First, Sara, notice how Stata writes the time interval:
obs. time interval: (cons_dt[_n-1], cons_dt]
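That is, ( means the interval is open on the left and ] means it is closed
on the right, so a record covers times strictly after cons_dt[_n-1] up to
and including cons_dt. Hence, a subject with the interval (0,0] makes no
sense: that interval is empty, and the subject failed before he or she
entered.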
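To make the notation concrete, here is a minimal sketch in Python (not Stata) of membership in a half-open interval (t0, t]. A time belongs to it only if t0 < time <= t, so (0,0] contains nothing, not even the failure time 0. The function name in_interval is hypothetical.

```python
def in_interval(time, t0, t):
    """True if time lies in the half-open interval (t0, t]."""
    return t0 < time <= t

# An ordinary record: a failure at t = 5 lies in (0, 5].
print(in_interval(5, 0, 5))
# The degenerate record (0, 0]: the failure at t = 0 falls outside it.
print(in_interval(0, 0, 0))
# Same story for (12, 12].
print(in_interval(12, 12, 12))
```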
Do you have other examples like this? Do you, perhaps, have someone else
with interval (12,12] or (20,20]? That would be the same story.
Note that -stset- reported
28 obs. end on or before enter()
so Sara must have obs like (12,12] or (20,20], or she has more obvious
errors such as (20,12].
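The idea behind that message can be sketched as follows, assuming (as the interval display above says) that each record for a subject covers (previous end, this end], with the first record starting at the origin. This is a hypothetical Python illustration of the logic, not Stata's implementation; flag_bad_records and the sample data are made up.

```python
def flag_bad_records(ends, origin=0):
    """Given one subject's record end times in the order they are
    processed, return the (start, end) intervals that end on or
    before they start, i.e. empty intervals."""
    bad = []
    start = origin
    for end in ends:
        if end <= start:          # (start, end] contains no time at all
            bad.append((start, end))
        start = end               # the next record starts where this one ended
    return bad

print(flag_bad_records([0, 30, 45]))   # failure at time 0: (0, 0]
print(flag_bad_records([12, 12, 40]))  # tied dates: (12, 12]
print(flag_bad_records([20, 12]))      # out of order: (20, 12]
print(flag_bad_records([5, 10, 15]))   # no problems
```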
Assuming the problems are all of the form (12,12] and (20,20], I would do
the following:
. replace censor_date = censor_date + .125
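and try again. I'm assuming that Sara's dates are all integers, so moving
all the censoring dates forward by a fraction of a day won't matter: it
cannot change how a censoring date orders relative to any other date, it
only breaks ties between a censoring date and another date.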
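Why a small shift helps, sketched in Python under the assumption that all dates are whole numbers: the empty interval (12,12] becomes (12,12.125], which is not empty, while the shift is too small to change how a censoring date compares with any other integer date. SHIFT, interval_nonempty and the sample values are hypothetical.

```python
SHIFT = 0.125  # 2**-3, exactly representable in binary floating point

def interval_nonempty(start, end):
    """True if the half-open interval (start, end] contains any time."""
    return start < end

start, censor_date = 12, 12
print(interval_nonempty(start, censor_date))          # (12, 12] is empty
print(interval_nonempty(start, censor_date + SHIFT))  # (12, 12.125] is not

# For whole-number dates a != b, adding SHIFT to a never flips the
# comparison; the shift only matters when a == b (a tie).
dates = range(0, 30)
assert all((a < b) == (a + SHIFT < b) for a in dates for b in dates if a != b)
```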
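There's nothing magic about .125; Sara could use .0625 or .03125 or even,
say, .00390625 (2^-8). Or .1, .01, .001, etc. The only reason I don't use
nice numbers like .1 and .01 is that binary computers cannot store negative
powers of 10 exactly. censor_date is presumably stored as a float (Stata's
default storage type), while a number typed in an expression is held in
double precision, so later I cannot type things like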
. list if censor_date==12.1
I have to type things like
. list if censor_date==float(12.1)
and I invariably forget, so I use negative powers of 2 to shift dates.
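The float() point can be reproduced outside Stata. The sketch below uses Python's standard struct module to round a double to 4-byte single precision, which mimics storing a value in a float variable (assuming censor_date is stored as a float). 12.125 survives the round trip exactly; 12.1 does not, so comparing the stored value with the double 12.1 fails unless both sides are rounded the same way. The helper to_float32 is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

censor_date = to_float32(12.1)          # value held in a float variable
print(censor_date == 12.1)              # like: list if censor_date==12.1
print(censor_date == to_float32(12.1))  # like: list if censor_date==float(12.1)

# A negative power of 2 is stored exactly, so no float() is needed.
print(to_float32(12.125) == 12.125)
```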
Anyway, perhaps moving the end dates forward just a little will solve the
problem.
Or maybe not. Sara has lots of dates in her files. Quoting from the output
again:
obs. time interval: (cons_dt[_n-1], cons_dt]
exit on or before: time censor_date
t for analysis: (time-origin)
origin: time edateass
So we need to look at cons_dt as well. And we need to look at censor_date
and edateass carefully, because Sara has multiple records per subject, and
those dates should agree across each subject's records.
I would do the following:
. sort surveyid cons_dt
// make sure consultation dates strictly increase (no ties) within subject
. by surveyid: assert cons_dt > cons_dt[_n-1] if _n>1
// make sure censor_date is constant
. by surveyid: assert censor_date == censor_date[1]
// make sure edateass is constant
. by surveyid: assert edateass == edateass[1]
// make sure censor_date falls after the first consultation date
. by surveyid: assert censor_date > cons_dt[1]
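For readers who want to see the four checks as plain logic, here is a hypothetical Python sketch over (surveyid, cons_dt, censor_date, edateass) tuples. Unlike Stata's assert, which stops at the first failure, find_problems collects every failing subject; the function name, record layout and sample data are assumptions for illustration only.

```python
from itertools import groupby

def find_problems(records):
    """records: iterable of (surveyid, cons_dt, censor_date, edateass)
    tuples. Returns (surveyid, problem) pairs; an empty list means every
    subject passes all four checks."""
    problems = []
    ordered = sorted(records, key=lambda r: (r[0], r[1]))  # sort surveyid cons_dt
    for sid, grp in groupby(ordered, key=lambda r: r[0]):
        grp = list(grp)
        dates = [r[1] for r in grp]
        first = grp[0]
        # cons_dt > cons_dt[_n-1] for every record after the first
        if any(b <= a for a, b in zip(dates, dates[1:])):
            problems.append((sid, "cons_dt not strictly increasing"))
        # censor_date == censor_date[1]
        if any(r[2] != first[2] for r in grp):
            problems.append((sid, "censor_date not constant"))
        # edateass == edateass[1]
        if any(r[3] != first[3] for r in grp):
            problems.append((sid, "edateass not constant"))
        # censor_date > cons_dt[1]
        if any(r[2] <= dates[0] for r in grp):
            problems.append((sid, "censor_date not after first cons_dt"))
    return problems

data = [
    (1, 10, 100, 0), (1, 20, 100, 0),  # fine
    (2, 12, 50, 0), (2, 12, 50, 0),    # tied consultation dates
    (3, 30, 30, 0),                    # censored on the first consultation date
]
print(find_problems(data))
```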
-- Bill