Arne Kolstad wrote:
> I have survival time data about sickness spells, in the
> following form:
>
> personid startdate stopdate
> 1 01mai1997 07dec1997
> 1 28jan2002 09feb2002
> 2 31jul1994 06mar1998
> .
> .
> N 31dec2002 (censored)
>
> What I need is a table a) with prevalences for each day :
>
> month spersons
> 01jan1994 897
> 02jan1994 789
> .
> .
> 31dec2002 987
>
> ---
>
> and a table b) of person-days of sickness for each month
> through the period
> of interest:
>
>
> month pdays
> jan1994 22345
> feb1994 24567
> .
> .
> dec2002 26789
>
> ---
>
>
> I believe I will have my a) data set thusly:
>
> forvalues x=12419/15705 {
> quietly stdes if startdate<=`x' & stopdate>`x'
> di r(N_sub)
> }
>
> So to the real problem: The data set has more than 5
> million records.
> Looping through thousands of days is slow, partly because
> stdes does a lot
> of work, and I need to repeat it a lot of times as
> different versions of the
> data are produced. Is there a more efficient method?
>
> From table a) to table b) should be straightforward, but
> is there a really
> efficient code hidden somewhere among the st commands or elsewhere?
Don't loop!
What a neat problem! I don't know about -st-, but here
is one first principles attack:
// get your data in long form and -sort-ed on date
// (personid repeats across spells, so -reshape- needs a spell identifier):
gen long spell = _n
rename startdate date1
rename stopdate date2
reshape long date , i(spell)
sort date
// the number of persons who are sick increases by 1
// every time someone goes on sick leave and decreases
// by 1 every time some one stops
gen spersons = sum((_j == 1) - (_j == 2))
// reduce to one observation daily
bysort date : keep if _n == _N
// fill in gaps
gen lag = date[_n+1] - date
expand lag
bysort date : replace date = date[_n-1] + 1 if _n > 1
// listing
l date spersons
// monthly summary
gen month = mofd(date)
egen Spersons = total(spersons), by(month)
tabdisp month, c(Spersons)
(plus some adjustment dependent on how censoring is
done?)
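
To see the counting idea at work, here is a minimal sketch on three made-up spells with no censoring. The data, the spell variable and the pdays name are illustrative assumptions, not part of the original question. A person counts as sick on the start date but not on the stop date, which matches the startdate<=`x' & stopdate>`x' condition in the loop.

```stata
* hypothetical toy data: three complete spells in May 1997
clear
input personid str9 s1 str9 s2
1 "01may1997" "04may1997"
1 "06may1997" "07may1997"
2 "02may1997" "06may1997"
end
gen long date1 = date(s1, "DMY")
gen long date2 = date(s2, "DMY")
drop s1 s2

* one row per event; _j is 1 for a start and 2 for a stop
gen long spell = _n
reshape long date, i(spell)
sort date

* running count of persons sick, then keep the end-of-day value
gen long spersons = sum((_j == 1) - (_j == 2))
by date: keep if _n == _N

* fill in days on which nothing happened
gen gap = date[_n+1] - date
expand gap
bysort date: replace date = date[_n-1] + 1 if _n > 1
sort date
format date %td
list date spersons, noobs

* person-days per month
gen month = mofd(date)
format month %tm
egen long pdays = total(spersons), by(month)
tabdisp month, c(pdays)
```

The daily counts from 01may1997 to 07may1997 should come out as 1, 2, 2, 1, 1, 1, 0, and the monthly total as 8 person-days, which equals the sum of the three spell lengths (3 + 1 + 4). With censored spells, a missing stop date would sort to the end and would need handling before the running sum.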
Nick