I have survival time data about sickness spells, in the following form:
personid startdate stopdate
1 01may1997 07dec1997
1 28jan2002 09feb2002
2 31jul1994 06mar1998
.
.
N 31dec2002 (censored)
---
What I need is a table a) with the prevalence (number of persons sick) for each day:
day spersons
01jan1994 897
02jan1994 789
.
.
31dec2002 987
---
and a table b) of person-days of sickness for each month through the period
of interest:
month pdays
jan1994 22345
feb1994 24567
.
.
dec2002 26789
---
I believe I can produce the data for table a) as follows:
forvalues x=12419/15705 {
quietly stdes if startdate<=`x' & stopdate>`x'
di r(N_sub)
}
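For comparison, here is a minimal sketch of the same per-day count using `count` instead of `stdes`. It assumes that spells belonging to the same person never overlap, so that the number of matching records equals the number of persons sick on that day. It still loops over every day, so it only removes the overhead of `stdes`, not the loop itself.

```stata
* Sketch: per-day prevalence with count instead of stdes.
* Assumption: no overlapping spells within a person, so records = persons.
* A missing (censored) stopdate compares as larger than any date in Stata,
* so open spells are counted as still ongoing.
forvalues x = 12419/15705 {
    quietly count if startdate <= `x' & stopdate > `x'
    display %td `x' " " r(N)
}
```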
Now to the real problem: the data set has more than 5 million records.
Looping through thousands of days is slow, partly because stdes does a lot
of work on each pass, and I need to repeat the whole loop many times as
different versions of the data are produced. Is there a more efficient method?
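
One possible loop-free direction, given here only as an untested sketch under stated assumptions: turn each spell into a +1 event on its start date and a -1 event on its stop date, sum the events per day, and take a running sum. The running sum on day x then equals the number of spells with startdate <= x and stopdate > x, which is the condition used in the loop. The sketch assumes that startdate and stopdate are numeric Stata daily dates, that a missing stopdate marks a censored spell, and that spells of one person do not overlap. The variable names spell, isstart, day and delta are made up for illustration.

```stata
* Sketch only; see the assumptions stated above.
preserve
keep startdate stopdate
generate long spell = _n
expand 2                                  // two rows per spell: start and stop
bysort spell: generate byte isstart = (_n == 1)
generate long day   = cond(isstart, startdate, stopdate)
generate long delta = cond(isstart, 1, -1)
drop if missing(day)                      // censored spells never close
collapse (sum) delta, by(day)             // net change in prevalence per day
tsset day
tsfill                                    // add days on which nothing changed
replace delta = 0 if missing(delta)
generate long spersons = sum(delta)       // running sum = table a)
keep if inrange(day, 12419, 15705)        // 01jan1994 to 31dec2002
format day %td
list day spersons in 1/5

* Table b): person-days per month as the sum of the daily prevalences.
generate month = mofd(day)
collapse (sum) pdays = spersons, by(month)
format month %tm
list in 1/5
restore
```

This makes a few passes over the data instead of one pass per day. Table a) would have to be saved before the second collapse if both tables are needed.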