Dear Statalisters,
I am analyzing a cohort study on the risk of breast cancer after
surviving childhood cancer. The cohort includes 8000 female survivors
of childhood cancer, of whom 75 have subsequently developed breast
cancer. I have calculated standardized incidence ratios (SIRs) of
breast cancer by type of childhood cancer, treatment decade, follow-up
time, attained age, etc.
To do this I stset the data, used stsplit to split follow-up by age
band and calendar period, and merged in reference rates from the
general population. I also split on attained age and risk interval
(= time since childhood cancer diagnosis), because these variables
vary with time. I then calculated the expected number of breast
cancers by multiplying the person-years in each resulting record by
the matching reference rate.
**stset and split data**
. stset dox, fail(fail) origin(dob) entry(doe) scale(365.25) id(id)
. stsplit ageband, at(0 1 5(5)85)
. stsplit calender_period, after(time=d(1/1/1900)) at(0 71(1)106)
. replace calender_period = calender_period + 1900
**merge to external ref rates**
. sort ageband calender_period
. merge ageband calender_period using rates/tmp_rates.dta
. drop if _merge!=3
**split on age and time since childhood cancer diagnosis**
. stsplit age, at(0,20,30,40,50) //attained age
. stsplit riskint, after(time=doe) at(5,10,15,20,25,30,35)
**calculate Expected and Personyrs**
. gen pyrs = _t - _t0
. gen E=(pyrs*rate)
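As a quick check on the expected numbers, the overall SIR is simply the
total observed count divided by the total expected count. A minimal
sketch, assuming the split data with _d and E are still in memory and
that rate is expressed per person-year; obs_tot and exp_tot are
made-up scalar names, and the limits are the usual exact Poisson
limits based on the chi-squared distribution:
**overall SIR = observed / expected, with exact 95% limits**
. quietly summarize _d, meanonly
. scalar obs_tot = r(sum)
. quietly summarize E, meanonly
. scalar exp_tot = r(sum)
. display "overall SIR = " scalar(obs_tot)/scalar(exp_tot)
. display "lower limit = " invchi2(2*scalar(obs_tot), 0.025)/(2*scalar(exp_tot))
. display "upper limit = " invchi2(2*(scalar(obs_tot)+1), 0.975)/(2*scalar(exp_tot))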
To calculate the SIRs by childhood cancer diagnosis I collapsed the
data by diagnosis. The diagnosis variable (diag in the code below) has
10 categories (leukaemia, Hodgkin, Non-Hodgkin etc.).
I then used Poisson regression with the expected number as exposure.
The resulting Incidence Rate Ratio is essentially a ratio of SIRs: the
SIR of the group of interest divided by the SIR of the reference group
(leukaemia).
. collapse (sum) _d E pyrs, by(diag)
. xi:poisson _d i.diag if E!=0, exposure(E) irr
Now from the Incidence Rate Ratio I would like to calculate the SIR
for each group, so I used:
. predict coef, xb nooffset
. gen SIR=exp(coef)
. bysort diag: sum SIR
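Because the model with only i.diag has one parameter per diagnostic
group, the fitted SIRs should equal the observed-over-expected ratios,
provided every group has at least one case. A minimal check, assuming
the data collapsed by diag are still in memory (SIR_direct is a
made-up variable name):
**direct SIR per group, to compare with the model-based SIR**
. gen SIR_direct = _d/E if E!=0
. list diag _d E SIR SIR_direct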
I did this for all the variables I am interested in. I know there are
easier ways to calculate the SIR, but I used this approach because I
would also like to calculate SIRs in a multivariate Poisson model, and
I thought this would be the best way to get there.
So next I used a multivariate approach. I collapsed the data on all
variables that I am interested in.
. collapse (sum) _d E pyrs, by(diag trtagegp rt ct trt_dec riskint age)
. xi:poisson _d i.diag i.trtagegp i.rt i.ct i.trt_dec i.riskint i.age ///
      if E!=0, exposure(E) irr
_d       = observed number of breast cancers
diag     = diagnostic group (type of childhood cancer)
trtagegp = age at start of childhood cancer treatment (0-4, 5-9, 10-14)
rt       = treatment with radiotherapy (yes/no)
ct       = treatment with chemotherapy (yes/no)
trt_dec  = decade of initial treatment (1970-1979, 1980-1989, 1990 onwards)
riskint  = time since childhood cancer diagnosis
age      = attained age
Now my question is:
The output of the model gives me Incidence Rate Ratios and uses the
first category of each variable as the reference category. How do I
get (adjusted) SIRs? I guess I could use the same approach as above
and use:
. predict coef, xb nooffset
But what do I do next?
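To illustrate where I am unsure, here is a minimal sketch, not a
solution. It assumes the multivariate model above has just been fitted
and that xi has named the dummy for the second diagnostic group
_Idiag_2 (the actual name depends on how diag is coded); xb_adj and
SIR_fit are made-up variable names. As far as I understand,
exponentiating the prediction gives one fitted SIR per covariate
pattern (one per row of the collapsed data), and lincom gives the SIR
of a group with all other variables at their reference categories, so
neither is a single adjusted SIR per diagnostic group:
**fitted SIR for each covariate pattern**
. predict xb_adj, xb nooffset
. gen SIR_fit = exp(xb_adj)
**SIR with every variable at its reference category**
. lincom _cons, irr
**SIR for diagnostic group 2, other variables at reference**
. lincom _cons + _Idiag_2, irr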
Thanks,
Raoul