Hi folks,
I'm using stata v8 at the moment, and am having a problem when I
manually replicate the output of the 'dstdize' command.
When I run the code (at the bottom of the email), my results are correct
for all ages, but I run into problems in the calculation of the adj_rate
variable when I repeat the analysis for the population aged 0-64.
Example 1: All Ages
Summary of Study Populations: (output for dstdize, for all ages)
state N Crude Adj_Rate Confidence Interval
------------------------------------------------------------------------
--
1 1010740 0.028408 0.018579 [ 0.018378,
0.018780]
2 1014144 0.030460 0.021090 [ 0.020867,
0.021314]
3 1012676 0.035128 0.023172 [ 0.022940,
0.023405]
4 1012054 0.039290 0.026232 [ 0.025980,
0.026484]
5 1012397 0.038515 0.031786 [ 0.031487,
0.032084]
Summary of output using manual calculations
+---------------------------------------------------------------+
| state N crude adj_rate lb ub |
|---------------------------------------------------------------|
1. | 1 1010740 .0284079 .0185789 .0183779 .01878 |
2. | 2 1014144 .0304602 .0210904 .0208671 .0213137 |
3. | 3 1012676 .0351277 .0231724 .0229398 .023405 |
4. | 4 1012054 .0392904 .0262319 .02598 .0264838 |
5. | 5 1012397 .0385145 .0317857 .0314869 .0320844 |
+---------------------------------------------------------------+
Example 2: Premature mortality 0-64 years inclusive
DSTDIZE OUTPUT
Summary of Study Populations:
state N Crude Adj_Rate Confidence Interval
------------------------------------------------------------------------
--
1 852842 0.004972 0.004496 [ 0.004360,
0.004633]
2 859488 0.006272 0.006011 [ 0.005851,
0.006171]
3 843940 0.007608 0.007257 [ 0.007080,
0.007433]
4 835061 0.009727 0.009414 [ 0.009211,
0.009618]
5 865780 0.012857 0.013308 [ 0.013064,
0.013552]
MANUAL CALCULATION OUTPUT
+--------------------------------------------------------------+
| state N crude adj_rate lb ub |
|--------------------------------------------------------------|
1. | 1 852842 .0049716 .0040019 .0038807 .0041231 |
2. | 2 859488 .0062723 .0053499 .0052074 .0054924 |
3. | 3 843940 .0076084 .0064583 .0063011 .0066156 |
4. | 4 835061 .0097274 .0083787 .0081976 .0085597 |
5. | 5 865780 .0128566 .0118441 .0116268 .0120614 |
+--------------------------------------------------------------+
Could anybody suggest why this might be the case? The only difference in
the calculation for the premature mortality version is that I have an if
statement which drops records if age_group >=14 ( representing 65years
+)
I have included the code below
Cheers
Dan
Statamodel.do
capture log close
capture log using "test.log", text replace
foreach a in all prem {
** prepare the death data
use mortality, clear
if "`a'" == "prem" {
drop if(age_group>=14)
}
save tmp_mort, replace
** prepare the standard population
use std_pop, clear
sort age_group
if "`a'" == "prem" {
drop if(age_group>=14)
}
list
save tmp_pop, replace
** run the stata command for direct standardisation
use tmp_mort, clear
dstdize deaths pop age_group, by(state) using (tmp_pop)
* now do the standardisation programmatically
use tmp_mort, clear
qui {
egen age = group(age_group)
egen N = sum(population), by(state)
egen cases = sum(deaths), by(state)
gen popdist = population/N
gen loc_crude = deaths/population
rename population loc_pop
sort age_group
merge age_group using tmp_pop
gen sum_n = sum(population)
* generate crude rate
by state, sort: gen crude = (cases/N)
* and the adjusted rate
gen product = loc_crude* population
by state, sort: egen adj_rate = sum(product)
generate prod2 = (population^2*loc_crude*(1-loc_crude)/loc_pop)
by state: egen se = sum(prod2)
replace se = sqrt(se)
* and the Upper and lower CI's
gen ub = adj_rate + invnorm(0.975)*se
gen lb = adj_rate - invnorm(0.975)*se
drop _merge
keep if age ==1
}
list state N crude1 adj_rate lb ub
}
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/