Title  | Stata 6: Continuity adjustments
Author | William Gould, StataCorp
The short answer is that we do not apply the continuity adjustment, but Epi Info does. The rest of the FAQ details why we believe our answer is to be slightly preferred except when N is very small, in which case neither result is to be trusted.
A user sent the following 2x2 table to us:
         |  Exposed   Unexposed
---------+------------------------
   Cases |       11           3
Controls |      106         223
The user reported that Stata and Epi Info differed in their reported 95% confidence intervals even though both packages claimed to be using the Cornfield approximation. The reported confidence intervals are
Epi Info   [1.94, 35.63]
Stata      [2.26, 26.20]
The Stata result can be obtained by typing cci 11 3 106 223.
We have independently verified that the Stata results and the Epi Info results are each the results the respective package intended; see Appendix below.
The difference in reported results is not due to programming errors. Rather, the difference hinges on whether one makes a continuity correction to the Cornfield iterative formula.
The Cornfield formula presented in Schlesselman (1982, 177) includes the continuity correction. Our two justifications for not including the continuity correction are
If you really care about the confidence interval when dealing with small N, you should be using exact methods such as those available in the StatXact software package.
Logistic regression provides another way one can obtain estimates of the odds ratio and the standard error. The estimated odds ratio will be the same as reported by Stata’s cci command (and by Epi Info). The standard error and derived confidence interval will be different from those reported by cci because different formulas are used.
In any case, we obtained the following results:
Epi Info              [1.94, 35.63]
Stata                 [2.26, 26.20]
logistic regression   [2.11, 28.23]
Below we obtain the logistic regression results:
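. list

          dead     expos       pop
  1.         1         1        11
  2.         1         0       106
  3.         0         1         3
  4.         0         0       223

. logistic dead exp [fw=pop]

Logit Estimates                                   Number of obs =        343
                                                  LR chi2(1)    =      12.15
                                                  Prob > chi2   =     0.0005
Log Likelihood = -214.05327                       Pseudo R2     =     0.0276

------------------------------------------------------------------------------
    dead | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   expos |   7.713836   5.105902      3.087   0.002      2.107888   28.22885
------------------------------------------------------------------------------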
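As a cross-check outside Stata, the logistic-regression interval for a single 2x2 table can be approximated by hand. The Python sketch below assumes the standard result that, for a model with one binary covariate, the Wald interval on the log-odds scale equals the Woolf interval, with standard error sqrt(1/a + 1/b + 1/c + 1/d). The function name `woolf_interval` is ours, not part of any package.

```python
import math

def woolf_interval(a, b, c, d, z=1.959964):
    """Odds ratio and Wald (Woolf) limits computed on the log-odds scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(odds ratio)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Cell counts from the user's table: a, b = cases; c, d = controls
or_hat, lo, hi = woolf_interval(11, 3, 106, 223)
print(f"OR = {or_hat:.4f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

To two decimals this should agree with the logistic regression line in the comparison above, [2.11, 28.23]. Digits beyond that can differ slightly from the Stata log, which comes from iterative maximum likelihood rather than the closed form.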
As a quick way of determining the reliability of the Cornfield approximation without the continuity correction, we ran a simulation, under the null hypothesis (odds ratio==1), for a table with the same marginals as in the example above. In 1,000 replications, the results were
. summarize

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  accept |    1000         .96   .1960572          0          1
That is to say, the confidence interval that Stata calculates without the continuity correction contained the null value (odds ratio of 1) in 960 of the 1,000 replications, an observed coverage of 96% against a nominal 95%. Widening the confidence interval, as the continuity correction would, therefore does not seem called for.
The following Stata do-file will reproduce the simulation results reported above and allow you to run your own:
------------------------------------------ BEGIN --- mysim.do --- CUT HERE ---
version 6.0
program drop _all

program define mkdta
        set obs 343
        gen exposed = _n<=14
end

program define asim
        gen u = uniform()
        sort u
        gen case = _n<=117
        cc case exposed
        post mm ($S_10<=1 & $S_11>=1)
        drop u case
end

program define sim
        drop _all
        mkdta
        postfile mm accept using myres, replace
        local i 1
        qui while `i' <= `1' {
                asim
                local i = `i' + 1
        }
        postclose mm
        use myres, clear
end

set seed 39483
sim 1000
sum
-------------------------------------------- END --- mysim.do --- CUT HERE ---
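For readers without Stata at hand, the same experiment can be sketched in Python. This is a rough analogue, not a port of the do-file. Instead of calling cc, it uses a shortcut that is an assumption of this sketch: the uncorrected Cornfield interval contains an odds ratio of 1 exactly when the observed cell lies within z standard deviations of its null expectation, with the standard deviation taken from the same reciprocal-sum term as the Cornfield iteration. Python's random-number generator also differs from Stata's, so the same seed will not reproduce the same draws.

```python
import math
import random

def covers_null(a, n_exposed, n_cases, total, z=1.959964):
    """True if OR = 1 falls inside the uncorrected Cornfield interval.

    Shortcut (an assumption of this sketch): OR = 1 is inside the interval
    exactly when |a - E| <= z * SD, with E and SD evaluated under the null.
    """
    expected = n_exposed * n_cases / total
    inv_var = (1 / expected
               + 1 / (n_exposed - expected)
               + 1 / (n_cases - expected)
               + 1 / (total - n_exposed - n_cases + expected))
    return abs(a - expected) <= z / math.sqrt(inv_var)

def simulate(reps=1000, total=343, n_exposed=14, n_cases=117, seed=39483):
    """Share of replications in which the interval contains OR = 1."""
    rng = random.Random(seed)
    exposed = [1] * n_exposed + [0] * (total - n_exposed)
    accepted = 0
    for _ in range(reps):
        rng.shuffle(exposed)          # random ordering, like "sort u"
        a = sum(exposed[:n_cases])    # exposed subjects among the 117 "cases"
        accepted += covers_null(a, n_exposed, n_cases, total)
    return accepted / reps

print(simulate())
```

The printed acceptance rate should fall near the .96 reported above, though not necessarily on it, since the random draws differ.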
The purpose of this appendix is to establish that Stata is using the Cornfield approximation without the continuity correction and that Epi Info is using the same formula with the continuity correction.
Let us use the following notation:
         |  Exposed   Unexposed  |
---------+-----------------------+---
   Cases |     a          b      |  M1
Controls |     c          d      |  M0
---------+-----------------------+---
         |    N1         N2      |  T
The Cornfield confidence interval is
ol = al(M0 - N1 + al)/((N1-al)(M1-al))
ou = au(M0 - N1 + au)/((N1-au)(M1-au))
where al and au are obtained from
a[i+1] = a +/- z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )
At least, that is the formula Stata uses. Epi Info uses
a[i+1] = a +/- .5 +/- z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )
That is, Epi Info includes the continuity correction whereas Stata does not.
The following program will reproduce the Stata results:
program define upper /* a0 */
        local a = 11
        local b = 106
        local c = 3
        local d = 223
        local M1 = `a' + `b'
        local M0 = `c' + `d'
        local N1 = `a' + `c'
        local N0 = `b' + `d'
        local T = `M1' + `M0'
        local z = 1.96
        local ai = `1'
        while (1) {
                di `ai' " " `ou'
                local ai = `a' + `z'*1/sqrt( /*
                        */ 1/`ai' + /*
                        */ 1/(`N1'-`ai') + /*
                        */ 1/(`M1'-`ai') + /*
                        */ 1/(`M0'-`N1'+`ai') /*
                        */ )
                local ou = `ai'*(`M0'-`N1'+`ai') / /*
                        */ ((`N1'-`ai')*(`M1'-`ai'))
        }
end
The result of running this program is
. upper 3
3
13.962681 820.50662
11.37803 9.1775262
13.819558 167.61792
11.826157 11.577601
13.62256 78.771227
12.184741 14.356851
13.436792 51.933344
12.435577 17.061584
13.288298 40.55852
  [output omitted]
12.932766 26.192115
12.932766 26.192115
12.932766 26.192115
--Break--
r(1);
The slight difference from the result reported by Stata is due to our use of the (imprecise) 1.96.
We then modified the program to add ½ to `ai'. This resulted in nonconvergence. However, when we first iterated the formula without the continuity correction to convergence and then switched to the continuity-corrected formula, the iteration converged to 35.635, which agrees with the upper limit of 35.63 reported by Epi Info.
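The nonconvergence is a property of the fixed-point iteration, not of the equation being solved, so a bracketing root-finder offers another way to check both sets of limits. The Python sketch below is not the code either package uses. It solves the two equations from this appendix by bisection, with `correction=0.0` for the formula attributed to Stata and `correction=0.5` for the one attributed to Epi Info. The function name, the bisection routine, and the small offset `eps` are our own choices, and the sketch assumes the observed cell a lies strictly inside its admissible range.

```python
import math

def cornfield_limits(a, b, c, d, z=1.96, correction=0.0):
    """Cornfield limits for the odds ratio of the 2x2 table [[a, b], [c, d]]."""
    m1, m0, n1 = a + b, c + d, a + c   # row totals and first column total

    def sd(x):
        # The 1/sqrt(...) term of the iteration, evaluated at fitted cell x
        return 1 / math.sqrt(1 / x + 1 / (n1 - x) + 1 / (m1 - x)
                             + 1 / (m0 - n1 + x))

    def odds_ratio(x):
        return x * (m0 - n1 + x) / ((n1 - x) * (m1 - x))

    def bisect(f, lo, hi):
        # Requires f(lo) and f(hi) to have opposite signs
        f_lo = f(lo)
        for _ in range(200):
            mid = (lo + hi) / 2
            f_mid = f(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return (lo + hi) / 2

    eps = 1e-9
    x_min = max(0, n1 - m0) + eps   # smallest admissible fitted cell
    x_max = min(n1, m1) - eps       # largest admissible fitted cell
    # Lower limit solves x = a - correction - z*sd(x);
    # upper limit solves x = a + correction + z*sd(x).
    x_lo = bisect(lambda x: a - correction - z * sd(x) - x, x_min, a - correction)
    x_hi = bisect(lambda x: a + correction + z * sd(x) - x, a + correction, x_max)
    return odds_ratio(x_lo), odds_ratio(x_hi)

for label, cc in (("without correction", 0.0), ("with correction", 0.5)):
    lo, hi = cornfield_limits(11, 3, 106, 223, correction=cc)
    print(f"{label}: [{lo:.2f}, {hi:.2f}]")
```

With z = 1.96, the two printed intervals should land close to the Stata and Epi Info intervals quoted earlier. Small differences in the last digit are to be expected, for the same reason given for the iterative program above.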