In the ongoing discussion of multiple logistic regression versus conditional
logistic regression approaches to analysis of data from a randomized,
prospective study with a parallel-groups design that includes baseline
measurement of the binary outcome variable, Ricardo Ovaldia presented a dataset
and regression results by both approaches.
>Continuing on a previous discussion, I applied both
>Joseph's and Kieran's method to a large set of the
>seat belt intervention data and obtained some
>questionable results. Here is a summary table:
>
[redacted]
>
>With Joseph's method the p-value for the interaction
>is 0.683, indicating no treatment effect.
>But with Kieran's method the p-value is 0.032
>indicating a significant treatment effect. Looking at
>the actual data I believe the results from the
>conditional logistic more than the "MANOVA" like
>approach, given that the baselines are similar.
>
>What am I missing?
>
>Thank you,
>Ricardo.
With ANCOVA-like multiple logistic regression, which treats the baseline as a
predictor (covariate), the "main effects" of intervention (treatment) in
Ricardo's dataset were associated with a Z-statistic of 2.42 (P < 0.05). The
corresponding slope coefficient for the baseline covariate was associated with
a Z-statistic of 4.96 (P < 0.05). As Ricardo noted, the interaction term was
associated with a Z-statistic of -0.41 (P > 0.05). My interpretation of this
is that
(i) intervention *does* result in a statistically significant difference (at
the 5% level) in seatbelt usage from nonintervention, with an estimated odds
ratio of 2 (95% confidence interval: 1-4),
(ii) as expected, pretreatment seatbelt use strongly predicts posttreatment
usage, with an odds ratio of 7 (3-14), and
(iii) pretreatment seatbelt usage and intervention do not interact.
Note that the interpretation is straightforward and analogous to that for
ANCOVA in that statistical significance of the baseline-by-treatment
interaction term is not needed to infer that treatment or intervention displays
an effect.
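As an aside, because baseline and treatment are both binary here, the ANCOVA-
like model with its interaction term is saturated, so its coefficients can be
read directly off the four cell proportions. A sketch in Python rather than
Stata, with made-up cell proportions (not Ricardo's data):

```python
from math import exp, log

def logit(p):
    """Log-odds of a proportion."""
    return log(p / (1 - p))

# Hypothetical P(post = 1) in each (treatment, baseline) cell
p = {(0, 0): 0.20, (0, 1): 0.60, (1, 0): 0.35, (1, 1): 0.75}

# Saturated model: logit P = b0 + b1*base + b2*trt + b3*trt*base
b0 = logit(p[(0, 0)])
b1 = logit(p[(0, 1)]) - b0
b2 = logit(p[(1, 0)]) - b0
b3 = logit(p[(1, 1)]) - logit(p[(1, 0)]) - logit(p[(0, 1)]) + b0

print("treatment odds ratio at baseline 0:", round(exp(b2), 2))
print("baseline odds ratio in control:", round(exp(b1), 2))
print("interaction odds ratio:", round(exp(b3), 2))
```

The interpretation then parallels what the Stata output gives: exp(b2) is the
treatment odds ratio among baseline-negatives, exp(b1) the baseline odds ratio
among controls, and exp(b3) the ratio of odds ratios tested by the interaction
term.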
I don't know how to interpret the results from the corresponding conditional
logistic regression printout, except to note, as Kieran McCaul earlier showed,
that the odds ratios reflect those obtained in separate McNemar tests of the
two treatment groups.  From the brief numerical study described below, it is
apparent that the p-value from the conditional logistic regression approach is
affected by factors irrelevant to the treatment comparison (baseline imbalance
and within-subject correlation), and does not reflect the statistical
significance of the treatment factor; this approach is therefore unsuitable
for this type of study design.
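For concreteness, the odds ratio that the matched analysis yields for a single
treatment group depends only on the discordant pairs; concordant subjects drop
out entirely. A Python sketch with hypothetical counts (the numbers are mine,
not from Ricardo's data):

```python
# Hypothetical pre/post counts for one treatment group:
# n01 = subjects negative at baseline, positive afterward
# n10 = subjects positive at baseline, negative afterward
n01, n10 = 20, 8

# McNemar-style odds ratio: ratio of the two discordant counts
# (concordant pairs contribute nothing to the matched analysis)
odds_ratio = n01 / n10
print(odds_ratio)  # 2.5

# McNemar test statistic (chi-squared, 1 df), also discordant-pairs-only
chi2 = (n01 - n10) ** 2 / (n01 + n10)
print(round(chi2, 3))
```

This is why the conditional logistic odds ratios reproduce those of the
separate McNemar tests.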
In the simulation, all scenarios have equal treatment group sizes, which is the
expectation in a randomized parallel-group design with a 1:1 treatment
assignment ratio.  The first three settings (simulations) evaluate the "test
size" (Type I error rate) of both approaches under a variety of conditions.
The first setting has unbalanced baseline rates of responses and equal rates of
changeover due to treatment. The second setting balances the baseline rates of
response and maintains the equal rates of changeover. The third setting
maintains the balance in both baseline and changeover rates, and just increases
the changeover rates. The fourth and fifth settings illustrate the relative
power of the two approaches.
Setting 1 presents the case with a baseline imbalance: 1/3 positive response
for control treatment and 2/3 in experimental (intervention) treatment. There
is no treatment effect: 25% conversion in each treatment group at each level
of baseline response.  The Type I error rates for the ANCOVA-like multiple
logistic regression are, as expected, in the 4-5% range.  But the conditional
logistic approach yields a whopping 64% false-positive error rate, more than an
order of magnitude greater than the nominal level.  (Detailed results are shown
below; apologies for the length of the post.)  This reflects the fact that the
latter approach cannot separate baseline rates of response from treatment
effects.
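The inflation is easy to see from expected counts alone.  Under Setting 1's
mechanism (25% flips in both directions, no treatment effect), the expected
discordant ratio in a group is (1 - p)/p, where p is that group's baseline
rate.  Sketched in Python rather than Stata (the function name is mine):

```python
def expected_mcnemar_or(n, p_base, flip):
    """Expected McNemar-type odds ratio when each subject's baseline
    value is flipped independently with probability `flip`."""
    n_pos = n * p_base          # positive at baseline
    n_neg = n * (1 - p_base)    # negative at baseline
    n01 = n_neg * flip          # expected negative -> positive
    n10 = n_pos * flip          # expected positive -> negative
    return n01 / n10

# Setting 1: 100 subjects per group, 25% flips, no treatment effect,
# but baseline imbalance: 1/3 positive (control) vs 2/3 (experimental)
or_control = expected_mcnemar_or(100, 1 / 3, 0.25)
or_experim = expected_mcnemar_or(100, 2 / 3, 0.25)
print(round(or_control, 6), round(or_experim, 6))  # 2.0 0.5
```

With no treatment effect at all, baseline imbalance alone drives the two
per-group odds ratios apart (2 versus 0.5 here), which is presumably what the
treatment-by-period term in the conditional logistic model is picking up.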
Setting 2 has balanced baseline rates of response (50% each) on the outcome
variable, but keeps the same 25% conversion to nonbaseline values in each,
i.e., again no treatment effect. Note that this will give an identical
expected value of the odds ratios for the two McNemar tests; both odds ratios
have an expected value of one. The ANCOVA-like multiple regression approach
again provides proper control over Type I error, with rates again in the 4-5%
range. Here, the conditional logistic regression approach is overly
conservative, with a Type I error rate less than half of the nominal rate.
Setting 3 is the same as Setting 2, but with a 50% conversion rate.  The
ANCOVA-like approach gives Type I error rates in the 5-6% range.  In contrast,
the conditional logistic approach yields a rate of less than one-half of one
percent, an order of magnitude lower than the nominal level.  The only
difference from Setting 2 is the higher rate of across-the-board conversion,
which induces a lower binomial correlation between pre- and posttreatment
outcomes; thus, the p-value of the conditional logistic regression approach
reflects the within-subject correlation.
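The correlation claim can be checked in closed form: with a 50% baseline rate
and an independent flip probability c, the pre-post (phi) correlation works
out to 1 - 2c, so 25% conversion gives 0.5 and 50% conversion gives 0.  A
Python sketch of the arithmetic (function name mine):

```python
def pre_post_corr(c, p=0.5):
    """Correlation between baseline and follow-up binary outcomes when
    each subject's baseline value (rate p) is flipped with probability c."""
    e_pre = p
    e_post = p * (1 - c) + (1 - p) * c
    e_both = p * (1 - c)                 # pre = 1 and not flipped
    cov = e_both - e_pre * e_post
    var_pre = e_pre * (1 - e_pre)
    var_post = e_post * (1 - e_post)
    return cov / (var_pre * var_post) ** 0.5

print(pre_post_corr(0.25))  # 0.5  (Setting 2's conversion rate)
print(pre_post_corr(0.50))  # 0.0  (Setting 3's conversion rate)
```

So the only thing that changes between Settings 2 and 3 is the within-subject
correlation, yet the conditional logistic rejection rate drops by a further
order of magnitude.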
Setting 4 keeps the balanced baseline rates of the previous cases, but changes
the rates of conversion differentially between treatment groups in order to
provide a small treatment effect: a 10% conversion of baseline-negatives to
positive posttreatment in the control treatment group, a 10% switching of
baseline-positives to negative in this group, a 15% conversion of baseline-
negatives to positive in the experimental treatment group and a 5% conversion
of baseline-positives to negative in this group. Although I don't know what
the two true-positive rates should be, the relative power of the two approaches
can be assessed, since both should show the treatment (intervention) effect--a
0% net change in the control (nonintervention) treatment group and a net
excess of conversions to positives in the experimental (intervention)
treatment group.  In
this setting, both approaches display the same relative power, about 11-12%
rejections of the null hypothesis.  The test of the joint hypothesis, which is
available only in the ANCOVA-like approach, shows slightly higher relative
power, about a 15% rejection rate, probably reflecting the differential switch
rate between the two levels of baseline that occurs only at one level of the
treatment factor.  In any event, this enhanced power, alongside maintenance of
the Type I error rate (Settings 1 through 3), is another argument for using
this joint hypothesis as the default primary hypothesis.
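Under Setting 4's switch rates, the expected discordant counts (assuming 50
baseline-negatives and 50 baseline-positives per group, per the balanced
design) give per-group odds ratios of 1 and 3.  A quick Python check (function
name mine):

```python
def group_or(n_neg, n_pos, up_rate, down_rate):
    """Expected McNemar-type odds ratio from conversion rates:
    up_rate converts baseline-negatives to positive,
    down_rate converts baseline-positives to negative."""
    n01 = n_neg * up_rate    # expected negative -> positive
    n10 = n_pos * down_rate  # expected positive -> negative
    return n01 / n10

print(round(group_or(50, 50, 0.10, 0.10), 6))  # 1.0 (control)
print(round(group_or(50, 50, 0.15, 0.05), 6))  # 3.0 (experimental)
```

These are the quantities the two separate McNemar tests estimate; a genuine
treatment effect shows up as the two group odds ratios diverging.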
Setting 5 is even more dramatic in the differentiation of the treatment groups:
the control treatment group remains as before with a 10% switchover at each
level of baseline response for a net 0% change, but the difference is
exaggerated for the experimental treatment group to a 25% switch from negative
to positive and a 2.5% switch from positive to negative. Again, the relative
power of the ANCOVA-like and conditional logistic approaches in this perfectly
balanced-baseline case is similar, about 32-34% rejection rates.  Again, the
joint-hypothesis test is more powerful, about twice that of the main-effects-
only hypothesis.  Its sensitivity to switching rates that differ between
baseline levels in only one treatment group reinforces the argument for its
primacy.  Even the interaction-only hypothesis test failed to discern this
situation reliably.
Given that the false-positive (Type I error) rate for the conditional logistic
regression approach is affected by baseline imbalance and by the gross rate of
conversion (the binomial correlation between observations), I conclude that
results from this approach are uninterpretable for this type of study design.
The ANCOVA-like multiple logistic regression approach, in contrast, maintains
the nominal Type I error rate and has at least the power of the invalid
conditional logistic regression approach--perhaps even a smidgen more.
The results of the exercise follow immediately below, and the do-file follows
afterward.
Joseph Coveney
-------------------------------------------------------------------------------
Means represent rates of declaring statistical
significance at a nominal 5% level of Type 1 error rate
pclo: Conditional logistic regression
pant: ANCOVA-like, main effects of treatment
pani: ANCOVA-like, treatment-by-baseline interaction
panb: ANCOVA-like, treatment main effects & interaction
Setting 1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0426 .2019637 0 1
pant | 10000 .0483 .2144101 0 1
pani | 10000 .052 .2220381 0 1
pclo | 10000 .6355 .4813137 0 1
Setting 2
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0404 .1969054 0 1
pant | 10000 .0426 .2019637 0 1
pani | 10000 .0485 .214831 0 1
pclo | 10000 .0213 .1443897 0 1
Setting 3
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0476 .212929 0 1
pant | 10000 .0529 .223845 0 1
pani | 10000 .057 .2318542 0 1
pclo | 10000 .0048 .069119 0 1
Setting 4
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .1473 .3544224 0 1
pant | 10000 .1233 .3287977 0 1
pani | 10000 .0263 .160034 0 1
pclo | 10000 .1109 .314024 0 1
Setting 5
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .5955 .4908196 0 1
pant | 10000 .3372 .4727774 0 1
pani | 10000 .0147 .1203551 0 1
pclo | 10000 .3156 .4647776 0 1
-------------------------------------------------------------------------------
* corlo1 -- Setting 1: baseline imbalance (1/3 positive in control,
* 2/3 in experimental); 25% flips everywhere (no treatment effect)
program define corlo1
version 8.2
replace dep0 = 1 in 67/166
generate byte dep1 = abs(dep0 - (uniform() > 0.75))
end
* corlo2 -- Setting 2: balanced baseline (50% positive in each group);
* 25% flips everywhere (no treatment effect)
program define corlo2
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = abs(dep0 - (uniform() > 0.75))
end
* corlo3 -- Setting 3: as Setting 2, but with 50% flips
program define corlo3
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = abs(dep0 - (uniform() > 0.5))
end
* corlo4 -- Setting 4: balanced baseline; 10% flips at both baseline levels
* in the control group (obs 1-100), differential flip rates (15% vs 5%)
* in the experimental group (obs 101-200); l denotes the last observation
program define corlo4
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = dep0
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 1/50
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 51/100
replace dep1 = abs(dep0 - (uniform() > 0.85)) in 101/150
replace dep1 = abs(dep0 - (uniform() > 0.95)) in 151/l
end
* corlo5 -- Setting 5: as Setting 4, but with the experimental group's
* differential flip rates exaggerated (25% vs 2.5%)
program define corlo5
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = dep0
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 1/50
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 51/100
replace dep1 = abs(dep0 - (uniform() > 0.75)) in 101/150
replace dep1 = abs(dep0 - (uniform() > 0.975)) in 151/l
end
program define corlo, rclass
version 8.2
syntax , setting(integer)
drop _all
set obs 200
* observations 1-100 form the control group, 101-200 the experimental group
generate byte dep0 = 0
corlo`setting'
generate byte trt = _n > _N / 2
generate byte iac = trt * dep0
* ANCOVA-like model: posttreatment response on baseline, treatment, interaction
logistic dep1 dep0 trt iac, nolog
test trt iac
return scalar anb = r(p)
test trt
return scalar ant = r(p)
test iac
return scalar ani = r(p)
* reshape to one record per subject per period for the conditional
* (fixed-effects) logistic fit; trt itself is time-invariant, so the
* treatment effect is carried by the treatment-by-period term
generate int pid = _n
reshape long dep, i(pid) j(per)
replace iac = trt * per
xtlogit dep trt per iac, i(pid) fe nolog
test iac
return scalar clo = r(p)
end
program define runem
version 8.2
clear
set more off
set seed 20040211
display
display as text "Means represent rates of declaring statistical"
display as text " significance at a nominal 5% level of Type 1 error rate"
display
display as text "pclo: Conditional logistic regression"
display as text "pant: ANCOVA-like, main effects of treatment"
display as text "pani: ANCOVA-like, treatment-by-baseline interaction"
display as text "panb: ANCOVA-like, treatment main effects & interaction"
display
forvalues scenario = 1/5 {
display as input "Setting `scenario'"
quietly simulate "corlo, setting(`scenario')" anb = r(anb) ///
ant = r(ant) ani = r(ani) clo = r(clo), reps(10000)
foreach var of varlist _all {
generate byte p`var' = `var' < 0.05
}
summarize p*
display
}
end
runem
exit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/