Ricardo Ovaldia wrote:
>Thank you Joseph and Kieran.
>I originally thought to model this problem as Joseph's
>"ANCOVA-like approach" but without the interaction
>term (i.e.):
>
>xi: logistic followup i.baseline i.intervention
>
>If I do this, isn't the test Beta(intervention)=0
>testing whether the intervention had an effect? I am
>not certain what the interaction term adds in this
>context. Please excuse me if this is a stupid
>question, but I do not get it. What am I missing?
Well, here's my take on it: the interaction term tests the analogue of what it
would in a linear model--whether the intervention effect depends upon the level
of baseline. In one sense, it seems difficult to fathom in a pre-post design:
if a person doesn't wear a seatbelt prior to intervention, then the odds that
that person wears a seatbelt are zero, and the intervention odds ratio for a
group of like-behaving people would be infinite, regardless of whether they
wore seatbelts after intervention. In another sense, however, the interaction
term measures how justified you are in collapsing a 2 X 2 X 2 table (baseline X
intervention X outcome) into a 2 X 2 table (intervention X outcome). In this
latter sense, it would test whether the ratio of the odds that a person wears a
seatbelt after the experimental intervention to the odds that a person wears a
seatbelt after the control intervention needs to take into account the odds
that the person wears a seatbelt before intervention. The analogy with the
linear model is seen better with -logit- and -lincom-, as shown below.
clear
set obs 400
set seed 20040131
generate byte baseline = _n > _N / 2
generate byte treatment = mod(_n, 2)
* P(result) is 1/4 in every cell except baseline==1 & treatment==0,
* where it is 3/4, so the data carry a baseline-by-treatment interaction
generate byte result = uniform() < 1 / 4
replace result = uniform() < 3 / 4 if baseline == 1 & treatment == 0
table baseline treatment, contents(mean result)
generate byte iac = baseline * treatment
logistic result baseline treatment iac, nolog
* Creating the analogue of the "cell means model" of ANOVA
egen byte group = group(baseline treatment)
xi: logit result i.group, nolog
* The following linear contrast of the cell logits equals
* the interaction term (iac) in -logistic- above.
lincom _Igroup_4 - _Igroup_3 - _Igroup_2
lincom _Igroup_4 - _Igroup_3 - _Igroup_2, or
exit
I'll take a crack at answering my own questions:
>1. Which is better for binary outcomes, Kieran's repeated-measures approach
>or an ANCOVA-like approach using the pretreatment values as a baseline
>covariate in conventional logistic regression? The do-file below suggests
>that completely different conclusions would be drawn from the same dataset
>depending upon which approach is used to analyze it.
It looks like the ANCOVA-like approach is the one to use, from the results of a
Monte Carlo simulation under the null hypothesis. The false-positive rejection
rate for the repeated-measures approach is orders of magnitude too high. (See
do-file below.)
>2. As Kieran mentioned, the repeated-measures approach drops one of the
>"main effects" (treatment) so that the model ends up having an interaction
>term in it when one of the component "main effects" terms contributing to
>the interaction is not in the model. This would be a no-no from what I've
>heard, at least for the analogous situation in ANOVA. But, I assume that
>this is *not* a problem for conditional logistic regression due to the
>conditioning. Is that correct?
Apparently not: the conditioning doesn't seem to rescue it. (See the answer to
1. above.)
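For intuition about what the conditioning does to the between-subject term,
here is a minimal sketch (made-up null data; the variable names pid,
treatment, dep, period, and iac are mine): treatment is constant within
person, so -clogit- omits it, leaving a model that carries the interaction
without its treatment "main effect".
clear
set obs 100
set seed 17
generate int pid = _n
generate byte treatment = mod(_n, 2)
generate byte dep0 = uniform() > 0.5
generate byte dep1 = uniform() > 0.5
reshape long dep, i(pid) j(period)
generate byte iac = treatment * period
* treatment does not vary within a person (the conditioning group),
* so -clogit- drops it; only period and iac remain estimable
clogit dep treatment period iac, group(pid)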
>3. When using the likelihood-ratio test (-lrtest-), which is the proper
>model against which to compare for testing individual "main effects" of
>treatment and baseline--the saturated model (*with* the interaction) or the
>partially reduced model (*no* interaction term, i.e., the model that
>includes only both of the main effects)? Or should we be testing a
>constant-only model against one with the "main effect" in order to test that
>"main effect"?
Well, they test different hypotheses: one tests whether there is an effect of
intervention at both levels of baseline, and the other tests whether there is
an effect of intervention at *any* level of baseline. (Mentioned by Frank E.
Harrell, Jr., in the context of clinical studies in
hesweb1.med.virginia.edu/biostat/presentations/feh/covadj.pdf .) I knew this,
but this answer doesn't really answer my question: which hypothesis ought we
to be testing as a default when we believe that a baseline covariate is
sufficiently important to include in a model a priori (in the protocol or
statistical analysis plan)?
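To make the distinction concrete, here is a sketch of the two -lrtest-
comparisons under that reading, reusing the dataset and variable names from
the first do-file above (run it before the -exit- line; the stored-estimate
names full, mains, and covonly are arbitrary):
logistic result baseline treatment iac
estimates store full
logistic result baseline treatment
estimates store mains
logistic result baseline
estimates store covonly
* 2-df test: an intervention effect at *any* level of baseline
* (treatment main effect and interaction jointly zero)
lrtest full covonly
* 1-df test: a common intervention effect at both levels of
* baseline, assuming no interaction
lrtest mains covonly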
Joseph Coveney
Monte Carlo simulation evaluating the null-hypothesis behavior of the
ANCOVA-like and repeated-measures approaches to a pre-post design with a
binary endpoint:
clear
set more off
set seed 20040130
*
program define twolog, rclass
    version 8.2
    tempvar a b iac pid dep per
    tempname A B C
    drop _all
    set obs 200
    * null-hypothesis data: treatment (a), baseline (b), and
    * outcome (dep) are mutually independent
    generate byte `a' = _n > _N / 2
    generate byte `b' = mod(_n, 2)
    generate byte `iac' = `a' * `b'
    generate byte `dep' = uniform() > 0.5
    * ANCOVA-like approach: conventional logistic regression with
    * baseline as a covariate
    logistic `dep' `a' `b' `iac'
    estimates store `A'
    logistic `dep' `a' `b'
    estimates store `B'
    logistic `dep' `b'
    * 2-df test of treatment main effect plus interaction
    lrtest `A' .
    return scalar ancova_iac = r(p)
    * 1-df test of the treatment main effect alone
    lrtest `B' .
    return scalar ancova_me = r(p)
    * repeated-measures approach: reshape to one record per period
    * and condition on person in -clogit-
    drop `iac'
    estimates drop _all
    generate int `pid' = _n
    rename `b' `dep'0
    rename `dep' `dep'1
    reshape long `dep', i(`pid') j(`per')
    generate byte `iac' = `a' * `per'
    clogit `dep' `a' `per' `iac', group(`pid')
    estimates store `C'
    clogit `dep' `per', group(`pid')
    * 1-df test of the treatment-by-period interaction
    lrtest `C' .
    return scalar rpm = r(p)
end
*
simulate "twolog" ancova_me = r(ancova_me) rpm = r(rpm) ///
ancova_iac = r(ancova_iac), reps(3000)
* empirical rejection rates at the nominal 0.05 level
generate byte pancova_me = ancova_me < 0.05
generate byte prpm = rpm < 0.05
generate byte pancova_iac = ancova_iac < 0.05
summarize p*
exit