Continuing a previous discussion, I applied both
Joseph's and Kieran's methods to a large set of the
seat belt intervention data and obtained some
questionable results. Here is a summary table:
--------------------------------------------------
     case |    N     pre           post
----------+---------------------------------------
        0 |  140     60 (42.9%)    72 (51.4%)
        1 |  139     53 (38.1%)    89 (64.0%)
--------------------------------------------------
In the control group (case=0) we see an increase from
42.9% to 51.4% (diff=8.5 percentage points), whereas in
the intervention group (case=1) we see an increase from
38.1% to 64.0% (diff=25.9 percentage points). So the
increase appears to be greater in the intervention group
than in the control group, i.e. the intervention seems
to work.
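
The arithmetic behind those two differences can be rechecked from the counts in the summary table. This is only a sketch using Stata's display calculator; the scalar names d0 and d1 are made up for illustration. Working from the unrounded proportions, the control-group change comes out nearer 8.6 than 8.5 points (12/140), which does not alter the comparison.

```stata
* Pre and post proportions, in percent, from the posted counts
display "control:      " %5.1f 100*60/140 " -> " %5.1f 100*72/140
display "intervention: " %5.1f 100*53/139 " -> " %5.1f 100*89/139

* Change within each group, in percentage points (d0, d1 are
* hypothetical scalar names), and the difference between the changes
scalar d0 = 100*(72 - 60)/140
scalar d1 = 100*(89 - 53)/139
display "change, control      = " %5.1f d0
display "change, intervention = " %5.1f d1
display "difference in change = " %5.1f (d1 - d0)
```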
Here are the results of Joseph's ANCOVA-like approach:
. xi:logistic post pre case i.case*pre
i.case            _Icase_0-1          (naturally coded; _Icase_0 omitted)
i.case*pre        _IcasXpre_#         (coded as above)

note: _Icase_1 dropped due to collinearity
note: pre dropped due to collinearity

Logistic regression                               Number of obs   =        279
                                                  LR chi2(3)      =      49.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -165.12038                       Pseudo R2       =     0.1312

------------------------------------------------------------------------------
        post | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   6.824176   2.644293     4.96   0.000     3.193147    14.58416
        case |   2.175824   .7000504     2.42   0.016     1.158132      4.0878
 _IcasXpre_1 |   .7868083   .4614124    -0.41   0.683     .2492838    2.483384
------------------------------------------------------------------------------
and Kieran's conditional logistic method yields:
. xi:clogit period i.seatbelt*i.case,group(participantid) or nolog
i.seatbelt        _Iseatbelt_0-1      (naturally coded; _Iseatbelt_0 omitted)
i.case            _Icase_0-1          (naturally coded; _Icase_0 omitted)
i.sea~t*i.case    _IseaXcas_#_#       (coded as above)

note: _Icase_1 omitted due to no within-group variance.

Conditional (fixed-effects) logistic regression   Number of obs   =        558
                                                  LR chi2(2)      =      31.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -177.84119                       Pseudo R2       =     0.0804

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iseatbelt_1 |   1.857143   .6156369     1.87   0.062     .9697834    3.556443
_IseaXcas_~1 |   2.961538   1.503159     2.14   0.032     1.095169    8.008541
------------------------------------------------------------------------------
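
As a reading aid for the clogit output above: with one pre and one post record per participant, the conditional odds ratio in each group reduces to the ratio of switchers (started using a seat belt versus stopped). The switcher counts in this sketch are NOT taken from the data. They are back-calculated from the posted odds ratios and the summary table margins, so they should be treated as an assumption to confirm against the real cross-tabulation. The scalar names are hypothetical.

```stata
* ASSUMED switcher counts, inferred from the posted output only:
*   control:      26 started, 14 stopped (net +12 = 72 - 60)
*   intervention: 44 started,  8 stopped (net +36 = 89 - 53)
scalar or_ctl = 26/14
scalar or_int = 44/8

* Compare with _Iseatbelt_1 and _IseaXcas_~1 in the clogit table
display "control OR         = " %8.6f or_ctl
display "intervention OR    = " %8.6f or_int
display "ratio (interaction)= " %8.6f or_int/or_ctl
```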
With Joseph's method the p-value for the interaction
is 0.683, indicating no treatment effect. But with
Kieran's method the p-value is 0.032, indicating a
significant treatment effect. Looking at the actual
data, I believe the results from the conditional
logistic more than those from the ANCOVA-like
approach, given that the baselines are similar.
What am I missing?
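
When comparing the two p-values, it may help to look at what each term in the logistic model measures. With 0/1 coding, the case term is the intervention odds ratio among participants with pre == 0, and the interaction term is the ratio of the intervention odds ratios between the pre == 1 and pre == 0 strata. The following is only a sketch, assuming the same dataset and variable names as the posted commands; it fits the baseline-as-covariate model without the interaction and then displays the stratum-specific intervention odds ratios from the saturated model.

```stata
* Assumes post, pre and case are coded 0/1, one row per participant
* Baseline-as-covariate model, no interaction term
logistic post pre case

* Saturated model; xi creates _Icase_1 and _IcasXpre_1
xi: logistic post i.case*pre

* Intervention odds ratio among participants with pre == 0
lincom _Icase_1, or

* Intervention odds ratio among participants with pre == 1
lincom _Icase_1 + _IcasXpre_1, or
```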
Thank you,
Ricardo.
--- Ricardo Ovaldia <[email protected]> wrote:
> Thank you Joseph and Kieran. Obviously this was not
> the easy question I thought it was. I have spent
> several days contemplating the answers and playing
> around with my data. Although I find Kieran's
> conditional logistic approach appealing, I understand
> and agree with Joseph's concerns and objections. Faced
> with the need to analyze these data and the eventual
> submission for publication, I fear that reviewers may
> disagree with whichever method I select. The issue
> becomes more complicated when one considers the effect
> of additional covariates such as sex on the
> intervention.
>
> Regardless of all this, I tremendously appreciate
> Joseph's and Kieran's comments and the time they
> spent thinking about this problem.
>
> Ricardo.
>
>
> --- Joseph Coveney <[email protected]> wrote:
> >
> > Kieran McCaul posted results from a randomized parallel-group design
> > study to illustrate the use of conditional logistic regression. The
> > study randomized households to an intervention designed to promote
> > banning of smoking in the home. Policy in the home was measured
> > before and after intervention. Kieran invited Ricardo and me to
> > respond with what we think of advocating conditional logistic
> > regression to assess the efficacy of the intervention for
> > before-and-after studies, based upon the results posted for that
> > study.
> >
> > I don't claim to speak for Ricardo, but his original question related
> > to imbalances in the baseline rates of the outcome between the two
> > parallel intervention groups. It appears that Kieran's study was
> > successful in its randomization (or used stratified randomization and
> > didn't lose too many households to dropout), because the proportions
> > of households banning smoking at baseline were nearly identical
> > between the intervention groups. With essentially identical baseline
> > rates, there would be little or no cause for concern about
> > confounding due to baseline and little statistical difference from
> > including baseline as a covariate. And, in fact, both the conditional
> > logistic regression approach and the so-called ANCOVA-like multiple
> > logistic regression approach give essentially similar results in
> > this balanced study. (I think the same would have obtained for
> > Ricardo's study had the baseline rates of seatbelt use been similar
> > between the two intervention groups.)
> >
> > But, let's look at the issue of which approach is more suitable when
> > the concern is, as it was for Ricardo, to analyze an intervention
> > effect _in the face of an imbalance in the baseline rates of an
> > outcome_.
> >
> > If Kieran will indulge me one more time to use a fictional dataset to
> > illustrate a point, let's say that Kieran's randomization method did
> > not stratify on baseline household smoking policy, and suffered an
> > unfortunate imbalance due to chance, for instance a 50 : 50 ratio of
> > banning to nonbanning households at baseline in the nonintervention
> > group, but a 75 : 25 ratio in the intervention group. Let's say that
> > 2 of the 50 households that previously banned smoking in the
> > nonintervention group now permit it, a worsening of 4% (if your
> > health policy is to ban smoking), and that only 1 of the 50
> > households that didn't ban smoking now does so in the
> > nonintervention group, a meager improvement of 2%. Let's say that 4
> > of the 75 households that banned smoking at baseline switched and
> > permitted smoking in the home after the intervention, and 2 of the
> > 25 households that didn't ban smoking switched as a result of the
> > intervention. The results of the intervention are a slightly greater
> > 5.3% worsening (compare to 4%) in the formerly banning household
> > population, but a much greater 8% (compare to 2%) improvement among
> > the formerly permissive households.
> >
> > Now, the effects of intervention are no great shakes, but I think
> > that it would be safe to say that it's not *nothing*, especially if
> > you somehow take into account the possible confounding effect of the
> > chance unfortunate imbalance in baseline policy between treatment
> > groups.
> >
> > But, by the conditional logistic regression approach, it *is*
> > nothing--the odds ratio for both nonintervention and intervention
> > groups is 0.5 (McNemar's test uses only the off-diagonal values and
> > ignores the diagonal values) so the ratio of the two odds ratios is
> > 1.0, and this is what the conditional logistic regression dutifully
> > reports: the period term is 0.5 and the interaction term's odds
> > ratio is 1.0 with a Z-statistic of 0.00 and a p-value of 1.00.
> > Granted, the confidence interval encompasses a lot, but the point
> > estimate and hypothesis test for the interaction term (which is
> > ostensibly the effect of intervention) just don't give the same
> > take-home message as inspection of the data. So, my conclusion
> > differs from Kieran's on this; I don't think that conditional
> > logistic regression is valid to test for differences between
> > treatment effects (differences between treatment differences, which
> > are between-subject effects) in parallel-group designs with a
> > repeated binary outcome measure, especially in the presence of
> > baseline differences in the outcome measure, which are ignored in
> > the conditional logistic model.
> >
> > In contrast, the ANCOVA-like, baseline-as-covariate multiple
> > regression approach does provide a separate, and I think competent,
> > handling of baseline differences and their potential for
> > confounding. In the fictitious example, this approach shows the
> > pronounced effect of baseline smoking policy as expected, and it
> > shows that the odds ratio for intervention isn't 1.0 given baseline
> > differences between intervention groups. The saturated model (with
> > the interaction term) also helps to put the potential for
> > confounding into perspective. (The do-file for all of this is below
> > for anyone interested.)
> >
> > It seems that at least some of the discrepancy between the two
> > approaches reflects Simpson's paradox. This is the same underlying
> > phenomenon that results in bias in logistic regression coefficients
> > (and in nonlinear regression, in general) when important covariates
> > are left out of the model. This is what Frank E. Harrell Jr.'s
> > lecture dealt with in the URL given in my last posting. And it
> > relates to the "noncollapsibility of odds ratios" that
> > epidemiologists sometimes refer to.
> >
> > In fairness to us all (Kieran, Ricardo and me), it seems that the
> > matter of which approach is better isn't completely settled even for
> > *linear* models, where this noncollapsibility-of-odds-ratios
> > phenomenon and the incidental parameters problem don't apply: there
> > is a thread ("Repeated measures and
>
=== message truncated ===
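
The fictional household example in the quoted message can be rechecked from the counts it gives. The do-file mentioned there is cut off by the truncation, so what follows is an independent sketch and not that do-file. The variable names (case, pre, post, n) and the coding (1 = household bans smoking) are assumptions chosen to mirror the seat belt analysis, and the cell counts are derived from the stated totals and switchers.

```stata
* One row per cell: group, baseline policy, follow-up policy, count
* (variable names and 0/1 coding are assumptions for illustration)
clear
input case pre post n
0 1 1 48
0 1 0  2
0 0 1  1
0 0 0 49
1 1 1 71
1 1 0  4
1 0 1  2
1 0 0 23
end

* Switching rates within each baseline stratum, in percent
display "control:      worse " %4.1f 100*2/50 ", better " %4.1f 100*1/50
display "intervention: worse " %4.1f 100*4/75 ", better " %4.1f 100*2/25

* Conditional (McNemar-type) odds ratios use only the switchers
scalar or_ctl = 1/2
scalar or_int = 2/4
display "OR control = " or_ctl ", OR intervention = " or_int
display "ratio of odds ratios = " or_int/or_ctl

* Baseline-as-covariate logistic regression on the same counts
logistic post pre case [fweight = n]

* Saturated model with the interaction term
xi: logistic post i.case*pre [fweight = n]
```

The frequency weights stand in for expanding the eight cells into 200 household records; the conditional logistic fit would need the data in long form with a household identifier, which is left out here.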
=====
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK