In the ongoing discussion of multiple logistic regression versus conditional
logistic regression approaches to analysis of data from a randomized,
prospective study with a parallel-groups design that includes baseline
measurement of the binary outcome variable, Ricardo Ovaldia presented a dataset
and regression results by both approaches.
>Continuing on a previous discussion, I applied both
>Joseph's and Kieran's method to a large set of the
>seat belt intervention data and obtained some
>questionable results. Here is a summary table:
>
[redacted]
>
>With Joseph's method the p-value for the interaction
>is 0.683, indicating no treatment effect.
>But with Kieran's method the p-value is 0.032
>indicating a significant treatment effect. Looking at
>the actual data I believe the results from the
>conditional logistic more than the "MANOVA" like
>approach, given that the baselines are similar.
>
>What am I missing?
>
>Thank you,
>Ricardo.
With ANCOVA-like multiple logistic regression, which treats the baseline as a
predictor (covariate), the "main effects" of intervention (treatment) in
Ricardo's dataset were associated with a Z-statistic of 2.42 (P < 0.05). The
corresponding slope coefficient for the baseline covariate was associated with
a Z-statistic of 4.96 (P < 0.05). As Ricardo noted, the interaction term was
associated with a Z-statistic of -0.41 (P > 0.05). My interpretation of this
is that
(i) intervention *does* result in a statistically significant difference (at
the 5% level) in seatbelt usage from nonintervention, with an estimated odds
ratio of 2 (95% confidence interval: 1-4),
(ii) as expected, pretreatment seatbelt use strongly predicts posttreatment
usage, with an odds ratio of 7 (3-14), and
(iii) pretreatment seatbelt usage and intervention do not interact.
Note that the interpretation is straightforward and analogous to that for
ANCOVA in that statistical significance of the baseline-by-treatment
interaction term is not needed to infer that treatment or intervention displays
an effect.
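As an aside, because baseline and treatment are both binary here, the ANCOVA-
like model with its interaction term is saturated, so its coefficients can be
read directly off the four cell proportions. A sketch in Python rather than
Stata, with made-up cell proportions (not Ricardo's data):

```python
from math import exp, log

def logit(p):
    """Log-odds of a proportion."""
    return log(p / (1 - p))

# Hypothetical P(post = 1) in each (treatment, baseline) cell
p = {(0, 0): 0.20, (0, 1): 0.60, (1, 0): 0.35, (1, 1): 0.75}

# Saturated model: logit P = b0 + b1*base + b2*trt + b3*trt*base
b0 = logit(p[(0, 0)])
b1 = logit(p[(0, 1)]) - b0
b2 = logit(p[(1, 0)]) - b0
b3 = logit(p[(1, 1)]) - logit(p[(1, 0)]) - logit(p[(0, 1)]) + b0

print("treatment odds ratio at baseline 0:", round(exp(b2), 2))
print("baseline odds ratio in control:", round(exp(b1), 2))
print("interaction odds ratio:", round(exp(b3), 2))
```

The interpretation then parallels what the Stata output gives: exp(b2) is the
treatment odds ratio among baseline-negatives, exp(b1) the baseline odds ratio
among controls, and exp(b3) the ratio of odds ratios tested by the interaction
term.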
I don't know how to interpret the results from the corresponding conditional
logistic regression printout, except to note, as Kieran McCaul earlier showed,
that the odds ratios reflect those obtained in separate McNemar tests of the
two treatment groups.  From the brief numerical study described below, it is
apparent that the p-value from the conditional logistic regression approach is
affected by factors irrelevant to the treatment comparison (baseline imbalance
and within-subject correlation), and does not reflect the statistical
significance of the treatment factor; this approach is therefore unsuitable
for this type of study design.
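For concreteness, the odds ratio that the matched analysis yields for a single
treatment group depends only on the discordant pairs; concordant subjects drop
out entirely. A Python sketch with hypothetical counts (the numbers are mine,
not from Ricardo's data):

```python
# Hypothetical pre/post counts for one treatment group:
# n01 = subjects negative at baseline, positive afterward
# n10 = subjects positive at baseline, negative afterward
n01, n10 = 20, 8

# McNemar-style odds ratio: ratio of the two discordant counts
# (concordant pairs contribute nothing to the matched analysis)
odds_ratio = n01 / n10
print(odds_ratio)  # 2.5

# McNemar test statistic (chi-squared, 1 df), also discordant-pairs-only
chi2 = (n01 - n10) ** 2 / (n01 + n10)
print(round(chi2, 3))
```

This is why the conditional logistic odds ratios reproduce those of the
separate McNemar tests.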
In the simulation, all scenarios have equal treatment group sizes, which is the
expectation in a randomized parallel-group design with a 1:1 treatment
assignment ratio.  The first three settings (simulations) evaluate the "test
size" (Type I error rate) of both approaches under a variety of conditions.
The first setting has unbalanced baseline rates of responses and equal rates of
changeover due to treatment. The second setting balances the baseline rates of
response and maintains the equal rates of changeover. The third setting
maintains the balance in both baseline and changeover rates, and just increases
the changeover rates. The fourth and fifth settings illustrate the relative
power of the two approaches.
Setting 1 presents the case with a baseline imbalance: 1/3 positive response
for control treatment and 2/3 in experimental (intervention) treatment. There
is no treatment effect: 25% conversion in each treatment group at each level
of baseline response.  The Type I error rates for the ANCOVA-like multiple
logistic regression are, as expected, in the 4-5% range.  But the conditional
logistic approach yields a whopping 64% false-positive error rate, more than an
order of magnitude greater than the nominal level.  (Detailed results are shown
below; apologies for the length of the post.)  This reflects the fact that the
latter approach cannot separate baseline rates of response from treatment
effects.
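The inflation is easy to see from expected counts alone.  Under Setting 1's
mechanism (25% flips in both directions, no treatment effect), the expected
discordant ratio in a group is (1 - p)/p, where p is that group's baseline
rate.  Sketched in Python rather than Stata (the function name is mine):

```python
def expected_mcnemar_or(n, p_base, flip):
    """Expected McNemar-type odds ratio when each subject's baseline
    value is flipped independently with probability `flip`."""
    n_pos = n * p_base          # positive at baseline
    n_neg = n * (1 - p_base)    # negative at baseline
    n01 = n_neg * flip          # expected negative -> positive
    n10 = n_pos * flip          # expected positive -> negative
    return n01 / n10

# Setting 1: 100 subjects per group, 25% flips, no treatment effect,
# but baseline imbalance: 1/3 positive (control) vs 2/3 (experimental)
or_control = expected_mcnemar_or(100, 1 / 3, 0.25)
or_experim = expected_mcnemar_or(100, 2 / 3, 0.25)
print(round(or_control, 6), round(or_experim, 6))  # 2.0 0.5
```

With no treatment effect at all, baseline imbalance alone drives the two
per-group odds ratios apart (2 versus 0.5 here), which is presumably what the
treatment-by-period term in the conditional logistic model is picking up.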
Setting 2 has balanced baseline rates of response (50% each) on the outcome
variable, but keeps the same 25% conversion to nonbaseline values in each,
i.e., again no treatment effect. Note that this will give an identical
expected value of the odds ratios for the two McNemar tests; both odds ratios
have an expected value of one. The ANCOVA-like multiple regression approach
again provides proper control over Type I error, with rates again in the 4-5%
range. Here, the conditional logistic regression approach is overly
conservative, with a Type I error rate less than half of the nominal rate.
Setting 3 is the same as Setting 2, but with a 50% conversion rate.  The
ANCOVA-like approach gives Type I error rates in the 5-6% range.  In contrast,
the conditional logistic approach yields a rate of less than one-half of one
percent, an order of magnitude lower than the nominal level.  The only
difference from Setting 2 is the higher rate of across-the-board conversion,
which induces a lower binomial correlation between pre- and posttreatment
outcomes; thus, the p-value of the conditional logistic regression approach
reflects the within-subject correlation.
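The correlation claim can be checked in closed form: with a 50% baseline rate
and an independent flip probability c, the pre-post (phi) correlation works
out to 1 - 2c, so 25% conversion gives 0.5 and 50% conversion gives 0.  A
Python sketch of the arithmetic (function name mine):

```python
def pre_post_corr(c, p=0.5):
    """Correlation between baseline and follow-up binary outcomes when
    each subject's baseline value (rate p) is flipped with probability c."""
    e_pre = p
    e_post = p * (1 - c) + (1 - p) * c
    e_both = p * (1 - c)                 # pre = 1 and not flipped
    cov = e_both - e_pre * e_post
    var_pre = e_pre * (1 - e_pre)
    var_post = e_post * (1 - e_post)
    return cov / (var_pre * var_post) ** 0.5

print(pre_post_corr(0.25))  # 0.5  (Setting 2's conversion rate)
print(pre_post_corr(0.50))  # 0.0  (Setting 3's conversion rate)
```

So the only thing that changes between Settings 2 and 3 is the within-subject
correlation, yet the conditional logistic rejection rate drops by a further
order of magnitude.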
Setting 4 keeps the balanced baseline rates of the previous cases, but changes
the rates of conversion differentially between treatment groups in order to
provide a small treatment effect: a 10% conversion of baseline-negatives to
positive posttreatment in the control treatment group, a 10% switching of
baseline-positives to negative in this group, a 15% conversion of baseline-
negatives to positive in the experimental treatment group and a 5% conversion
of baseline-positives to negative in this group. Although I don't know what
the two true-positive rates should be, the relative power of the two approaches
can be assessed, since both should show the treatment (intervention) effect--a
0% net change in the control (nonintervention) treatment group and a net
excess of conversions to positives in the experimental (intervention)
treatment group.  In
this setting, both approaches display the same relative power, about 11-12%
rejections of the null hypothesis.  The test of the joint hypothesis, which is
available only in the ANCOVA-like approach, shows slightly higher relative
power, about a 15% rejection rate, probably reflecting the differential switch
rate between the two levels of baseline that occurs only at one level of the
treatment factor.  In any event, this enhanced power, alongside maintenance of
the Type I error rate (Settings 1 through 3), is another argument for using
this joint hypothesis as the default primary hypothesis.
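Under Setting 4's switch rates, the expected discordant counts (assuming 50
baseline-negatives and 50 baseline-positives per group, per the balanced
design) give per-group odds ratios of 1 and 3.  A quick Python check (function
name mine):

```python
def group_or(n_neg, n_pos, up_rate, down_rate):
    """Expected McNemar-type odds ratio from conversion rates:
    up_rate converts baseline-negatives to positive,
    down_rate converts baseline-positives to negative."""
    n01 = n_neg * up_rate    # expected negative -> positive
    n10 = n_pos * down_rate  # expected positive -> negative
    return n01 / n10

print(round(group_or(50, 50, 0.10, 0.10), 6))  # 1.0 (control)
print(round(group_or(50, 50, 0.15, 0.05), 6))  # 3.0 (experimental)
```

These are the quantities the two separate McNemar tests estimate; a genuine
treatment effect shows up as the two group odds ratios diverging.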
Setting 5 is even more dramatic in the differentiation of the treatment groups:
the control treatment group remains as before with a 10% switchover at each
level of baseline response for a net 0% change, but the difference is
exaggerated for the experimental treatment group to a 25% switch from negative
to positive and a 2.5% switch from positive to negative. Again, the relative
power of the ANCOVA-like and conditional logistic approaches in this perfectly
balanced-baseline case is similar, about 32-34% rejection rates.  Again, the
joint-hypothesis test is more powerful, about twice that of the main-effects-
only hypothesis.  Its sensitivity to switching rates that differ between
baseline levels in only one treatment group reinforces the argument for its
primacy.  Even the interaction-only hypothesis test failed to discern this
situation reliably.
Given that the false-positive (Type I error) rate for the conditional logistic
regression approach is affected by baseline imbalance and by the gross rate of
conversion (the binomial correlation between observations), I conclude that
results from this approach are uninterpretable for this type of study design.
The ANCOVA-like multiple logistic regression approach, in contrast, maintains
the nominal Type I error rate and has at least the power of the invalid
conditional logistic regression approach--perhaps even a smidgen more.
The results of the exercise follow immediately below, and the do-file follows
afterward.
Joseph Coveney
-------------------------------------------------------------------------------
Means represent rates of declaring statistical
significance at a nominal 5% level of Type 1 error rate
pclo: Conditional logistic regression
pant: ANCOVA-like, main effects of treatment
pani: ANCOVA-like, treatment-by-baseline interaction
panb: ANCOVA-like, treatment main effects & interaction
Setting 1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0426 .2019637 0 1
pant | 10000 .0483 .2144101 0 1
pani | 10000 .052 .2220381 0 1
pclo | 10000 .6355 .4813137 0 1
Setting 2
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0404 .1969054 0 1
pant | 10000 .0426 .2019637 0 1
pani | 10000 .0485 .214831 0 1
pclo | 10000 .0213 .1443897 0 1
Setting 3
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .0476 .212929 0 1
pant | 10000 .0529 .223845 0 1
pani | 10000 .057 .2318542 0 1
pclo | 10000 .0048 .069119 0 1
Setting 4
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .1473 .3544224 0 1
pant | 10000 .1233 .3287977 0 1
pani | 10000 .0263 .160034 0 1
pclo | 10000 .1109 .314024 0 1
Setting 5
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
panb | 10000 .5955 .4908196 0 1
pant | 10000 .3372 .4727774 0 1
pani | 10000 .0147 .1203551 0 1
pclo | 10000 .3156 .4647776 0 1
-------------------------------------------------------------------------------
* corlo1 -- Setting 1: baseline imbalance (1/3 positive in control,
* 2/3 in experimental); 25% flips everywhere (no treatment effect)
program define corlo1
version 8.2
replace dep0 = 1 in 67/166
generate byte dep1 = abs(dep0 - (uniform() > 0.75))
end
* corlo2 -- Setting 2: balanced baseline (50% positive in each group);
* 25% flips everywhere (no treatment effect)
program define corlo2
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = abs(dep0 - (uniform() > 0.75))
end
* corlo3 -- Setting 3: as Setting 2, but with 50% flips
program define corlo3
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = abs(dep0 - (uniform() > 0.5))
end
* corlo4 -- Setting 4: balanced baseline; 10% flips at both baseline levels
* in the control group (obs 1-100), differential flip rates (15% vs 5%)
* in the experimental group (obs 101-200); l denotes the last observation
program define corlo4
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = dep0
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 1/50
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 51/100
replace dep1 = abs(dep0 - (uniform() > 0.85)) in 101/150
replace dep1 = abs(dep0 - (uniform() > 0.95)) in 151/l
end
* corlo5 -- Setting 5: as Setting 4, but with the experimental group's
* differential flip rates exaggerated (25% vs 2.5%)
program define corlo5
version 8.2
replace dep0 = 1 in 51/150
generate byte dep1 = dep0
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 1/50
replace dep1 = abs(dep0 - (uniform() > 0.90)) in 51/100
replace dep1 = abs(dep0 - (uniform() > 0.75)) in 101/150
replace dep1 = abs(dep0 - (uniform() > 0.975)) in 151/l
end
program define corlo, rclass
version 8.2
syntax , setting(integer)
drop _all
set obs 200
* observations 1-100 form the control group, 101-200 the experimental group
generate byte dep0 = 0
corlo`setting'
generate byte trt = _n > _N / 2
generate byte iac = trt * dep0
* ANCOVA-like model: posttreatment response on baseline, treatment, interaction
logistic dep1 dep0 trt iac, nolog
test trt iac
return scalar anb = r(p)
test trt
return scalar ant = r(p)
test iac
return scalar ani = r(p)
* reshape to one record per subject per period for the conditional
* (fixed-effects) logistic fit; trt itself is time-invariant, so the
* treatment effect is carried by the treatment-by-period term
generate int pid = _n
reshape long dep, i(pid) j(per)
replace iac = trt * per
xtlogit dep trt per iac, i(pid) fe nolog
test iac
return scalar clo = r(p)
end
program define runem
version 8.2
clear
set more off
set seed 20040211
display
display as text "Means represent rates of declaring statistical"
display as text " significance at a nominal 5% level of Type 1 error rate"
display
display as text "pclo: Conditional logistic regression"
display as text "pant: ANCOVA-like, main effects of treatment"
display as text "pani: ANCOVA-like, treatment-by-baseline interaction"
display as text "panb: ANCOVA-like, treatment main effects & interaction"
display
forvalues scenario = 1/5 {
display as input "Setting `scenario'"
quietly simulate "corlo, setting(`scenario')" anb = r(anb) ///
ant = r(ant) ani = r(ani) clo = r(clo), reps(10000)
foreach var of varlist _all {
generate byte p`var' = `var' < 0.05
}
summarize p*
display
}
end
runem
exit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/