OK Joseph and Ricardo, see what you think of this.
Here's some real data I have.
People have been recruited into an RCT to evaluate an intervention that is
designed to encourage them to implement smoking bans in their home.
So there is a baseline measure BAN1 (Do you ban smoking in the home (Y/N)?)
and the same measure repeated at the end of the trial, BAN2.
In the intervention group the results look like this:
.
. mcc ban2 ban1 if intervention==1
                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |        47          16  |         63
       Unexposed |         6          59  |         65
-----------------+------------------------+------------
           Total |        53          75  |        128

McNemar's chi2(1) =      4.55    Prob > chi2 = 0.0330
Exact McNemar significance probability       = 0.0525

Proportion with factor
        Cases       .4921875
        Controls    .4140625     [95% Conf. Interval]
                   ---------     --------------------
        difference   .078125     -.0002214   .1564714
        ratio       1.188679      1.013845   1.393663
        rel. diff.  .1333333      .0192232   .2474435

        odds ratio  2.666667      .9911545   8.320598   (exact)
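
As a sanity check, the headline figures in this table depend only on the two discordant cells (16 and 6). The following is a minimal sketch, not part of the original analysis; the scalar names n10 and n01 are made up for illustration. It should reproduce the chi2 of 4.55, the p-values of 0.0330 and 0.0525, and the odds ratio of 2.667 shown above.

```stata
* Discordant pairs from the intervention-arm table
* (n10 and n01 are illustrative names, not variables in the real data)
scalar n10 = 16    // ban at end of trial, no ban at baseline
scalar n01 = 6     // ban at baseline, no ban at end of trial

* McNemar chi-squared and its asymptotic p-value
scalar x2 = (n10 - n01)^2 / (n10 + n01)
display "McNemar chi2(1) = " %5.2f x2
display "Prob > chi2     = " %6.4f chi2tail(1, x2)

* Exact two-sided p-value: twice the lower binomial tail with p = 0.5
display "Exact p         = " %6.4f 2*binomial(n10 + n01, min(n10, n01), 0.5)

* Conditional odds ratio is the ratio of the discordant cells
display "Odds ratio      = " %8.6f n10/n01

* The immediate form of mcc rebuilds the whole table from the cell counts
mcci 47 16 6 59
```
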
If I reshape the data, I create a new variable 'period' which identifies the
post-period (case) from the pre-period or baseline (control). I can do a
conditional logistic regression on these data:
. xi:clogit period i.ban if intervention==1, group(id) or
i.ban             _Iban_0-1           (naturally coded; _Iban_0 omitted)

Conditional (fixed-effects) logistic regression   Number of obs   =        256
                                                  LR chi2(1)      =       4.72
                                                  Prob > chi2     =     0.0299
Log likelihood = -86.364559                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iban_1 |   2.666667   1.276569     2.05   0.040     1.043487    6.814758
------------------------------------------------------------------------------
Now I get the same OR, but different 95% CIs: mcc uses an exact method for
the odds ratio interval, whereas clogit reports the usual large-sample
(Wald) interval.
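
For anyone wanting to try the reshape step without the real data, here is a rough sketch that rebuilds the intervention arm from the four cell counts in the mcc table. The variable names n and time are hypothetical, and the real dataset presumably already has its own id variable. Since the conditional estimate for 1:1 matched data is the ratio of the discordant pairs, this should give the same odds ratio as above.

```stata
* Rebuild one row per person from the cell counts (wide format)
clear
input ban2 ban1 n
1 1 47
1 0 16
0 1  6
0 0 59
end
expand n                    // n copies of each pattern: 128 people in total
generate long id = _n       // hypothetical person identifier
drop n

* Wide to long: two rows per person, ban1 and ban2 stacked into 'ban'
reshape long ban, i(id) j(time)

* clogit needs a 0/1 outcome: 1 = end of trial (case), 0 = baseline (control)
generate byte period = (time == 2)

* Same model as in the post, without the xi: prefix
clogit period ban, group(id) or
```
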
In the non-intervention arm, the results look like this:
. mcc ban2 ban1 if intervention==0
                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |        47          10  |         57
       Unexposed |         7          72  |         79
-----------------+------------------------+------------
           Total |        54          82  |        136

McNemar's chi2(1) =      0.53    Prob > chi2 = 0.4669
Exact McNemar significance probability       = 0.6291

Proportion with factor
        Cases       .4191176
        Controls    .3970588     [95% Conf. Interval]
                   ---------     --------------------
        difference  .0220588     -.0445985   .0887161
        ratio       1.055556      .9124773   1.221069
        rel. diff.  .0365854     -.0601456   .1333163

        odds ratio  1.428571      .4908621   4.421907   (exact)
Reshaping and repeating this analysis, I get the following: the same OR,
but different 95% CIs.
. xi:clogit period i.ban if intervention==0, group(id) or
i.ban             _Iban_0-1           (naturally coded; _Iban_0 omitted)

Conditional (fixed-effects) logistic regression   Number of obs   =        272
                                                  LR chi2(1)      =       0.53
                                                  Prob > chi2     =     0.4657
Log likelihood = -94.001919                       Pseudo R2       =     0.0028

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iban_1 |   1.428571   .7040076     0.72   0.469     .5437826       3.753
------------------------------------------------------------------------------
Now if I put the two arms of the trial together, introducing a variable
'intervention' to identify which arm of the trial each person is in, I can
fit the model:
. xi:clogit period i.ban*i.intervention, group(id) or
i.ban             _Iban_0-1           (naturally coded; _Iban_0 omitted)
i.intervention    _Iintervent_0-1     (naturally coded; _Iintervent_0 omitted)
i.ban*i.inter~n   _IbanXint_#_#       (coded as above)
note: _Iintervent_1 omitted due to no within-group variance.

Conditional (fixed-effects) logistic regression   Number of obs   =        528
                                                  LR chi2(2)      =       5.25
                                                  Prob > chi2     =     0.0725
Log likelihood = -180.36648                       Pseudo R2       =     0.0143

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iban_1 |   1.428571   .7040077     0.72   0.469     .5437825    3.753001
_IbanXint_~1 |   1.866667   1.282474     0.91   0.364     .4855762    7.175897
------------------------------------------------------------------------------
The effect of 'ban' in this model is simply the effect in the
non-intervention arm:
. lincom _Iban_1,or
 ( 1)  _Iban_1 = 0

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.428571   .7040077     0.72   0.469     .5437825    3.753001
------------------------------------------------------------------------------
The effect of 'ban' in the intervention arm is given by:
. lincom _Iban_1+ _IbanXint_1_1,or
 ( 1)  _Iban_1 + _IbanXint_1_1 = 0

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   2.666667   1.276569     2.05   0.040     1.043487    6.814758
------------------------------------------------------------------------------
Are these two significantly different? That's given by the p-value for the
interaction term: 0.364.
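
The interaction odds ratio is just the ratio of the two arm-specific odds ratios, so it can be checked by hand from the discordant cells (16 and 6 in the intervention arm, 10 and 7 in the other). The sketch below assumes the usual large-sample standard error for a matched-pair log odds ratio, sqrt(1/b + 1/c) in each arm; the scalar names are made up for illustration. It should agree with the 1.866667, 1.282474 and 0.364 in the combined model.

```stata
* Arm-specific conditional odds ratios from the discordant pairs
scalar or_int  = 16/6     // intervention arm
scalar or_ctrl = 10/7     // non-intervention arm

* Interaction OR = ratio of the two odds ratios
scalar or_x = or_int/or_ctrl

* Assumed large-sample SE of the log interaction OR:
* sum of reciprocal discordant counts across both arms
scalar se_log = sqrt(1/16 + 1/6 + 1/10 + 1/7)

* Wald z statistic and two-sided p-value
scalar z = ln(or_x)/se_log
display "Interaction OR  = " %8.6f or_x
display "Std. Err. (OR)  = " %8.6f or_x*se_log
display "z               = " %5.2f z
display "P>|z|           = " %5.3f 2*normal(-abs(z))
```
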
I think that this is valid. The variable 'intervention' is "omitted due to
no within-group variance" - it has a beta of zero by design, so you could
regard it as being in the model, it just doesn't appear because the design
matches it out.
The difference in the p-values that you obtain using exact methods is
another problem entirely - if I had something that would do "exact
conditional logistic regression" I would have obtained results that matched
those produced by mcc.
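
On that last point, more recent Stata releases include an exlogistic command with a group() option, which may be one way to try this. The sketch below is untested against the real data and was not part of the analysis above: it rebuilds the intervention arm from the cell counts (n and time are hypothetical names) and fits the stratified exact model. Whether the results line up with mcc is something to check rather than assume, and the memory() option may need raising with this many strata.

```stata
* Rebuild the intervention arm from the four cell counts shown earlier
clear
input ban2 ban1 n
1 1 47
1 0 16
0 1  6
0 0 59
end
expand n
generate long id = _n       // hypothetical person identifier
drop n
reshape long ban, i(id) j(time)
generate byte period = (time == 2)   // 1 = end of trial, 0 = baseline

* Exact logistic regression, conditioning on each person as a stratum
exlogistic period ban, group(id)
```
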
Kieran