Ben Dwamena asked earlier this month about using bivariate random effects
modeling in meta-analysis of accuracy indexes of diagnostic tests.
>I was interested in using bivariate random effects regression to
>meta-analyze simultaneously sensitivity and specificity (logit
>transforms) as correlated heterogeneous outcomes versus a number of
>covariates such as study quality, sample size, prevalence of disease and
>other clinical/methodologic characteristics. This has been done by other
>investigators using Proc Mixed in SAS. How may this be done with Stata?
>Gllamm? mvreg? Thanks
I had posted some possibilities with Stata, acknowledging that I didn't
know what other investigators had done with SAS's PROC MIXED in this
regard. As a follow-up, Gerben ter Riet provided partial citations to two
articles describing the method, and posted some control language that his
colleagues use to implement bivariate random effects meta-analysis using
PROC MIXED.
I tracked down the articles and finally received photocopies of them
yesterday. The full citations are:
H. C. van Houwelingen & K. H. Zwinderman, A bivariate approach to
meta-analysis. _Statistics in Medicine_ 12:2273-84, 1993.
H. C. van Houwelingen, L. R. Arends & T. Stijnen, Tutorial in biostatistics.
Advanced methods in meta-analysis: multivariate approach and
meta-regression. _Statistics in Medicine_ 21:589-624, 2002.
There is good news and bad news.
First, the bad news. The suggestions that I posted are not what Ben's
other investigators meant by "bivariate random effects meta-analysis." As a
rough analogy, bivariate random effects meta-analysis is to what I posted as
MANOVA is to repeated-measures ANOVA.
The good news is that -gllamm- can indeed handily perform bivariate random
effects meta-analysis. The do-files below show how, and reproduce the
results in each of the two articles that Gerben cites. Convergence is
rather rapid as far as -gllamm- analyses go.
A few notes:
First: the first do-file below follows the method that van Houwelingen and
his colleagues use in the first article above, except that the do-file
doesn't add 0.5 to successes (failures) and to trials when there are zero
successes (failures). -gllamm- isn't bothered by the occasional clinical
study with a zero in one or two cells of the fourfold table. So, without
the kludge of adding small increments "to avoid degeneracy" (from the
first article, p. 2274), the results from the first do-file below
occasionally differ in the third decimal place from the values given in
Table III (p. 2281) of the first article.
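For reference, here is one reading of that kludge as Stata code. This is
only a sketch (Xc, nc, and logodds are hypothetical names; X and n are the
variables of the reshaped dataset in the first do-file), and it pertains
to the articles' empirical-log-odds likelihood; -gllamm-'s binomial
likelihood below needs no such correction:
/* One reading of the first article's continuity correction:
   add 0.5 to successes and to failures (hence 1 to trials)
   wherever a study arm has zero successes or zero failures,
   then form the empirical log odds. */
generate double Xc = cond(X == 0 | X == n, X + 0.5, X)
generate double nc = cond(X == 0 | X == n, n + 1, n)
generate double logodds = ln(Xc / (nc - Xc))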
Second, the method that van Houwelingen and his colleagues use in the second
article is, as Ben and Gerben describe, a logit transformation followed by
linear mixed-effects regression (using PROC MIXED) with the residual
variances fixed at the Woolf-formula variances of the empirical log odds.
The second do-file below implements this in Stata, again using -gllamm-.
Realize, however, that this method is not the same as that in their first
article, which maximized the full likelihood. (They wrote GAUSS code to do
this maximum likelihood estimation for the first article.) If you use the
first article's approach on the second article's example dataset, then the
results will differ from the published results obtained with the PROC MIXED
approach. I'm not sure why van Houwelingen and his colleagues chose to use
PROC MIXED and the fixed, "known"-residual-variance expedient for the
second article, instead of, say, PROC NLMIXED (which would be like
-gllamm-), or perhaps the GLIMMIX macro (with a caveat about potential for
bias in some circumstances with binomial data). Their first article's
approach (with -gllamm-) seems more elegant, and would seem more accurate,
than that using PROC MIXED.
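For concreteness, the Woolf formula here is just the large-sample variance
of an empirical log odds: for a group with x events among n subjects,
    Var[ ln( x / (n - x) ) ]  =  1/x  +  1/(n - x)
which is exactly what the -generate- lines in the second do-file compute
(1/vd + 1/vwd for one arm, 1/nvd + 1/nvwd for the other).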
Third, the SAS control language that Gerben posted indicates that his
colleagues use PROC MIXED's default restricted maximum likelihood (REML)
estimation method instead of maximum likelihood. In the second article
above, van Houwelingen in contrast deliberately uses maximum likelihood
with PROC MIXED in order to be able to estimate likelihood-ratio
confidence intervals around the parameter estimates, admonishing that REML
won't allow this. (One of the advantages that van Houwelingen and his
colleagues claim for the maximum likelihood approach in the first place is
that it avoids Wald confidence intervals for the parameter estimates;
likelihood-ratio confidence intervals accommodate the variance components'
contributions.) Likelihood-ratio confidence intervals are estimated by
profiling the likelihood. Although I haven't shown it below, -gllamm-'s
-eval- and -from()- options enable this.
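As a rough, untested sketch of the profiling idea (everything here is
mine: the constraint number, the illustrative grid, the hypothetical
filename profiled.dta, and the assumption that -gllamm-'s fixed-effects
equation is labeled with the dependent variable's name, in the way the
[lns1] constraints below are labeled with theirs), one could fix a
parameter with a constraint and let -gllamm- maximize over the rest:
/* Profile the treatment-arm mean log odds from the first
   do-file's model: refit with the parameter constrained to
   each grid value and record the maximized log likelihood.
   The 95% limits are where the log likelihood falls
   invchi2(1, 0.95)/2 = 1.92 below its unconstrained maximum. */
tempname memhold
postfile `memhold' value ll using profiled, replace
forvalues j = 0/20 {
    local value = -2 + 0.1 * `j'
    constraint define 99 [X]grp2 = `value'
    quietly gllamm X grp1 grp2, nocons i(trial) nrf(2) ///
        eqs(grp1 grp2) family(binomial) denom(n) link(logit) ///
        adapt constraint(99)
    post `memhold' (`value') (e(ll))
}
postclose `memhold'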
Last, one of the motivations of the bivariate random effects meta-analysis
for van Houwelingen and his colleagues was to allow the log odds of the
control treatment group and of the experimental treatment group to be
individually estimated. This enables examination of the baseline risk
("underlying risk") among studies independently of the treatment effect.
The merits and demerits of this are discussed, among other places, in an
online article accessible without charge to anyone who's interested at
http://bmj.bmjjournals.com/cgi/content/full/313/7059/735 . (van
Houwelingen's approach is broached in the article.)
Joseph Coveney
----------------------------------------------------------------------------
clear
set more off
/* The dataset is from R. Collins & M. Langman,
Treatment with histamine H2 antagonists in
acute upper gastrointestinal hemorrhage.
_New England Journal of Medicine_ 313:660-66, 1985,
cited in H. C. van Houwelingen & K. H. Zwinderman,
A bivariate approach to meta-analysis. _Statistics
in Medicine_ 12:2273-84, 1993. */
input trial n_T X_T n_C X_C
1 259 50 260 51
2 153 44 132 30
3 106 16 107 15
4 78 14 80 21
5 51 16 54 15
6 56 11 53 12
7 50 8 50 16
8 40 9 48 11
9 33 12 36 10
10 46 5 47 12
11 31 6 29 12
12 33 6 39 7
13 36 11 26 5
14 21 10 19 8
15 20 7 20 8
16 45 9 43 3
17 34 3 31 13
18 24 4 24 5
19 14 2 15 6
20 15 2 14 3
21 18 5 12 1
22 10 1 9 4
23 10 3 11 1
24 18 0 19 3
25 14 0 14 0
end
compress
reshape long n_ X_, i(trial) j(trt) string
label define Group 0 C 1 T
encode trt, generate(grp) label(Group)
drop trt
rename grp trt
/* -rensfix- (from SSC) strips the trailing underscore,
   renaming n_ and X_ to n and X */
rensfix _
/* Actual model fitting begins here. */
quietly tabulate trt, generate(grp)
eq grp1: grp1
eq grp2: grp2
gllamm X grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) ///
family(binomial) denom(n) link(logit) adapt trace allc
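/* A possible follow-up (mine, not the article's): the overall
   treatment effect on the log odds scale is the difference of
   the two bivariate means. -lincom- resolves bare coefficient
   names in the first (fixed-effects) equation. */
lincom grp2 - grp1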
exit
----------------------------------------------------------------------------
clear
set more off
/* The dataset is from G. A. Colditz,
F. B. Brewer, C. S. Berkey, E. M.
Wilson, E. Burdick, H. V. Fineberg &
F. Mosteller, Efficacy of BCG vaccine
in the prevention of tuberculosis.
_Journal of the American Medical Association_
271:698-702, 1994, cited in H. C. van
Houwelingen, L. R. Arends and T. Stijnen,
Tutorial in Biostatistics. Advanced methods
in meta-analysis: multivariate approach and
meta-regression. _Statistics in Medicine_
21:589-624, 2002. */
input trial vd vwd nvd nvwd
1 4 119 11 128
2 6 300 29 274
3 3 228 11 209
4 62 13536 248 12619
5 33 5036 47 5761
6 180 1361 372 1079
7 8 2537 10 619
8 505 87886 499 87892
9 29 7470 45 7232
10 17 1699 65 1600
11 186 50448 141 27197
12 5 2493 3 2338
13 27 16886 29 17825
end
compress
/* This is in the spirit of the SAS data step
in the article; a Stata implementation ab initio
would be briefer, -reshape-d long first. */
generate float lno1 = ln(vd/ vwd)
generate float lno0 = ln(nvd / nvwd)
generate float var1 = 1/vd + 1/vwd
generate float var0 = 1/nvd + 1/nvwd
/* The article is confusing in that it
says the standard errors are stored in the
PARMSDATA file (p. 602), but what's stored
are really variance estimates. For stability,
-gllamm- parameterizes the level-1 residual
standard deviations on the log scale, so . . . */
generate float lsd1 = ln(sqrt(var1))
generate float lsd0 = ln(sqrt(var0))
reshape long lno var lsd, i(trial) j(trt)
/* The two random effects, one for each treatment. */
quietly tabulate trt, generate(grp)
eq grp1: grp1
eq grp2: grp2
/* Setting up the fixed ("known") residual
(first-level) variances. */
egen int cell = group(trial trt)
quietly tabulate cell, generate(dum)
eq wgt: dum1-dum26
forvalues i = 1/26 {
    /* observation i holds cell i, because -egen group()-
       numbered the cells in the current sort order */
    local dummy = lsd[`i']
    constraint define `i' [lns1]dum`i' = `dummy'
}
macro drop _dummy
/* First-pass estimation. -gllamm- doesn't apply
constraints in preliminary estimation of the fixed
effects (if -gllamm- decides to use itself for this),
so don't allow it to iterate here and get lost trying
to estimate as many or more parameters as there are data. */
gllamm lno grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) ///
s(wgt) constraint(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ///
17 18 19 20 21 22 23 24 25 26) noest
matrix A = M_initf
/* Fit the model here. */
gllamm lno grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) ///
s(wgt) constraint(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ///
17 18 19 20 21 22 23 24 25 26) from(A) long adapt trace allc
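/* A possible follow-up (mine, not the article's): the mean
   log odds ratio, vaccinated versus not, with its Wald
   confidence interval. */
lincom grp2 - grp1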
exit
----------------------------------------------------------------------------