Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: mepoisson vs meqrpoisson
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: mepoisson vs meqrpoisson
Date
Fri, 2 Aug 2013 11:28:34 -0400
Correction: The differing signs for the pe & *meqr* adp coefficients are
likely an instance of Simpson's paradox.
SS
Phil:
1. I can't explain the difference between the -mepoisson and
-mepqrpoisson-. The near equivalence of the meqr1 and meqr2 estimates
gives me the impression that the QR results are more accurate, but I
really don't know. In any case, the comparisons between me and meqr are
premature if you haven't yet followed this advice from the ME manual
(page 10):
"When fitting mixed-effects models, you should always assess whether the
approximation is adequate by refitting the model with a larger number of
quadrature points."
Try this and report back.
2. The differing signs for the pe & me adp coefficients are an instance
of Simpson's paradox. To save Nick Cox a post, I'll add that this
phenomenon was recognized earlier by Yule and by Karl Pearson. See:
https://en.wikipedia.org/wiki/Simpson%27s_paradox
Steve
On Aug 1, 2013, at 9:05 PM, Phil Clayton wrote:
Dear Statalisters,
I have been experimenting with the new -mepoisson- and -meqrpoisson- commands and have found substantively different results that I can't explain.
I am analysing infections in patients undergoing dialysis. I have a 3-level model - episodes of dialysis are clustered within patients (n=6959), who are clustered within hospitals (n=56). The outcome is "inf" (number of infections) and my main exposure of interest is "apd" (type of dialysis) which can vary within patients. Exposure time is stored in "duration".
There is also a variable called "anyprevinf" which indicates if the patient has previously had an infection. So there can be up to 4 observations per patient, eg:
hospid patient anypre~f apd duration inf
13 49 0 0 3 0
13 49 0 1 49 1
13 49 1 0 89 0
13 49 1 1 20 0
To illustrate the problem I am ignoring the other covariates and have run 3 models - simple Poisson model ignoring the hierarchical structure, -mepoisson- and -meqrpoisson-:
poisson inf apd, exp(duration)
est store p1
mepoisson inf apd, exp(duration) || hospid: || patient:
est store mep1
meqrpoisson inf apd, exp(duration) || hospid: || patient:
est store meqrp1
Since in this example I'm not using anyprevinf I then collapsed the dataset and re-ran the same 3 models:
collapse (sum) inf duration, by(hospid apd patient)
poisson inf apd, exp(duration)
est store p2
mepoisson inf apd, exp(duration) || hospid: || patient:
est store mep2
meqrpoisson inf apd, exp(duration) || hospid: || patient:
est store meqrp2
My results are summarised as follows:
. est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
--------------------------------------------------------------------------------------------
Variable | p1 p2 mep1 mep2 meqrp1 meqrp2
-------------+------------------------------------------------------------------------------
apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924
| .02520192 .02520192 .03083056 .03562662 .03693156 .0369239
| 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217
--------------------------------------------------------------------------------------------
legend: b/se/p
Comments:
- p1 and p2 are identical as expected
- mep1 and mep2 are quite different which I don't understand - I think they should be almost totally equivalent since the only difference was the -collapse- (see meqrp1 vs meqrp2)
- mep1 and meqr1 are quite different which I don't understand - I thought they were fitting the same model using a slightly different technique
- mep2 and meqr2 are pretty similar, but not as similar as I was expecting since my understanding is that they're fitting the same model
I would be grateful for any suggestions as to what's happening here. Below I've put a log of some descriptive data and the full output from each model.
One subtle potential problem is that if anyprevinf==1 there must be another observation for that patient with anyprevinf==0 & inf==1 - so in the uncollapsed dataset there isn't perfect conditional independence. Not sure if that's enough to explain the discrepancy though.
Kind regards,
Phil
. describe
Contains data
obs: 12,978
vars: 6
size: 272,538 (_dta has notes)
-------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------------------------------------------
patient int %9.0g
apd byte %8.0g
anyprevinf byte %8.0g
hospid byte %9.0g group(hospcode)
inf double %8.0g (sum) inf
duration double %9.0g (sum) duration
-------------------------------------------------------------------------------------------------------------------
Sorted by: hospid apd patient anyprevinf
Note: dataset has changed since last saved
. misstable summarize // no missing data
(variables nonmissing or string)
. codebook hospid patient
-------------------------------------------------------------------------------------------------------------------
hospid group(hospcode)
-------------------------------------------------------------------------------------------------------------------
type: numeric (byte)
range: [1,56] units: 1
unique values: 56 missing .: 0/12978
mean: 33.0623
std. dev: 15.064
percentiles: 10% 25% 50% 75% 90%
10 20 35 47 53
-------------------------------------------------------------------------------------------------------------------
patient (unlabeled)
-------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [1,6959] units: 1
unique values: 6959 missing .: 0/12978
mean: 3461.36
std. dev: 2043.82
percentiles: 10% 25% 50% 75% 90%
674 1650 3457.5 5258 6284
. bysort patient: assert hospid==hospid[_n-1] if _n>1
. tab apd
apd | Freq. Percent Cum.
------------+-----------------------------------
0 | 7,275 56.06 56.06
1 | 5,703 43.94 100.00
------------+-----------------------------------
Total | 12,978 100.00
. tab anyprevinf
anyprevinf | Freq. Percent Cum.
------------+-----------------------------------
0 | 9,411 72.52 72.52
1 | 3,567 27.48 100.00
------------+-----------------------------------
Total | 12,978 100.00
. tab inf
(sum) inf | Freq. Percent Cum.
------------+-----------------------------------
0 | 8,231 63.42 63.42
1 | 3,980 30.67 94.09
2 | 388 2.99 97.08
3 | 185 1.43 98.51
4 | 89 0.69 99.19
5 | 54 0.42 99.61
6 | 25 0.19 99.80
7 | 11 0.08 99.88
8 | 7 0.05 99.94
9 | 3 0.02 99.96
10 | 4 0.03 99.99
11 | 1 0.01 100.00
------------+-----------------------------------
Total | 12,978 100.00
. sum duration
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
duration | 12978 322.0892 389.4168 .5 2802
.
. poisson inf apd, exp(duration)
Iteration 0: log likelihood = -14529.051
Iteration 1: log likelihood = -14528.992
Iteration 2: log likelihood = -14528.992
Poisson regression Number of obs = 12978
LR chi2(1) = 20.79
Prob > chi2 = 0.0000
Log likelihood = -14528.992 Pseudo R2 = 0.0007
------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334
_cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167
ln(duration) | 1 (exposure)
------------------------------------------------------------------------------
. est store p1
. mepoisson inf apd, exp(duration) || hospid: || patient:
Fitting fixed-effects model:
Iteration 0: log likelihood = -26408.57
Iteration 1: log likelihood = -14532.71
Iteration 2: log likelihood = -14528.994
Iteration 3: log likelihood = -14528.992
Iteration 4: log likelihood = -14528.992
Refining starting values:
Grid node 0: log likelihood = -9069.4652
Fitting full model:
Iteration 0: log likelihood = -9069.4652
Iteration 1: log likelihood = -9063.4762
Iteration 2: log likelihood = -9059.664
Iteration 3: log likelihood = -9058.714
Iteration 4: log likelihood = -9058.715
Iteration 5: log likelihood = -9058.715
Mixed-effects Poisson regression Number of obs = 12978
-----------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
hospid | 56 1 231.8 949
patient | 6959 1 1.9 4
-----------------------------------------------------------
Integration method: mvaghermite Integration points = 7
Wald chi2(1) = 5.68
Log likelihood = -9058.715 Prob > chi2 = 0.0172
--------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
apd | -.0734712 .0308306 -2.38 0.017 -.133898 -.0130444
_cons | -6.505974 .0244448 -266.15 0.000 -6.553884 -6.458063
ln(duration) | 1 (exposure)
---------------+----------------------------------------------------------------
hospid |
var(_cons)| .3186859 .0910349 .1820586 .5578463
---------------+----------------------------------------------------------------
hospid>patient |
var(_cons)| .912644 .0565245 .8083181 1.030435
--------------------------------------------------------------------------------
LR test vs. Poisson regression: chi2(2) = 10940.55 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
. est store mep1
. meqrpoisson inf apd, exp(duration) || hospid: || patient:
Refining starting values:
Iteration 0: log likelihood = -13532.955
Iteration 1: log likelihood = -13514.626
Iteration 2: log likelihood = -13503.088
Performing gradient-based optimization:
Iteration 0: log likelihood = -13503.088
Iteration 1: log likelihood = -13502.469
Iteration 2: log likelihood = -13502.468
Iteration 3: log likelihood = -13502.468
Mixed-effects Poisson regression Number of obs = 12978
--------------------------------------------------------------------------
| No. of Observations per Group Integration
Group Variable | Groups Minimum Average Maximum Points
----------------+---------------------------------------------------------
hospid | 56 1 231.8 949 7
patient | 6959 1 1.9 4 7
--------------------------------------------------------------------------
Wald chi2(1) = 1.50
Log likelihood = -13502.468 Prob > chi2 = 0.2202
------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
apd | .0452761 .0369316 1.23 0.220 -.0271085 .1176606
_cons | -6.922041 .0735074 -94.17 0.000 -7.066113 -6.777969
ln(duration) | 1 (exposure)
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
hospid: Identity |
var(_cons) | .1840679 .0504352 .1075836 .3149271
-----------------------------+------------------------------------------------
patient: Identity |
var(_cons) | 1.010816 .0477642 .9214039 1.108904
------------------------------------------------------------------------------
LR test vs. Poisson regression: chi2(2) = 2053.05 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
. est store meqrp1
.
. collapse (sum) inf duration, by(hospid apd patient)
. poisson inf apd, exp(duration)
Iteration 0: log likelihood = -10969.711
Iteration 1: log likelihood = -10969.378
Iteration 2: log likelihood = -10969.378
Poisson regression Number of obs = 9756
LR chi2(1) = 20.79
Prob > chi2 = 0.0000
Log likelihood = -10969.378 Pseudo R2 = 0.0009
------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334
_cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167
ln(duration) | 1 (exposure)
------------------------------------------------------------------------------
. est store p2
. mepoisson inf apd, exp(duration) || hospid: || patient:
Fitting fixed-effects model:
Iteration 0: log likelihood = -18163.874
Iteration 1: log likelihood = -11003.634
Iteration 2: log likelihood = -10969.486
Iteration 3: log likelihood = -10969.378
Iteration 4: log likelihood = -10969.378
Refining starting values:
Grid node 0: log likelihood = -9019.1437
Fitting full model:
Iteration 0: log likelihood = -9019.1437
Iteration 1: log likelihood = -9009.7733
Iteration 2: log likelihood = -8995.3399
Iteration 3: log likelihood = -8986.9478
Iteration 4: log likelihood = -8986.5939
Iteration 5: log likelihood = -8986.5907
Iteration 6: log likelihood = -8986.591
Mixed-effects Poisson regression Number of obs = 9756
-----------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
hospid | 56 1 174.2 716
patient | 6959 1 1.4 2
-----------------------------------------------------------
Integration method: mvaghermite Integration points = 7
Wald chi2(1) = 0.61
Log likelihood = -8986.591 Prob > chi2 = 0.4365
--------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
apd | .0277205 .0356266 0.78 0.437 -.0421064 .0975474
_cons | -6.738195 .0441765 -152.53 0.000 -6.824779 -6.651611
ln(duration) | 1 (exposure)
---------------+----------------------------------------------------------------
hospid |
var(_cons)| .2065251 .0586674 .1183512 .3603903
---------------+----------------------------------------------------------------
hospid>patient |
var(_cons)| .9873731 .0491168 .89565 1.088489
--------------------------------------------------------------------------------
LR test vs. Poisson regression: chi2(2) = 3965.57 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
. est store mep2
. meqrpoisson inf apd, exp(duration) || hospid: || patient:
Refining starting values:
Iteration 0: log likelihood = -9972.7637
Iteration 1: log likelihood = -9949.2459
Iteration 2: log likelihood = -9943.3971
Performing gradient-based optimization:
Iteration 0: log likelihood = -9943.3971
Iteration 1: log likelihood = -9942.2696
Iteration 2: log likelihood = -9942.2532
Iteration 3: log likelihood = -9942.2532
Mixed-effects Poisson regression Number of obs = 9756
--------------------------------------------------------------------------
| No. of Observations per Group Integration
Group Variable | Groups Minimum Average Maximum Points
----------------+---------------------------------------------------------
hospid | 56 1 174.2 716 7
patient | 6959 1 1.4 2 7
--------------------------------------------------------------------------
Wald chi2(1) = 1.49
Log likelihood = -9942.2532 Prob > chi2 = 0.2217
------------------------------------------------------------------------------
inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
apd | .0451192 .0369239 1.22 0.222 -.0272503 .1174888
_cons | -6.921458 .0734339 -94.25 0.000 -7.065386 -6.77753
ln(duration) | 1 (exposure)
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
hospid: Identity |
var(_cons) | .1835775 .0503448 .1072467 .3142355
-----------------------------+------------------------------------------------
patient: Identity |
var(_cons) | 1.009257 .0477619 .9198554 1.107347
------------------------------------------------------------------------------
LR test vs. Poisson regression: chi2(2) = 2054.25 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
. est store meqrp2
.
. est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
--------------------------------------------------------------------------------------------
Variable | p1 p2 mep1 mep2 meqrp1 meqrp2
-------------+------------------------------------------------------------------------------
apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924
| .02520192 .02520192 .03083056 .03562662 .03693156 .0369239
| 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217
--------------------------------------------------------------------------------------------
legend: b/se/p
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/