Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: mepoisson vs meqrpoisson
From
Phil Clayton <[email protected]>
To
[email protected]
Subject
Re: st: mepoisson vs meqrpoisson
Date
Sat, 3 Aug 2013 17:48:19 +1000
Hi Steve,
Thanks for your reply. I did try changing the number of integration points but forgot to mention this, sorry (it didn't substantially change the results). In any case StataCorp is now on the case so we'll see what they say.
I agree it's a nice example of Simpson's paradox - and of course this is why I want to use the multilevel model.
Cheers,
Phil
On 03/08/2013, at 1:28 AM, Steve Samuels <[email protected]> wrote:
> Correction: The differing signs for the pe & *meqr* adp coefficients are
> likely an instance of Simpson's paradox.
>
> SS
>
> Phil:
>
> 1. I can't explain the difference between the -mepoisson and
> -mepqrpoisson-. The near equivalence of the meqr1 and meqr2 estimates
> gives me the impression that the QR results are more accurate, but I
> really don't know. In any case, the comparisons between me and meqr are
> premature if you haven't yet followed this advice from the ME manual
> (page 10):
>
> "When fitting mixed-effects models, you should always assess whether the
> approximation is adequate by refitting the model with a larger number of
> quadrature points."
>
> Try this and report back.
>
> 2. The differing signs for the pe & me adp coefficients are an instance
> of Simpson's paradox. To save Nick Cox a post, I'll add that this
> phenomenon was recognized earlier by Yule and by Karl Pearson. See:
> https://en.wikipedia.org/wiki/Simpson%27s_paradox
>
>
> Steve
>
>
>
>
> On Aug 1, 2013, at 9:05 PM, Phil Clayton wrote:
>
> Dear Statalisters,
>
> I have been experimenting with the new -mepoisson- and -meqrpoisson- commands and have found substantively different results that I can't explain.
>
> I am analysing infections in patients undergoing dialysis. I have a 3-level model - episodes of dialysis are clustered within patients (n=6959), who are clustered within hospitals (n=56). The outcome is "inf" (number of infections) and my main exposure of interest is "apd" (type of dialysis) which can vary within patients. Exposure time is stored in "duration".
>
> There is also a variable called "anyprevinf" which indicates if the patient has previously had an infection. So there can be up to 4 observations per patient, eg:
> hospid patient anypre~f apd duration inf
> 13 49 0 0 3 0
> 13 49 0 1 49 1
> 13 49 1 0 89 0
> 13 49 1 1 20 0
>
> To illustrate the problem I am ignoring the other covariates and have run 3 models - simple Poisson model ignoring the hierarchical structure, -mepoisson- and -meqrpoisson-:
> poisson inf apd, exp(duration)
> est store p1
> mepoisson inf apd, exp(duration) || hospid: || patient:
> est store mep1
> meqrpoisson inf apd, exp(duration) || hospid: || patient:
> est store meqrp1
>
> Since in this example I'm not using anyprevinf I then collapsed the dataset and re-ran the same 3 models:
> collapse (sum) inf duration, by(hospid apd patient)
> poisson inf apd, exp(duration)
> est store p2
> mepoisson inf apd, exp(duration) || hospid: || patient:
> est store mep2
> meqrpoisson inf apd, exp(duration) || hospid: || patient:
> est store meqrp2
>
> My results are summarised as follows:
> . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
>
> --------------------------------------------------------------------------------------------
> Variable | p1 p2 mep1 mep2 meqrp1 meqrp2
> -------------+------------------------------------------------------------------------------
> apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924
> | .02520192 .02520192 .03083056 .03562662 .03693156 .0369239
> | 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217
> --------------------------------------------------------------------------------------------
> legend: b/se/p
>
> Comments:
> - p1 and p2 are identical as expected
> - mep1 and mep2 are quite different which I don't understand - I think they should be almost totally equivalent since the only difference was the -collapse- (see meqrp1 vs meqrp2)
> - mep1 and meqr1 are quite different which I don't understand - I thought they were fitting the same model using a slightly different technique
> - mep2 and meqr2 are pretty similar, but not as similar as I was expecting since my understanding is that they're fitting the same model
>
> I would be grateful for any suggestions as to what's happening here. Below I've put a log of some descriptive data and the full output from each model.
>
> One subtle potential problem is that if anyprevinf==1 there must be another observation for that patient with anyprevinf==0 & inf==1 - so in the uncollapsed dataset there isn't perfect conditional independence. Not sure if that's enough to explain the discrepancy though.
>
> Kind regards,
> Phil
>
> . describe
>
> Contains data
> obs: 12,978
> vars: 6
> size: 272,538 (_dta has notes)
> -------------------------------------------------------------------------------------------------------------------
> storage display value
> variable name type format label variable label
> -------------------------------------------------------------------------------------------------------------------
> patient int %9.0g
> apd byte %8.0g
> anyprevinf byte %8.0g
> hospid byte %9.0g group(hospcode)
> inf double %8.0g (sum) inf
> duration double %9.0g (sum) duration
> -------------------------------------------------------------------------------------------------------------------
> Sorted by: hospid apd patient anyprevinf
> Note: dataset has changed since last saved
>
> . misstable summarize // no missing data
> (variables nonmissing or string)
>
> . codebook hospid patient
>
> -------------------------------------------------------------------------------------------------------------------
> hospid group(hospcode)
> -------------------------------------------------------------------------------------------------------------------
>
> type: numeric (byte)
>
> range: [1,56] units: 1
> unique values: 56 missing .: 0/12978
>
> mean: 33.0623
> std. dev: 15.064
>
> percentiles: 10% 25% 50% 75% 90%
> 10 20 35 47 53
>
> -------------------------------------------------------------------------------------------------------------------
> patient (unlabeled)
> -------------------------------------------------------------------------------------------------------------------
>
> type: numeric (int)
>
> range: [1,6959] units: 1
> unique values: 6959 missing .: 0/12978
>
> mean: 3461.36
> std. dev: 2043.82
>
> percentiles: 10% 25% 50% 75% 90%
> 674 1650 3457.5 5258 6284
>
> . bysort patient: assert hospid==hospid[_n-1] if _n>1
>
> . tab apd
>
> apd | Freq. Percent Cum.
> ------------+-----------------------------------
> 0 | 7,275 56.06 56.06
> 1 | 5,703 43.94 100.00
> ------------+-----------------------------------
> Total | 12,978 100.00
>
> . tab anyprevinf
>
> anyprevinf | Freq. Percent Cum.
> ------------+-----------------------------------
> 0 | 9,411 72.52 72.52
> 1 | 3,567 27.48 100.00
> ------------+-----------------------------------
> Total | 12,978 100.00
>
> . tab inf
>
> (sum) inf | Freq. Percent Cum.
> ------------+-----------------------------------
> 0 | 8,231 63.42 63.42
> 1 | 3,980 30.67 94.09
> 2 | 388 2.99 97.08
> 3 | 185 1.43 98.51
> 4 | 89 0.69 99.19
> 5 | 54 0.42 99.61
> 6 | 25 0.19 99.80
> 7 | 11 0.08 99.88
> 8 | 7 0.05 99.94
> 9 | 3 0.02 99.96
> 10 | 4 0.03 99.99
> 11 | 1 0.01 100.00
> ------------+-----------------------------------
> Total | 12,978 100.00
>
> . sum duration
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> duration | 12978 322.0892 389.4168 .5 2802
>
> .
> . poisson inf apd, exp(duration)
>
> Iteration 0: log likelihood = -14529.051
> Iteration 1: log likelihood = -14528.992
> Iteration 2: log likelihood = -14528.992
>
> Poisson regression Number of obs = 12978
> LR chi2(1) = 20.79
> Prob > chi2 = 0.0000
> Log likelihood = -14528.992 Pseudo R2 = 0.0007
>
> ------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334
> _cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167
> ln(duration) | 1 (exposure)
> ------------------------------------------------------------------------------
>
> . est store p1
>
> . mepoisson inf apd, exp(duration) || hospid: || patient:
>
> Fitting fixed-effects model:
>
> Iteration 0: log likelihood = -26408.57
> Iteration 1: log likelihood = -14532.71
> Iteration 2: log likelihood = -14528.994
> Iteration 3: log likelihood = -14528.992
> Iteration 4: log likelihood = -14528.992
>
> Refining starting values:
>
> Grid node 0: log likelihood = -9069.4652
>
> Fitting full model:
>
> Iteration 0: log likelihood = -9069.4652
> Iteration 1: log likelihood = -9063.4762
> Iteration 2: log likelihood = -9059.664
> Iteration 3: log likelihood = -9058.714
> Iteration 4: log likelihood = -9058.715
> Iteration 5: log likelihood = -9058.715
>
> Mixed-effects Poisson regression Number of obs = 12978
>
> -----------------------------------------------------------
> | No. of Observations per Group
> Group Variable | Groups Minimum Average Maximum
> ----------------+------------------------------------------
> hospid | 56 1 231.8 949
> patient | 6959 1 1.9 4
> -----------------------------------------------------------
>
> Integration method: mvaghermite Integration points = 7
>
> Wald chi2(1) = 5.68
> Log likelihood = -9058.715 Prob > chi2 = 0.0172
> --------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> apd | -.0734712 .0308306 -2.38 0.017 -.133898 -.0130444
> _cons | -6.505974 .0244448 -266.15 0.000 -6.553884 -6.458063
> ln(duration) | 1 (exposure)
> ---------------+----------------------------------------------------------------
> hospid |
> var(_cons)| .3186859 .0910349 .1820586 .5578463
> ---------------+----------------------------------------------------------------
> hospid>patient |
> var(_cons)| .912644 .0565245 .8083181 1.030435
> --------------------------------------------------------------------------------
> LR test vs. Poisson regression: chi2(2) = 10940.55 Prob > chi2 = 0.0000
>
> Note: LR test is conservative and provided only for reference.
>
> . est store mep1
>
> . meqrpoisson inf apd, exp(duration) || hospid: || patient:
>
> Refining starting values:
>
> Iteration 0: log likelihood = -13532.955
> Iteration 1: log likelihood = -13514.626
> Iteration 2: log likelihood = -13503.088
>
> Performing gradient-based optimization:
>
> Iteration 0: log likelihood = -13503.088
> Iteration 1: log likelihood = -13502.469
> Iteration 2: log likelihood = -13502.468
> Iteration 3: log likelihood = -13502.468
>
> Mixed-effects Poisson regression Number of obs = 12978
>
> --------------------------------------------------------------------------
> | No. of Observations per Group Integration
> Group Variable | Groups Minimum Average Maximum Points
> ----------------+---------------------------------------------------------
> hospid | 56 1 231.8 949 7
> patient | 6959 1 1.9 4 7
> --------------------------------------------------------------------------
>
> Wald chi2(1) = 1.50
> Log likelihood = -13502.468 Prob > chi2 = 0.2202
>
> ------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> apd | .0452761 .0369316 1.23 0.220 -.0271085 .1176606
> _cons | -6.922041 .0735074 -94.17 0.000 -7.066113 -6.777969
> ln(duration) | 1 (exposure)
> ------------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> hospid: Identity |
> var(_cons) | .1840679 .0504352 .1075836 .3149271
> -----------------------------+------------------------------------------------
> patient: Identity |
> var(_cons) | 1.010816 .0477642 .9214039 1.108904
> ------------------------------------------------------------------------------
> LR test vs. Poisson regression: chi2(2) = 2053.05 Prob > chi2 = 0.0000
>
> Note: LR test is conservative and provided only for reference.
>
> . est store meqrp1
>
> .
> . collapse (sum) inf duration, by(hospid apd patient)
>
> . poisson inf apd, exp(duration)
>
> Iteration 0: log likelihood = -10969.711
> Iteration 1: log likelihood = -10969.378
> Iteration 2: log likelihood = -10969.378
>
> Poisson regression Number of obs = 9756
> LR chi2(1) = 20.79
> Prob > chi2 = 0.0000
> Log likelihood = -10969.378 Pseudo R2 = 0.0009
>
> ------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334
> _cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167
> ln(duration) | 1 (exposure)
> ------------------------------------------------------------------------------
>
> . est store p2
>
> . mepoisson inf apd, exp(duration) || hospid: || patient:
>
> Fitting fixed-effects model:
>
> Iteration 0: log likelihood = -18163.874
> Iteration 1: log likelihood = -11003.634
> Iteration 2: log likelihood = -10969.486
> Iteration 3: log likelihood = -10969.378
> Iteration 4: log likelihood = -10969.378
>
> Refining starting values:
>
> Grid node 0: log likelihood = -9019.1437
>
> Fitting full model:
>
> Iteration 0: log likelihood = -9019.1437
> Iteration 1: log likelihood = -9009.7733
> Iteration 2: log likelihood = -8995.3399
> Iteration 3: log likelihood = -8986.9478
> Iteration 4: log likelihood = -8986.5939
> Iteration 5: log likelihood = -8986.5907
> Iteration 6: log likelihood = -8986.591
>
> Mixed-effects Poisson regression Number of obs = 9756
>
> -----------------------------------------------------------
> | No. of Observations per Group
> Group Variable | Groups Minimum Average Maximum
> ----------------+------------------------------------------
> hospid | 56 1 174.2 716
> patient | 6959 1 1.4 2
> -----------------------------------------------------------
>
> Integration method: mvaghermite Integration points = 7
>
> Wald chi2(1) = 0.61
> Log likelihood = -8986.591 Prob > chi2 = 0.4365
> --------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> apd | .0277205 .0356266 0.78 0.437 -.0421064 .0975474
> _cons | -6.738195 .0441765 -152.53 0.000 -6.824779 -6.651611
> ln(duration) | 1 (exposure)
> ---------------+----------------------------------------------------------------
> hospid |
> var(_cons)| .2065251 .0586674 .1183512 .3603903
> ---------------+----------------------------------------------------------------
> hospid>patient |
> var(_cons)| .9873731 .0491168 .89565 1.088489
> --------------------------------------------------------------------------------
> LR test vs. Poisson regression: chi2(2) = 3965.57 Prob > chi2 = 0.0000
>
> Note: LR test is conservative and provided only for reference.
>
> . est store mep2
>
> . meqrpoisson inf apd, exp(duration) || hospid: || patient:
>
> Refining starting values:
>
> Iteration 0: log likelihood = -9972.7637
> Iteration 1: log likelihood = -9949.2459
> Iteration 2: log likelihood = -9943.3971
>
> Performing gradient-based optimization:
>
> Iteration 0: log likelihood = -9943.3971
> Iteration 1: log likelihood = -9942.2696
> Iteration 2: log likelihood = -9942.2532
> Iteration 3: log likelihood = -9942.2532
>
> Mixed-effects Poisson regression Number of obs = 9756
>
> --------------------------------------------------------------------------
> | No. of Observations per Group Integration
> Group Variable | Groups Minimum Average Maximum Points
> ----------------+---------------------------------------------------------
> hospid | 56 1 174.2 716 7
> patient | 6959 1 1.4 2 7
> --------------------------------------------------------------------------
>
> Wald chi2(1) = 1.49
> Log likelihood = -9942.2532 Prob > chi2 = 0.2217
>
> ------------------------------------------------------------------------------
> inf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> apd | .0451192 .0369239 1.22 0.222 -.0272503 .1174888
> _cons | -6.921458 .0734339 -94.25 0.000 -7.065386 -6.77753
> ln(duration) | 1 (exposure)
> ------------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> hospid: Identity |
> var(_cons) | .1835775 .0503448 .1072467 .3142355
> -----------------------------+------------------------------------------------
> patient: Identity |
> var(_cons) | 1.009257 .0477619 .9198554 1.107347
> ------------------------------------------------------------------------------
> LR test vs. Poisson regression: chi2(2) = 2054.25 Prob > chi2 = 0.0000
>
> Note: LR test is conservative and provided only for reference.
>
> . est store meqrp2
>
> .
> . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
>
> --------------------------------------------------------------------------------------------
> Variable | p1 p2 mep1 mep2 meqrp1 meqrp2
> -------------+------------------------------------------------------------------------------
> apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924
> | .02520192 .02520192 .03083056 .03562662 .03693156 .0369239
> | 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217
> --------------------------------------------------------------------------------------------
> legend: b/se/p
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/