Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Phil Clayton <philclayton@internode.on.net> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: mepoisson vs meqrpoisson |
Date | Fri, 2 Aug 2013 10:35:26 +0930 |
Dear Statalisters, I have been experimenting with the new -mepoisson- and -meqrpoisson- commands and have found substantively different results that I can't explain. I am analysing infections in patients undergoing dialysis. I have a 3-level model - episodes of dialysis are clustered within patients (n=6959), who are clustered within hospitals (n=56). The outcome is "inf" (number of infections) and my main exposure of interest is "apd" (type of dialysis) which can vary within patients. Exposure time is stored in "duration". There is also a variable called "anyprevinf" which indicates if the patient has previously had an infection. So there can be up to 4 observations per patient, eg: hospid patient anypre~f apd duration inf 13 49 0 0 3 0 13 49 0 1 49 1 13 49 1 0 89 0 13 49 1 1 20 0 To illustrate the problem I am ignoring the other covariates and have run 3 models - simple Poisson model ignoring the hierarchical structure, -mepoisson- and -meqrpoisson-: poisson inf apd, exp(duration) est store p1 mepoisson inf apd, exp(duration) || hospid: || patient: est store mep1 meqrpoisson inf apd, exp(duration) || hospid: || patient: est store meqrp1 Since in this example I'm not using anyprevinf I then collapsed the dataset and re-ran the same 3 models: collapse (sum) inf duration, by(hospid apd patient) poisson inf apd, exp(duration) est store p2 mepoisson inf apd, exp(duration) || hospid: || patient: est store mep2 meqrpoisson inf apd, exp(duration) || hospid: || patient: est store meqrp2 My results are summarised as follows: . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p -------------------------------------------------------------------------------------------- Variable | p1 p2 mep1 mep2 meqrp1 meqrp2 -------------+------------------------------------------------------------------------------ apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924 | .02520192 .02520192 .03083056 .03562662 .03693156 .0369239 | 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217 -------------------------------------------------------------------------------------------- legend: b/se/p Comments: - p1 and p2 are identical as expected - mep1 and mep2 are quite different which I don't understand - I think they should be almost totally equivalent since the only difference was the -collapse- (see meqrp1 vs meqrp2) - mep1 and meqr1 are quite different which I don't understand - I thought they were fitting the same model using a slightly different technique - mep2 and meqr2 are pretty similar, but not as similar as I was expecting since my understanding is that they're fitting the same model I would be grateful for any suggestions as to what's happening here. Below I've put a log of some descriptive data and the full output from each model. One subtle potential problem is that if anyprevinf==1 there must be another observation for that patient with anyprevinf==0 & inf==1 - so in the uncollapsed dataset there isn't perfect conditional independence. Not sure if that's enough to explain the discrepancy though. Kind regards, Phil . describe Contains data obs: 12,978 vars: 6 size: 272,538 (_dta has notes) ------------------------------------------------------------------------------------------------------------------- storage display value variable name type format label variable label ------------------------------------------------------------------------------------------------------------------- patient int %9.0g apd byte %8.0g anyprevinf byte %8.0g hospid byte %9.0g group(hospcode) inf double %8.0g (sum) inf duration double %9.0g (sum) duration ------------------------------------------------------------------------------------------------------------------- Sorted by: hospid apd patient anyprevinf Note: dataset has changed since last saved . misstable summarize // no missing data (variables nonmissing or string) . codebook hospid patient ------------------------------------------------------------------------------------------------------------------- hospid group(hospcode) ------------------------------------------------------------------------------------------------------------------- type: numeric (byte) range: [1,56] units: 1 unique values: 56 missing .: 0/12978 mean: 33.0623 std. dev: 15.064 percentiles: 10% 25% 50% 75% 90% 10 20 35 47 53 ------------------------------------------------------------------------------------------------------------------- patient (unlabeled) ------------------------------------------------------------------------------------------------------------------- type: numeric (int) range: [1,6959] units: 1 unique values: 6959 missing .: 0/12978 mean: 3461.36 std. dev: 2043.82 percentiles: 10% 25% 50% 75% 90% 674 1650 3457.5 5258 6284 . bysort patient: assert hospid==hospid[_n-1] if _n>1 . tab apd apd | Freq. Percent Cum. ------------+----------------------------------- 0 | 7,275 56.06 56.06 1 | 5,703 43.94 100.00 ------------+----------------------------------- Total | 12,978 100.00 . tab anyprevinf anyprevinf | Freq. Percent Cum. ------------+----------------------------------- 0 | 9,411 72.52 72.52 1 | 3,567 27.48 100.00 ------------+----------------------------------- Total | 12,978 100.00 . tab inf (sum) inf | Freq. Percent Cum. ------------+----------------------------------- 0 | 8,231 63.42 63.42 1 | 3,980 30.67 94.09 2 | 388 2.99 97.08 3 | 185 1.43 98.51 4 | 89 0.69 99.19 5 | 54 0.42 99.61 6 | 25 0.19 99.80 7 | 11 0.08 99.88 8 | 7 0.05 99.94 9 | 3 0.02 99.96 10 | 4 0.03 99.99 11 | 1 0.01 100.00 ------------+----------------------------------- Total | 12,978 100.00 . sum duration Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- duration | 12978 322.0892 389.4168 .5 2802 . . poisson inf apd, exp(duration) Iteration 0: log likelihood = -14529.051 Iteration 1: log likelihood = -14528.992 Iteration 2: log likelihood = -14528.992 Poisson regression Number of obs = 12978 LR chi2(1) = 20.79 Prob > chi2 = 0.0000 Log likelihood = -14528.992 Pseudo R2 = 0.0007 ------------------------------------------------------------------------------ inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334 _cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167 ln(duration) | 1 (exposure) ------------------------------------------------------------------------------ . est store p1 . mepoisson inf apd, exp(duration) || hospid: || patient: Fitting fixed-effects model: Iteration 0: log likelihood = -26408.57 Iteration 1: log likelihood = -14532.71 Iteration 2: log likelihood = -14528.994 Iteration 3: log likelihood = -14528.992 Iteration 4: log likelihood = -14528.992 Refining starting values: Grid node 0: log likelihood = -9069.4652 Fitting full model: Iteration 0: log likelihood = -9069.4652 Iteration 1: log likelihood = -9063.4762 Iteration 2: log likelihood = -9059.664 Iteration 3: log likelihood = -9058.714 Iteration 4: log likelihood = -9058.715 Iteration 5: log likelihood = -9058.715 Mixed-effects Poisson regression Number of obs = 12978 ----------------------------------------------------------- | No. of Observations per Group Group Variable | Groups Minimum Average Maximum ----------------+------------------------------------------ hospid | 56 1 231.8 949 patient | 6959 1 1.9 4 ----------------------------------------------------------- Integration method: mvaghermite Integration points = 7 Wald chi2(1) = 5.68 Log likelihood = -9058.715 Prob > chi2 = 0.0172 -------------------------------------------------------------------------------- inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- apd | -.0734712 .0308306 -2.38 0.017 -.133898 -.0130444 _cons | -6.505974 .0244448 -266.15 0.000 -6.553884 -6.458063 ln(duration) | 1 (exposure) ---------------+---------------------------------------------------------------- hospid | var(_cons)| .3186859 .0910349 .1820586 .5578463 ---------------+---------------------------------------------------------------- hospid>patient | var(_cons)| .912644 .0565245 .8083181 1.030435 -------------------------------------------------------------------------------- LR test vs. Poisson regression: chi2(2) = 10940.55 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . est store mep1 . meqrpoisson inf apd, exp(duration) || hospid: || patient: Refining starting values: Iteration 0: log likelihood = -13532.955 Iteration 1: log likelihood = -13514.626 Iteration 2: log likelihood = -13503.088 Performing gradient-based optimization: Iteration 0: log likelihood = -13503.088 Iteration 1: log likelihood = -13502.469 Iteration 2: log likelihood = -13502.468 Iteration 3: log likelihood = -13502.468 Mixed-effects Poisson regression Number of obs = 12978 -------------------------------------------------------------------------- | No. of Observations per Group Integration Group Variable | Groups Minimum Average Maximum Points ----------------+--------------------------------------------------------- hospid | 56 1 231.8 949 7 patient | 6959 1 1.9 4 7 -------------------------------------------------------------------------- Wald chi2(1) = 1.50 Log likelihood = -13502.468 Prob > chi2 = 0.2202 ------------------------------------------------------------------------------ inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- apd | .0452761 .0369316 1.23 0.220 -.0271085 .1176606 _cons | -6.922041 .0735074 -94.17 0.000 -7.066113 -6.777969 ln(duration) | 1 (exposure) ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ hospid: Identity | var(_cons) | .1840679 .0504352 .1075836 .3149271 -----------------------------+------------------------------------------------ patient: Identity | var(_cons) | 1.010816 .0477642 .9214039 1.108904 ------------------------------------------------------------------------------ LR test vs. Poisson regression: chi2(2) = 2053.05 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . est store meqrp1 . . collapse (sum) inf duration, by(hospid apd patient) . poisson inf apd, exp(duration) Iteration 0: log likelihood = -10969.711 Iteration 1: log likelihood = -10969.378 Iteration 2: log likelihood = -10969.378 Poisson regression Number of obs = 9756 LR chi2(1) = 20.79 Prob > chi2 = 0.0000 Log likelihood = -10969.378 Pseudo R2 = 0.0009 ------------------------------------------------------------------------------ inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- apd | -.1149283 .0252019 -4.56 0.000 -.1643231 -.0655334 _cons | -6.439011 .017778 -362.19 0.000 -6.473855 -6.404167 ln(duration) | 1 (exposure) ------------------------------------------------------------------------------ . est store p2 . mepoisson inf apd, exp(duration) || hospid: || patient: Fitting fixed-effects model: Iteration 0: log likelihood = -18163.874 Iteration 1: log likelihood = -11003.634 Iteration 2: log likelihood = -10969.486 Iteration 3: log likelihood = -10969.378 Iteration 4: log likelihood = -10969.378 Refining starting values: Grid node 0: log likelihood = -9019.1437 Fitting full model: Iteration 0: log likelihood = -9019.1437 Iteration 1: log likelihood = -9009.7733 Iteration 2: log likelihood = -8995.3399 Iteration 3: log likelihood = -8986.9478 Iteration 4: log likelihood = -8986.5939 Iteration 5: log likelihood = -8986.5907 Iteration 6: log likelihood = -8986.591 Mixed-effects Poisson regression Number of obs = 9756 ----------------------------------------------------------- | No. of Observations per Group Group Variable | Groups Minimum Average Maximum ----------------+------------------------------------------ hospid | 56 1 174.2 716 patient | 6959 1 1.4 2 ----------------------------------------------------------- Integration method: mvaghermite Integration points = 7 Wald chi2(1) = 0.61 Log likelihood = -8986.591 Prob > chi2 = 0.4365 -------------------------------------------------------------------------------- inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- apd | .0277205 .0356266 0.78 0.437 -.0421064 .0975474 _cons | -6.738195 .0441765 -152.53 0.000 -6.824779 -6.651611 ln(duration) | 1 (exposure) ---------------+---------------------------------------------------------------- hospid | var(_cons)| .2065251 .0586674 .1183512 .3603903 ---------------+---------------------------------------------------------------- hospid>patient | var(_cons)| .9873731 .0491168 .89565 1.088489 -------------------------------------------------------------------------------- LR test vs. Poisson regression: chi2(2) = 3965.57 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . est store mep2 . meqrpoisson inf apd, exp(duration) || hospid: || patient: Refining starting values: Iteration 0: log likelihood = -9972.7637 Iteration 1: log likelihood = -9949.2459 Iteration 2: log likelihood = -9943.3971 Performing gradient-based optimization: Iteration 0: log likelihood = -9943.3971 Iteration 1: log likelihood = -9942.2696 Iteration 2: log likelihood = -9942.2532 Iteration 3: log likelihood = -9942.2532 Mixed-effects Poisson regression Number of obs = 9756 -------------------------------------------------------------------------- | No. of Observations per Group Integration Group Variable | Groups Minimum Average Maximum Points ----------------+--------------------------------------------------------- hospid | 56 1 174.2 716 7 patient | 6959 1 1.4 2 7 -------------------------------------------------------------------------- Wald chi2(1) = 1.49 Log likelihood = -9942.2532 Prob > chi2 = 0.2217 ------------------------------------------------------------------------------ inf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- apd | .0451192 .0369239 1.22 0.222 -.0272503 .1174888 _cons | -6.921458 .0734339 -94.25 0.000 -7.065386 -6.77753 ln(duration) | 1 (exposure) ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ hospid: Identity | var(_cons) | .1835775 .0503448 .1072467 .3142355 -----------------------------+------------------------------------------------ patient: Identity | var(_cons) | 1.009257 .0477619 .9198554 1.107347 ------------------------------------------------------------------------------ LR test vs. Poisson regression: chi2(2) = 2054.25 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . est store meqrp2 . . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p -------------------------------------------------------------------------------------------- Variable | p1 p2 mep1 mep2 meqrp1 meqrp2 -------------+------------------------------------------------------------------------------ apd | -.11492828 -.11492828 -.07347119 .0277205 .04527607 .04511924 | .02520192 .02520192 .03083056 .03562662 .03693156 .0369239 | 0.0000 0.0000 0.0172 0.4365 0.2202 0.2217 -------------------------------------------------------------------------------------------- legend: b/se/p * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/