Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: mepoisson vs meqrpoisson

From	Phil Clayton <[email protected]>
To	[email protected]
Subject	Re: st: mepoisson vs meqrpoisson
Date	Sat, 3 Aug 2013 17:48:19 +1000
Hi Steve,

Thanks for your reply. I did try changing the number of integration points but forgot to mention this, sorry (it didn't substantially change the results). In any case StataCorp is now on the case so we'll see what they say.

I agree it's a nice example of Simpson's paradox - and of course this is why I want to use the multilevel model.

Cheers,
Phil

On 03/08/2013, at 1:28 AM, Steve Samuels <[email protected]> wrote:

> Correction: The differing signs for the pe & *meqr* adp coefficients are
> likely an instance of Simpson's paradox.
> 
> SS
> 
> Phil:
> 
> 1. I can't explain the difference between the -mepoisson and
> -mepqrpoisson-. The near equivalence of the meqr1 and meqr2 estimates
> gives me the impression that the QR results are more accurate, but I
> really don't know. In any case, the comparisons between me and meqr are
> premature if you haven't yet followed this advice from the ME manual
> (page 10):
> 
> "When fitting mixed-effects models, you should always assess whether the
> approximation is adequate by refitting the model with a larger number of
> quadrature points."
> 
> Try this and report back.
> 
> 2. The differing signs for the pe & me adp coefficients are an instance
> of Simpson's paradox. To save Nick Cox a post, I'll add that this
> phenomenon was recognized earlier by Yule and by Karl Pearson. See:
> https://en.wikipedia.org/wiki/Simpson%27s_paradox
> 
> 
> Steve
> 
> 
> 
> 
> On Aug 1, 2013, at 9:05 PM, Phil Clayton wrote:
> 
> Dear Statalisters,
> 
> I have been experimenting with the new -mepoisson- and -meqrpoisson- commands and have found substantively different results that I can't explain.
> 
> I am analysing infections in patients undergoing dialysis. I have a 3-level model - episodes of dialysis are clustered within patients (n=6959), who are clustered within hospitals (n=56). The outcome is "inf" (number of infections) and my main exposure of interest is "apd" (type of dialysis) which can vary within patients. Exposure time is stored in "duration".
> 
> There is also a variable called "anyprevinf" which indicates if the patient has previously had an infection. So there can be up to 4 observations per patient, eg:
>  hospid   patient   anypre~f   apd   duration   inf  
>      13        49          0     0          3     0  
>      13        49          0     1         49     1  
>      13        49          1     0         89     0  
>      13        49          1     1         20     0  
> 
> To illustrate the problem I am ignoring the other covariates and have run 3 models - simple Poisson model ignoring the hierarchical structure, -mepoisson- and -meqrpoisson-:
> poisson inf apd, exp(duration)
> est store p1
> mepoisson inf apd, exp(duration) || hospid: || patient:
> est store mep1
> meqrpoisson inf apd, exp(duration) || hospid: || patient:
> est store meqrp1
> 
> Since in this example I'm not using anyprevinf I then collapsed the dataset and re-ran the same 3 models:
> collapse (sum) inf duration, by(hospid apd patient)
> poisson inf apd, exp(duration)
> est store p2
> mepoisson inf apd, exp(duration) || hospid: || patient:
> est store mep2
> meqrpoisson inf apd, exp(duration) || hospid: || patient:
> est store meqrp2
> 
> My results are summarised as follows:
> . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
> 
> --------------------------------------------------------------------------------------------
>  Variable |     p1           p2          mep1         mep2        meqrp1       meqrp2    
> -------------+------------------------------------------------------------------------------
>       apd | -.11492828   -.11492828   -.07347119     .0277205    .04527607    .04511924  
>           |  .02520192    .02520192    .03083056    .03562662    .03693156     .0369239  
>           |     0.0000       0.0000       0.0172       0.4365       0.2202       0.2217  
> --------------------------------------------------------------------------------------------
>                                                                            legend: b/se/p
> 
> Comments:
> - p1 and p2 are identical as expected
> - mep1 and mep2 are quite different which I don't understand - I think they should be almost totally equivalent since the only difference was the -collapse- (see meqrp1 vs meqrp2)
> - mep1 and meqr1 are quite different which I don't understand - I thought they were fitting the same model using a slightly different technique
> - mep2 and meqr2 are pretty similar, but not as similar as I was expecting since my understanding is that they're fitting the same model
> 
> I would be grateful for any suggestions as to what's happening here. Below I've put a log of some descriptive data and the full output from each model.
> 
> One subtle potential problem is that if anyprevinf==1 there must be another observation for that patient with anyprevinf==0 & inf==1 - so in the uncollapsed dataset there isn't perfect conditional independence. Not sure if that's enough to explain the discrepancy though.
> 
> Kind regards,
> Phil
> 
> . describe
> 
> Contains data
> obs:        12,978                          
> vars:             6                          
> size:       272,538                          (_dta has notes)
> -------------------------------------------------------------------------------------------------------------------
>            storage   display    value
> variable name   type    format     label      variable label
> -------------------------------------------------------------------------------------------------------------------
> patient         int     %9.0g                 
> apd             byte    %8.0g                 
> anyprevinf      byte    %8.0g                 
> hospid          byte    %9.0g                 group(hospcode)
> inf             double  %8.0g                 (sum) inf
> duration        double  %9.0g                 (sum) duration
> -------------------------------------------------------------------------------------------------------------------
> Sorted by:  hospid  apd  patient  anyprevinf
>   Note:  dataset has changed since last saved
> 
> . misstable summarize // no missing data
> (variables nonmissing or string)
> 
> . codebook hospid patient
> 
> -------------------------------------------------------------------------------------------------------------------
> hospid                                                                                              group(hospcode)
> -------------------------------------------------------------------------------------------------------------------
> 
>                type:  numeric (byte)
> 
>               range:  [1,56]                       units:  1
>       unique values:  56                       missing .:  0/12978
> 
>                mean:   33.0623
>            std. dev:    15.064
> 
>         percentiles:        10%       25%       50%       75%       90%
>                              10        20        35        47        53
> 
> -------------------------------------------------------------------------------------------------------------------
> patient                                                                                                 (unlabeled)
> -------------------------------------------------------------------------------------------------------------------
> 
>                type:  numeric (int)
> 
>               range:  [1,6959]                     units:  1
>       unique values:  6959                     missing .:  0/12978
> 
>                mean:   3461.36
>            std. dev:   2043.82
> 
>         percentiles:        10%       25%       50%       75%       90%
>                             674      1650    3457.5      5258      6284
> 
> . bysort patient: assert hospid==hospid[_n-1] if _n>1
> 
> . tab apd
> 
>      apd |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        0 |      7,275       56.06       56.06
>        1 |      5,703       43.94      100.00
> ------------+-----------------------------------
>    Total |     12,978      100.00
> 
> . tab anyprevinf
> 
> anyprevinf |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        0 |      9,411       72.52       72.52
>        1 |      3,567       27.48      100.00
> ------------+-----------------------------------
>    Total |     12,978      100.00
> 
> . tab inf
> 
> (sum) inf |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        0 |      8,231       63.42       63.42
>        1 |      3,980       30.67       94.09
>        2 |        388        2.99       97.08
>        3 |        185        1.43       98.51
>        4 |         89        0.69       99.19
>        5 |         54        0.42       99.61
>        6 |         25        0.19       99.80
>        7 |         11        0.08       99.88
>        8 |          7        0.05       99.94
>        9 |          3        0.02       99.96
>       10 |          4        0.03       99.99
>       11 |          1        0.01      100.00
> ------------+-----------------------------------
>    Total |     12,978      100.00
> 
> . sum duration
> 
>  Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>  duration |     12978    322.0892    389.4168         .5       2802
> 
> . 
> . poisson inf apd, exp(duration)
> 
> Iteration 0:   log likelihood = -14529.051  
> Iteration 1:   log likelihood = -14528.992  
> Iteration 2:   log likelihood = -14528.992  
> 
> Poisson regression                                Number of obs   =      12978
>                                                LR chi2(1)      =      20.79
>                                                Prob > chi2     =     0.0000
> Log likelihood = -14528.992                       Pseudo R2       =     0.0007
> 
> ------------------------------------------------------------------------------
>       inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       apd |  -.1149283   .0252019    -4.56   0.000    -.1643231   -.0655334
>     _cons |  -6.439011    .017778  -362.19   0.000    -6.473855   -6.404167
> ln(duration) |          1  (exposure)
> ------------------------------------------------------------------------------
> 
> . est store p1
> 
> . mepoisson inf apd, exp(duration) || hospid: || patient:
> 
> Fitting fixed-effects model:
> 
> Iteration 0:   log likelihood =  -26408.57  
> Iteration 1:   log likelihood =  -14532.71  
> Iteration 2:   log likelihood = -14528.994  
> Iteration 3:   log likelihood = -14528.992  
> Iteration 4:   log likelihood = -14528.992  
> 
> Refining starting values:
> 
> Grid node 0:   log likelihood = -9069.4652
> 
> Fitting full model:
> 
> Iteration 0:   log likelihood = -9069.4652  
> Iteration 1:   log likelihood = -9063.4762  
> Iteration 2:   log likelihood =  -9059.664  
> Iteration 3:   log likelihood =  -9058.714  
> Iteration 4:   log likelihood =  -9058.715  
> Iteration 5:   log likelihood =  -9058.715  
> 
> Mixed-effects Poisson regression                Number of obs      =     12978
> 
> -----------------------------------------------------------
>              |   No. of       Observations per Group
> Group Variable |   Groups    Minimum    Average    Maximum
> ----------------+------------------------------------------
>       hospid |       56          1      231.8        949
>      patient |     6959          1        1.9          4
> -----------------------------------------------------------
> 
> Integration method: mvaghermite                 Integration points =         7
> 
>                                              Wald chi2(1)       =      5.68
> Log likelihood =  -9058.715                     Prob > chi2        =    0.0172
> --------------------------------------------------------------------------------
>         inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
>         apd |  -.0734712   .0308306    -2.38   0.017     -.133898   -.0130444
>       _cons |  -6.505974   .0244448  -266.15   0.000    -6.553884   -6.458063
> ln(duration) |          1  (exposure)
> ---------------+----------------------------------------------------------------
> hospid         |
>   var(_cons)|   .3186859   .0910349                      .1820586    .5578463
> ---------------+----------------------------------------------------------------
> hospid>patient |
>   var(_cons)|    .912644   .0565245                      .8083181    1.030435
> --------------------------------------------------------------------------------
> LR test vs. Poisson regression:      chi2(2) = 10940.55   Prob > chi2 = 0.0000
> 
> Note: LR test is conservative and provided only for reference.
> 
> . est store mep1
> 
> . meqrpoisson inf apd, exp(duration) || hospid: || patient:
> 
> Refining starting values: 
> 
> Iteration 0:   log likelihood = -13532.955  
> Iteration 1:   log likelihood = -13514.626  
> Iteration 2:   log likelihood = -13503.088  
> 
> Performing gradient-based optimization: 
> 
> Iteration 0:   log likelihood = -13503.088  
> Iteration 1:   log likelihood = -13502.469  
> Iteration 2:   log likelihood = -13502.468  
> Iteration 3:   log likelihood = -13502.468  
> 
> Mixed-effects Poisson regression                Number of obs      =     12978
> 
> --------------------------------------------------------------------------
>              |   No. of       Observations per Group       Integration
> Group Variable |   Groups    Minimum    Average    Maximum      Points
> ----------------+---------------------------------------------------------
>       hospid |       56          1      231.8        949           7
>      patient |     6959          1        1.9          4           7
> --------------------------------------------------------------------------
> 
>                                              Wald chi2(1)       =      1.50
> Log likelihood = -13502.468                     Prob > chi2        =    0.2202
> 
> ------------------------------------------------------------------------------
>       inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       apd |   .0452761   .0369316     1.23   0.220    -.0271085    .1176606
>     _cons |  -6.922041   .0735074   -94.17   0.000    -7.066113   -6.777969
> ln(duration) |          1  (exposure)
> ------------------------------------------------------------------------------
> 
> ------------------------------------------------------------------------------
> Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> hospid: Identity             |
>                var(_cons) |   .1840679   .0504352      .1075836    .3149271
> -----------------------------+------------------------------------------------
> patient: Identity            |
>                var(_cons) |   1.010816   .0477642      .9214039    1.108904
> ------------------------------------------------------------------------------
> LR test vs. Poisson regression:      chi2(2) =  2053.05   Prob > chi2 = 0.0000
> 
> Note: LR test is conservative and provided only for reference.
> 
> . est store meqrp1
> 
> . 
> . collapse (sum) inf duration, by(hospid apd patient)
> 
> . poisson inf apd, exp(duration)
> 
> Iteration 0:   log likelihood = -10969.711  
> Iteration 1:   log likelihood = -10969.378  
> Iteration 2:   log likelihood = -10969.378  
> 
> Poisson regression                                Number of obs   =       9756
>                                                LR chi2(1)      =      20.79
>                                                Prob > chi2     =     0.0000
> Log likelihood = -10969.378                       Pseudo R2       =     0.0009
> 
> ------------------------------------------------------------------------------
>       inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       apd |  -.1149283   .0252019    -4.56   0.000    -.1643231   -.0655334
>     _cons |  -6.439011    .017778  -362.19   0.000    -6.473855   -6.404167
> ln(duration) |          1  (exposure)
> ------------------------------------------------------------------------------
> 
> . est store p2
> 
> . mepoisson inf apd, exp(duration) || hospid: || patient:
> 
> Fitting fixed-effects model:
> 
> Iteration 0:   log likelihood = -18163.874  
> Iteration 1:   log likelihood = -11003.634  
> Iteration 2:   log likelihood = -10969.486  
> Iteration 3:   log likelihood = -10969.378  
> Iteration 4:   log likelihood = -10969.378  
> 
> Refining starting values:
> 
> Grid node 0:   log likelihood = -9019.1437
> 
> Fitting full model:
> 
> Iteration 0:   log likelihood = -9019.1437  
> Iteration 1:   log likelihood = -9009.7733  
> Iteration 2:   log likelihood = -8995.3399  
> Iteration 3:   log likelihood = -8986.9478  
> Iteration 4:   log likelihood = -8986.5939  
> Iteration 5:   log likelihood = -8986.5907  
> Iteration 6:   log likelihood =  -8986.591  
> 
> Mixed-effects Poisson regression                Number of obs      =      9756
> 
> -----------------------------------------------------------
>              |   No. of       Observations per Group
> Group Variable |   Groups    Minimum    Average    Maximum
> ----------------+------------------------------------------
>       hospid |       56          1      174.2        716
>      patient |     6959          1        1.4          2
> -----------------------------------------------------------
> 
> Integration method: mvaghermite                 Integration points =         7
> 
>                                              Wald chi2(1)       =      0.61
> Log likelihood =  -8986.591                     Prob > chi2        =    0.4365
> --------------------------------------------------------------------------------
>         inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
>         apd |   .0277205   .0356266     0.78   0.437    -.0421064    .0975474
>       _cons |  -6.738195   .0441765  -152.53   0.000    -6.824779   -6.651611
> ln(duration) |          1  (exposure)
> ---------------+----------------------------------------------------------------
> hospid         |
>   var(_cons)|   .2065251   .0586674                      .1183512    .3603903
> ---------------+----------------------------------------------------------------
> hospid>patient |
>   var(_cons)|   .9873731   .0491168                        .89565    1.088489
> --------------------------------------------------------------------------------
> LR test vs. Poisson regression:      chi2(2) =  3965.57   Prob > chi2 = 0.0000
> 
> Note: LR test is conservative and provided only for reference.
> 
> . est store mep2
> 
> . meqrpoisson inf apd, exp(duration) || hospid: || patient:
> 
> Refining starting values: 
> 
> Iteration 0:   log likelihood = -9972.7637  
> Iteration 1:   log likelihood = -9949.2459  
> Iteration 2:   log likelihood = -9943.3971  
> 
> Performing gradient-based optimization: 
> 
> Iteration 0:   log likelihood = -9943.3971  
> Iteration 1:   log likelihood = -9942.2696  
> Iteration 2:   log likelihood = -9942.2532  
> Iteration 3:   log likelihood = -9942.2532  
> 
> Mixed-effects Poisson regression                Number of obs      =      9756
> 
> --------------------------------------------------------------------------
>              |   No. of       Observations per Group       Integration
> Group Variable |   Groups    Minimum    Average    Maximum      Points
> ----------------+---------------------------------------------------------
>       hospid |       56          1      174.2        716           7
>      patient |     6959          1        1.4          2           7
> --------------------------------------------------------------------------
> 
>                                              Wald chi2(1)       =      1.49
> Log likelihood = -9942.2532                     Prob > chi2        =    0.2217
> 
> ------------------------------------------------------------------------------
>       inf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       apd |   .0451192   .0369239     1.22   0.222    -.0272503    .1174888
>     _cons |  -6.921458   .0734339   -94.25   0.000    -7.065386    -6.77753
> ln(duration) |          1  (exposure)
> ------------------------------------------------------------------------------
> 
> ------------------------------------------------------------------------------
> Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> hospid: Identity             |
>                var(_cons) |   .1835775   .0503448      .1072467    .3142355
> -----------------------------+------------------------------------------------
> patient: Identity            |
>                var(_cons) |   1.009257   .0477619      .9198554    1.107347
> ------------------------------------------------------------------------------
> LR test vs. Poisson regression:      chi2(2) =  2054.25   Prob > chi2 = 0.0000
> 
> Note: LR test is conservative and provided only for reference.
> 
> . est store meqrp2
> 
> . 
> . est table p1 p2 mep1 mep2 meqrp1 meqrp2, keep(apd) eq(1) b se p
> 
> --------------------------------------------------------------------------------------------
>  Variable |     p1           p2          mep1         mep2        meqrp1       meqrp2    
> -------------+------------------------------------------------------------------------------
>       apd | -.11492828   -.11492828   -.07347119     .0277205    .04527607    .04511924  
>           |  .02520192    .02520192    .03083056    .03562662    .03693156     .0369239  
>           |     0.0000       0.0000       0.0172       0.4365       0.2202       0.2217  
> --------------------------------------------------------------------------------------------
>                                                                            legend: b/se/p
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
References:
- st: mepoisson vs meqrpoisson
  - From: Phil Clayton <[email protected]>
- Re: st: mepoisson vs meqrpoisson
  - From: Steve Samuels <[email protected]>
Prev by Date: Re: st: Merging w/ substituting for years with missing values
Next by Date: Re: st: How do I suppress computation of standard errors in stata gmm function?
Previous by thread: Re: st: mepoisson vs meqrpoisson
Next by thread: st: mepoisson vs meqrpoisson
Index(es):
- Date
- Thread