Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Hierarchical model


From   Evans Jadotte <[email protected]>
To   [email protected]
Subject   Re: st: Hierarchical model
Date   Sat, 03 Oct 2009 12:18:18 +0200

Roberto G. Gutierrez, StataCorp wrote:
Evans Jadotte <[email protected]> provides more detail:

  
My original model is: 
    

  
.xi: xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm
small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land
electricity landline piped_water road_access death d_rem i.hhtype ||strata:
||cluster:, var
    

  
Strata is my level-3 identifier and cluster (which is nested into strata) is
my level-2 identifier. Then I have households nested into clusters.
    

  
Predict stub* produces the empirical Bayes (EB) prediction of the random
effects on strata ad clusters, which I hoped would coincide with the one
calculated manually after estimation via "restricted maximum
likelihood-REML".
    

  
.egen REML_id = mean(resid), by(id)
    

  
The residuals were calculated previously using the command: predict double
resid, residuals and account for both random intercepts above.  They are
supposed to be the total residuals, i.e. the raw residuals from level-1 for
households, plus the residuals for the two random intercepts (strata and
cluster). I understand that stub* should inferior or equal to REML_id, which
is not for level-3 (strata). Specifically, Stata gives me an EB for strata
that is more than 300 times greater than REML_strata
    

In the -egen- command above, I assume you mean -by(strata)- instead of
-by(id)-, because variable -id- is not part of your mixed model.

In any case, why would you expect the estimated random effects to be smaller
in magnitude than the group-averaged residuals?  Random effects can vary
significantly from group to group around their mean of zero.  Total residuals,
however, have mean zero over the entire dataset, and although they don't need
to have mean zero within each group, it would be weird for them to have a mean
significantly far from zero for any particular group.  After all, these
residuals do take into account the estimated random intercepts, and so all you
have left is observation-level variability.

As an example, consider 

. webuse pig, clear
(Longitudinal analysis of pig weights)

. xtmixed weight week || id:

(output omitted)

. predict r1, reffects

. predict resid, residuals

. egen reml_id = mean(resid), by(id)

. sum r1 reml_id

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          r1 |       432   -1.98e-08    3.794276   -7.17375   9.079873
     reml_id |       432   -3.93e-09    .1223594  -.2313422   .2928116

This shows the estimated random effects to have larger magnitude (see the 
Std. Dev. column) than the group-averaged residuals, which is what you should
expect.

What I think you are trying to calculate for -reml_id- are group mean
residuals that DO NOT take into account random effects.  Try

. predict xbeta, xb			// No random effects

. gen resid2 = weight - xb 	        // Residual calculated manually

. egen reml_id2 = mean(resid2), by(id)

. sum r1 reml_id2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          r1 |       432   -1.98e-08    3.794276   -7.17375   9.079873
    reml_id2 |       432   -7.02e-07    3.916635  -7.405093   9.372684

No you see the behavior that you expect.  The key is that REML-based estimates
of random effects should be based on residuals that do not take into account
random effects.  After all, you are trying to estimate the random effect
itself -- if you leave it in, there is nothing left to estimate.

  
Additionally, the sum of the residuals provided by Stata in the random part
of the output should be the total residual (residual level-1+ residual
level-2 + residual level-3), whose variance I calculate as follows:
    

  
.nl(resid2=exp({xb: $list one}))
    

  
.predict sigma2, xb
    

  
where $list is a skedasticity function. The estimate I get is less than the
residual at level-1, which I did not expect. I do not see quite well what I
am doing wrong here. I would really appreciate your feedback.
    

Given the above, my guess is that if you redefine your residuals to be those
that do not take into account random effects, you'll see behavior more in line
with your expectations.

--Bobby
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
  
Hi Bobby,

(Time difference inhibited prompt response). Indeed many thanks for your feedback. Yes the 'id' stands for either strata or cluster. I shall replace it now. And yes I have tried your procedure of calculating the REMLs based on the raw residuals, but since the results did not make sense to me (they do not have zero mean) so I thought I should proceed with calculations based on total residuals, which gave me a mean zero. My calculations for instance are,

.xi: quietly xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm
small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land
electricity landline piped_water road_access death d_rem i.hhtype
||cluster:, var

estimates store model1

.xi: quietly xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm
small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land
electricity landline piped_water road_access death d_rem i.hhtype ||strata:
||cluster:, var

. estimates store model2 

 The random part of the model2 look like this:

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
strata: Identity             |
                  var(_cons) |   .1586556   .0841078      .0561318    .4484368
-----------------------------+------------------------------------------------
cluster: Identity            |
                  var(_cons) |   .2896315   .0253914       .243906    .3439291
-----------------------------+------------------------------------------------
               var(Residual) |   1.025492   .0178371      .9911216    1.061055
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  1355.88   Prob > chi2 = 0.0000


. lrtest model1 model2

Likelihood-ratio test                             LR chibar2(01)   =    135.49
(Assumption: model1 nested in model2)             Prob > chibar2   =    0.0000


The last test strongly suggests that cluster is nested into strata and that the latter should be in the model. I tried estimating the other way around (strata nested into cluster) just to check, and convergence failed. So, based on the three-level model:


. predict xbeta, xb                                                                    

. gen res_fix=logequiv-xbeta                                              //raw residuals calculated manually

. predict res_tot, residuals                                                //get total residuals (which account for the random intercepts)

. egen random_c=mean(res_fix), by(cluster)                     //get the random effect for cluster

. egen random_s=mean(res_fix), by(strata)                      //get the random effect for strata

. predict EB_cluster, reffects level(cluster)                       //get the empirical Bayes of re_cluster

. predict EB_strata, reffects level(strata)                          //get the empirical Bayes of re_strata


. sum res_fix res_tot EB_cluster random_c EB_strata random_s

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     res_fix |      7157    .1470659    1.192167  -5.220241   5.231438
     res_tot |      7157    5.62e-10     .983868   -4.57931   4.469897
  EB_cluster |      7157    .0126679    .4600395  -1.764327   1.255301
    random_c |      7157    .1470659    .6867958   -2.18074   2.053363
   EB_strata |      7157    .1343979    .3803131  -.7883288   .6443056
-------------+--------------------------------------------------------
    random_s |      7157    .1470659    .3739612  -.8865975    .622515


As can be observed here, except res_tot (total residuals accounting for the random intercepts) have zero mean, and to some extend EB_cluster. I also expected the Std.Dev. of (res_fix^2 + random_c^2 +random_s^2) to be + or - equal to the sum of the variances of the random parameters from the Stata output, which in turn should be equal to res_tot^2 (its Std.Dev.). Also, the 300 times greater I mentioned in my previous mail regarding the EB and the REML for strata is for some observations. However, results seems to be correct here on balance so this can be disregarded.  Now to get the variances:


. gen res_tot2=res_tot^2

. xi: nl(res_tot2=exp({xb: Z one}))                                 //Z is skedasticity function, which is based on variables where skedasticity was detected.

. predict sigma2

. sum sigma2

 
    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

      sigma2 |      7157     .967807    .1937023   .2968854   4.334954


The total (predicted) variance 0.97, which is the sum (Var(raw residuals) + Var(radom_c) + Var(random_s)) is inferior to the variance of the raw residuals 1.03

In any case, the nesting in the data seems to be correct, but the results are not what I expected according to theory even by trying to set Z to overlap totally the fixed part matrix of th original model. I am quite puzzled and the reference books (Rabe-Hesketh ad Skrondal, 2005); (Raudenbush and Bryck, 2002) do not help much.

Well, this is a long thread Bobby and  I do not want to take too much of your time with this and I will understand if you have better fish to fry. I will keep digging to see what I find but any further suggestions would be very helpful!

Thanks,

Evans

   











© Copyright 1996–2025 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index