Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Antonio Rodriguez Andres" <Antonio.Andres@emu.edu.tr> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: Xtmixed, Multilevel models and design weights |
Date | Mon, 3 Feb 2014 17:13:20 +0200 |
Dear Stata users,

I am working on a multilevel model using the third round of the European Social Survey. It is a two-level model with individuals (level 1) nested in countries (level 2). Based on a previous thread, I type

xtset cntry idno

In the original dataset, there are two types of weights:

. Design weight: The design weights are based on the inverse of the inclusion probabilities for individuals i in countries j. The design weight corrects for slightly different probabilities of selection, thereby making the sample more representative of a 'true' sample of individuals from each country.

. Population size weight: The population size weight makes an adjustment to ensure that each country is represented in proportion to its population size. It is calculated as

PWEIGHT = [Population size] / [(Net sample size in data file) * 10 000]

My question is: do I need to specify the population size weights when I run the multilevel model? I tend to get different results. Below is the regression with design weights applied:

xtmixed dprt age age2 gender married separated divorced widowed seced terted chldhm missinc medinc highinc ihealth iuemp5yr iuemp12m gender_index06 [pw=dweight] || cntry: , mle

(32181 missing values generated)

Obtaining starting values by EM:

Performing gradient-based optimization:

Iteration 0:   log pseudolikelihood = -29698.399
Iteration 1:   log pseudolikelihood = -29698.399

Computing standard errors:

Mixed-effects regression                        Number of obs      =     10819
Group variable: cntry                           Number of groups   =        23

                                                Obs per group: min =       190
                                                               avg =     470.4
                                                               max =       879

                                                Wald chi2(17)      =   1667.56
Log pseudolikelihood = -29698.399               Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 23 clusters in cntry)
--------------------------------------------------------------------------------
               |               Robust
          dprt |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   .0531949   .0246902     2.15   0.031      .004803    .1015868
          age2 |  -.0006209   .0002598    -2.39   0.017    -.0011302   -.0001116
        gender |  -.4561274   .0618772    -7.37   0.000    -.5774044   -.3348504
       married |  -.7286765   .1096062    -6.65   0.000    -.9435007   -.5138522
     separated |   .9733665   .2900381     3.36   0.001     .4049023    1.541831
      divorced |   .2673798   .1585851     1.69   0.092    -.0434412    .5782009
       widowed |   1.378241   .2714682     5.08   0.000     .8461734    1.910309
         seced |  -.3752529    .096655    -3.88   0.000    -.5646931   -.1858126
        terted |  -.4058087   .1418846    -2.86   0.004    -.6838973     -.12772
        chldhm |   .0646216   .0830391     0.78   0.436    -.0981321    .2273752
       missinc |  -.5729247   .2264561    -2.53   0.011    -1.016771    -.129079
        medinc |  -.8394265   .2025874    -4.14   0.000    -1.236491   -.4423624
       highinc |  -1.333281   .2068405    -6.45   0.000    -1.738681   -.9278808
       ihealth |  -1.687627   .0660921   -25.53   0.000    -1.817165   -1.558089
      iuemp5yr |   .3617495   .0849113     4.26   0.000     .1953263    .5281726
      iuemp12m |   .4095986   .1104699     3.71   0.000     .1930816    .6261157
gender_index06 |  -5.556036    2.26839    -2.45   0.014      -10.002   -1.110074
         _cons |   16.54468   1.720549     9.62   0.000     13.17246    19.91689
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust
  Random-effects Parameters  |   Estimate  Std. Err.      [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry: Identity              |
                   sd(_cons) |   .4603558   .1069094      .2920231    .7257215
-----------------------------+------------------------------------------------
                sd(Residual) |   3.742975    .068211      3.611643    3.879082
------------------------------------------------------------------------------

Warning: Sampling weights were specified only at the first level in a multilevel model. If these weights are indicative of overall and not conditional inclusion probabilities, then results may be biased.
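To make the population size weight formula quoted in this message concrete, here is a minimal sketch that rebuilds it by hand. The country codes, the population and sample figures, and the variable names pop, n_net, and pweight_check are all made up for illustration; none of them come from the ESS files.

```stata
* Sketch only: hypothetical countries and figures, not ESS data.
clear
input str2 cntry double pop long n_net
"AA"  8000000 2000
"BB" 40000000 1600
"CC"  1200000 1500
end

* PWEIGHT = [Population size] / [(Net sample size in data file) * 10 000]
generate double pweight_check = pop / (n_net * 10000)

* Inspect the reconstructed weights next to their inputs.
list cntry pop n_net pweight_check, noobs
```

In the real data the pweight variable shipped with the ESS file would normally be used directly; the sketch only shows what the formula does, namely that a large country with a small net sample receives a larger weight.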
And here is the regression with design weights, population size weights, and scaling applied:

xtmixed dprt age age2 gender married separated divorced widowed seced terted chldhm missinc medinc highinc ihealth iuemp5yr iuemp12m gender_index06 [pw=dweight] || cntry: , mle pweight(pweight) pwscale(size)

Obtaining starting values by EM:

Performing gradient-based optimization:

Iteration 0:   log pseudolikelihood = -36222.212
Iteration 1:   log pseudolikelihood = -36219.813
Iteration 2:   log pseudolikelihood = -36219.697
Iteration 3:   log pseudolikelihood = -36219.693
Iteration 4:   log pseudolikelihood = -36219.693

Computing standard errors:

Mixed-effects regression                        Number of obs      =     10819
Group variable: cntry                           Number of groups   =        23

                                                Obs per group: min =       190
                                                               avg =     470.4
                                                               max =       879

                                                Wald chi2(17)      =  56602.23
Log pseudolikelihood = -36219.693               Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 23 clusters in cntry)
--------------------------------------------------------------------------------
               |               Robust
          dprt |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   .0625417   .0318887     1.96   0.050     .0000409    .1250425
          age2 |  -.0007252   .0003449    -2.10   0.035    -.0014012   -.0000493
        gender |  -.3903829   .0778959    -5.01   0.000    -.5430561   -.2377097
       married |  -.7816344   .0755469   -10.35   0.000    -.9297036   -.6335652
     separated |   1.393169   .3505622     3.97   0.000     .7060796    2.080258
      divorced |   .4096387   .1762837     2.32   0.020      .064129    .7551483
       widowed |    1.67463   .2974701     5.63   0.000     1.091599    2.257661
         seced |  -.4187948   .0915211    -4.58   0.000    -.5981729   -.2394167
        terted |  -.3965199   .1042436    -3.80   0.000    -.6008336   -.1922063
        chldhm |   .0937831   .1176234     0.80   0.425    -.1367546    .3243208
       missinc |  -.4813444   .3198694    -1.50   0.132    -1.108277    .1455882
        medinc |  -.8523854   .2845472    -3.00   0.003    -1.410088   -.2946831
       highinc |   -1.39177   .2832477    -4.91   0.000    -1.946925   -.8366147
       ihealth |  -1.807828    .086356   -20.93   0.000    -1.977083   -1.638573
      iuemp5yr |   .4177802   .0926448     4.51   0.000     .2361997    .5993607
      iuemp12m |   .5078261   .0996891     5.09   0.000     .3124391    .7032132
gender_index06 |  -1.416382   1.961742    -0.72   0.470    -5.261326    2.428561
         _cons |   13.66673   1.514102     9.03   0.000     10.69915    16.63432
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust
  Random-effects Parameters  |   Estimate  Std. Err.      [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry: Identity              |
                   sd(_cons) |   .3021236   .1031259      .1547527    .5898356
-----------------------------+------------------------------------------------
                sd(Residual) |   3.802415   .0817433      3.645529    3.966052
------------------------------------------------------------------------------

Any suggestions?

Regards

Antonio

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alfonso Sánchez-Peñalver
Sent: Friday, January 31, 2014 6:24 PM
To: Stata List
Subject: Re: st: gllamm or xtmixed models?
Antonio,

Before you do the fixed effects estimation you have to run -xtset- as

xtset country id

to follow the example you gave. Country is the grouping variable, and id is the individual observation variable. You have changed the example in the last email with respect to the previous ones. Going back to the previous ones, you can run the fixed effects estimation as

xtreg depression x1 x2 x3, fe

Now, in your last email you use -vce(cluster country)-. Using the cluster-robust variance accounts for correlation of the errors within the clusters, in your case the countries, but not across clusters. The question is: once you have stripped the errors of the different intercepts by using fixed effects, why do you expect the errors to be correlated within the countries?

Consider, for example, that there is an unobservable variable which measures severity of winter. We expect that the more severe the winter is, the more cases of depression there are, so while controlling for everything else it would make sense that the number of patients with depression in Sweden or Norway is larger than the number of patients with depression in Spain or Italy. Since we cannot control for the severity of the winter because we don't have a measure for it, the errors would capture the effect of this variable on the dependent variable. You would thus expect the errors for Sweden to be larger than the errors for Spain, which creates the correlation between the errors in Sweden, and the correlation between the errors in Spain, but not the correlation between errors of Spain and Sweden.

Now, when you use fixed effects estimation you are in fact controlling for the average effect of all unobserved characteristics of the countries, so you would be controlling for the average effect of the severity of the winter (among other unobservables) in the countries. Therefore, unless you think there is something else causing the correlation between the errors within the different countries, you don't need the -vce(cluster country)- option.

Best,

Alfonso Sánchez-Peñalver, PhD
Visiting Assistant Professor
Suffolk University
Senior Instructor
UMass Boston

On Jan 31, 2014, at 4:26 AM, Antonio Rodriguez Andres <Antonio.Andres@emu.edu.tr> wrote:

> Alfonso
>
> Thank you for your answer. As far as I understand, the observations
> are clustered within countries, so I have to account for this in my
> model and use a two-level multilevel model. What I can try is a fixed
> effects model with clustering at the country level:
>
> xtreg dv iv, fe vce(cluster country)
>
> I should also use the xtset command, but I do not have a real panel.
> Usually we declare the data with xtset id year (both dimensions of the
> panel data), but here it is only a cross section.
>
> Can I type
>
> xtset id country (first level and second level)?
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alfonso
> Sánchez-Peñalver
> Sent: Thursday, January 30, 2014 10:31 PM
> To: Stata List
> Subject: Re: st: gllamm or xtmixed models?
>
> Hi again Antonio,
>
> I haven't used -gllamm- (SSC), but my understanding is that you will
> also be able to estimate the random effects with it. The fixed effects
> can be estimated in two different ways:
>
> 1. Pooled OLS (-regress-) with a dummy variable for each country and
>    no constant (-nocons- option)
> 2. -xtreg- with the fe option
>
> For the second option you will first have to use -xtset- to identify
> which is the level 2 (cluster) variable (country) and which is the
> level 1 variable (the individuals).
>
> As for random slopes, consider the random effects model. The random
> effects model assumes that the intercept is a random variable across
> countries. What if the intercept is not the only thing that varies
> across countries?
> What if the effect (slope) of a certain variable (age, let's say)
> also varies across countries? You can include that variable in the
> random part of the command to let the slope be a random variable as
> well. So, for example, going back to your syntax, assume that you
> believe the coefficient on x2 to be random as well. You can type:
>
> xtmixed depression x1 x3 || country: x2
>
> Best,
>
> Alfonso Sánchez-Peñalver, PhD
>
> Visiting Assistant Professor
> Suffolk University
> Senior Instructor
> UMass Boston
>
>
>
> On Jan 30, 2014, at 3:09 PM, Antonio Rodriguez Andres
> <Antonio.Andres@emu.edu.tr> wrote:
>
>> Alfonso
>>
>> Thank you for your answer. In this way, can I estimate the fixed
>> effects for each country? What do they mean by random slopes for all
>> data? Can this be done using the xtmixed or gllamm command?
>>
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alfonso
>> Sánchez-Peñalver
>> Sent: Thursday, January 30, 2014 9:58 PM
>> To: Stata List
>> Subject: Re: st: gllamm or xtmixed models?
>>
>> Hi Antonio,
>>
>> I believe the correct syntax for the random effects model estimated
>> via maximum likelihood would be
>>
>> xtmixed depression x1 x2 x3 || country:
>>
>> Alfonso Sánchez-Peñalver, PhD
>>
>> Visiting Assistant Professor
>> Suffolk University
>> Senior Instructor
>> UMass Boston
>>
>>
>>
>> On Jan 30, 2014, at 2:52 PM, Antonio Rodriguez Andres
>> <Antonio.Andres@emu.edu.tr> wrote:
>>
>>> Dear Stata users
>>>
>>> I want to estimate multilevel models, as I have observations for
>>> individuals across countries. My dependent variable is a measure of
>>> mental health ranging from 0 to 24. I want to use hierarchical
>>> linear models with fixed effects and random effects for countries.
>>> Is the correct syntax:
>>>
>>> xtmixed depression x1 x2 x3 || i(country)
>>>
>>> Any clue?
>>>
>>> Regards
>>>
>>> Antonio
>>>
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
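The estimators discussed in the replies above (fixed effects by either route, a random intercept, and a random slope) can be sketched on simulated data. This is only an illustration under stated assumptions: the data are generated inside the block, the parameter values are made up, and the variable names mirror the placeholders used in the thread (depression, x1, x2, country). Unlike the random-slope command quoted in the thread, the sketch keeps x2 in the fixed part as well, so that its average slope is estimated alongside the country-specific deviations.

```stata
* Simulated two-level data: 20 countries with 50 individuals each.
* Every number below is made up for illustration.
clear
set seed 12345
set obs 20
generate country = _n
generate double u  = rnormal(0, 1)      // country-specific intercept shift
generate double b2 = rnormal(0, 0.5)    // country-specific shift in the x2 slope
expand 50
bysort country: generate id = _n
generate double x1 = rnormal()
generate double x2 = rnormal()
generate double depression = 10 + 0.5*x1 + (-0.3 + b2)*x2 + u + rnormal(0, 2)

* Fixed effects via -xtreg-: declare the grouping variable first.
xtset country
xtreg depression x1 x2, fe
scalar b_fe = _b[x1]

* Fixed effects via pooled OLS: one dummy per country and no constant.
regress depression x1 x2 ibn.country, noconstant
scalar b_lsdv = _b[x1]

* The two routes should give the same slope on x1.
display "x1 slope, xtreg fe: " b_fe "   country dummies: " b_lsdv

* Random intercept for country, estimated by maximum likelihood.
xtmixed depression x1 x2 || country: , mle

* Random intercept plus a random slope on x2.
xtmixed depression x1 x2 || country: x2, mle
```

The two fixed-effects routes differ only in how the country intercepts are reported; the slope estimates coincide. In Stata 13 and later the same mixed models can be fitted with -mixed-, the renamed version of -xtmixed-.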