Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Xtmixed, Multilevel models and design weights
From: "Antonio Rodriguez Andres" <[email protected]>
To: <[email protected]>
Subject: st: Xtmixed, Multilevel models and design weights
Date: Mon, 3 Feb 2014 17:13:20 +0200
Dear Stata users,
I am working on a multilevel model using the third round of the European
Social Survey (ESS). It is a two-level model with individuals (level 1)
nested in countries (level 2). Based on a previous thread, I typed
xtset cntry idno
In the original dataset, there are two types of weights:
- Design weights: The design weights are based on the inverse of the
  inclusion probabilities of individuals i in countries j. The design weight
  corrects for slightly different probabilities of selection, thereby making
  the sample more representative of a 'true' sample of individuals from each
  country.
- Population size weights: The population size weight adjusts the data so
  that each country is represented in proportion to its population size. It
  is calculated as
  PWEIGHT = [Population size] / ([Net sample size in data file] * 10 000)
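The population size weight formula above is plain arithmetic. A small Python sketch makes its behaviour concrete (the thread itself uses Stata). The function name and the two country figures are hypothetical and are not ESS values. When countries have the same net sample size, the weights come out proportional to population.

```python
# Sketch of the ESS population size weight described above:
#   PWEIGHT = [Population size] / ([Net sample size in data file] * 10 000)
# population_size_weight() and the example figures are hypothetical.


def population_size_weight(population: float, net_sample_size: int) -> float:
    """Return the population size weight for one country."""
    if net_sample_size <= 0:
        raise ValueError("net sample size must be positive")
    return population / (net_sample_size * 10_000)


if __name__ == "__main__":
    # Two hypothetical countries with the same net sample size:
    # the larger country gets a proportionally larger weight.
    print(population_size_weight(8_000_000, 2_000))   # small country
    print(population_size_weight(40_000_000, 2_000))  # large country
```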
My question is: do I need to specify the population size weights when I run
the multilevel model? I get different results depending on whether I do.
Below is the regression with only the design weights applied:
xtmixed dprt age age2 gender married separated divorced widowed seced terted ///
    chldhm missinc medinc highinc ihealth iuemp5yr iuemp12m ///
    gender_index06 [pw=dweight] || cntry: , mle
(32181 missing values generated)
Obtaining starting values by EM:
Performing gradient-based optimization:
Iteration 0: log pseudolikelihood = -29698.399
Iteration 1: log pseudolikelihood = -29698.399
Computing standard errors:
Mixed-effects regression                        Number of obs      =     10819
Group variable: cntry                           Number of groups   =        23

                                                Obs per group: min =       190
                                                               avg =     470.4
                                                               max =       879

                                                Wald chi2(17)      =   1667.56
Log pseudolikelihood = -29698.399               Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 23 clusters in cntry)
--------------------------------------------------------------------------------
               |               Robust
          dprt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   .0531949   .0246902     2.15   0.031      .004803    .1015868
          age2 |  -.0006209   .0002598    -2.39   0.017    -.0011302   -.0001116
        gender |  -.4561274   .0618772    -7.37   0.000    -.5774044   -.3348504
       married |  -.7286765   .1096062    -6.65   0.000    -.9435007   -.5138522
     separated |   .9733665   .2900381     3.36   0.001     .4049023    1.541831
      divorced |   .2673798   .1585851     1.69   0.092    -.0434412    .5782009
       widowed |   1.378241   .2714682     5.08   0.000     .8461734    1.910309
         seced |  -.3752529    .096655    -3.88   0.000    -.5646931   -.1858126
        terted |  -.4058087   .1418846    -2.86   0.004    -.6838973     -.12772
        chldhm |   .0646216   .0830391     0.78   0.436    -.0981321    .2273752
       missinc |  -.5729247   .2264561    -2.53   0.011    -1.016771    -.129079
        medinc |  -.8394265   .2025874    -4.14   0.000    -1.236491   -.4423624
       highinc |  -1.333281   .2068405    -6.45   0.000    -1.738681   -.9278808
       ihealth |  -1.687627   .0660921   -25.53   0.000    -1.817165   -1.558089
      iuemp5yr |   .3617495   .0849113     4.26   0.000     .1953263    .5281726
      iuemp12m |   .4095986   .1104699     3.71   0.000     .1930816    .6261157
gender_index06 |  -5.556036    2.26839    -2.45   0.014      -10.002   -1.110074
         _cons |   16.54468   1.720549     9.62   0.000     13.17246    19.91689
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry: Identity              |
                   sd(_cons) |   .4603558   .1069094      .2920231    .7257215
-----------------------------+------------------------------------------------
                sd(Residual) |   3.742975    .068211      3.611643    3.879082
------------------------------------------------------------------------------
Warning: Sampling weights were specified only at the first level in a
multilevel model. If these weights are indicative of overall and not
conditional inclusion probabilities, then results may be biased.
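The warning depends on the difference between overall and conditional inclusion probabilities. Consider a two-stage design in which country j is selected with probability p_j and person i within it with probability p_{i|j}. The overall probability is then p_j * p_{i|j}, so an overall weight equals the level-2 weight times the conditional level-1 weight. The Python sketch below shows this relationship and how a conditional level-1 weight could be recovered from an overall one. The function names and probabilities are hypothetical. Whether the ESS design weight is overall or conditional is not settled by this sketch.

```python
# Sketch of overall vs conditional sampling weights in a two-stage design.
# Function names and probabilities are hypothetical, for illustration only.


def two_stage_weights(p_country: float, p_person_given_country: float):
    """Return (level-2 weight, conditional level-1 weight, overall weight)."""
    w_country = 1.0 / p_country
    w_person_conditional = 1.0 / p_person_given_country
    # Overall inclusion probability = p_country * p_person_given_country
    w_overall = 1.0 / (p_country * p_person_given_country)
    return w_country, w_person_conditional, w_overall


def conditional_from_overall(w_overall: float, w_country: float) -> float:
    """Recover the conditional level-1 weight from an overall weight."""
    return w_overall / w_country


if __name__ == "__main__":
    w2, w1_cond, w_all = two_stage_weights(0.5, 0.01)
    print(w2, w1_cond, w_all)
    print(conditional_from_overall(w_all, w2))
```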
And here is the regression with both design weights, population size weights
and scaling applied:
xtmixed dprt age age2 gender married separated divorced widowed seced terted ///
    chldhm missinc medinc highinc ihealth iuemp5yr iuemp12m ///
    gender_index06 [pw=dweight] || cntry: , mle pweight(pweight) pwscale(size)
Obtaining starting values by EM:
Performing gradient-based optimization:
Iteration 0: log pseudolikelihood = -36222.212
Iteration 1: log pseudolikelihood = -36219.813
Iteration 2: log pseudolikelihood = -36219.697
Iteration 3: log pseudolikelihood = -36219.693
Iteration 4: log pseudolikelihood = -36219.693
Computing standard errors:
Mixed-effects regression                        Number of obs      =     10819
Group variable: cntry                           Number of groups   =        23

                                                Obs per group: min =       190
                                                               avg =     470.4
                                                               max =       879

                                                Wald chi2(17)      =  56602.23
Log pseudolikelihood = -36219.693               Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 23 clusters in cntry)
--------------------------------------------------------------------------------
               |               Robust
          dprt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   .0625417   .0318887     1.96   0.050     .0000409    .1250425
          age2 |  -.0007252   .0003449    -2.10   0.035    -.0014012   -.0000493
        gender |  -.3903829   .0778959    -5.01   0.000    -.5430561   -.2377097
       married |  -.7816344   .0755469   -10.35   0.000    -.9297036   -.6335652
     separated |   1.393169   .3505622     3.97   0.000     .7060796    2.080258
      divorced |   .4096387   .1762837     2.32   0.020      .064129    .7551483
       widowed |    1.67463   .2974701     5.63   0.000     1.091599    2.257661
         seced |  -.4187948   .0915211    -4.58   0.000    -.5981729   -.2394167
        terted |  -.3965199   .1042436    -3.80   0.000    -.6008336   -.1922063
        chldhm |   .0937831   .1176234     0.80   0.425    -.1367546    .3243208
       missinc |  -.4813444   .3198694    -1.50   0.132    -1.108277    .1455882
        medinc |  -.8523854   .2845472    -3.00   0.003    -1.410088   -.2946831
       highinc |   -1.39177   .2832477    -4.91   0.000    -1.946925   -.8366147
       ihealth |  -1.807828    .086356   -20.93   0.000    -1.977083   -1.638573
      iuemp5yr |   .4177802   .0926448     4.51   0.000     .2361997    .5993607
      iuemp12m |   .5078261   .0996891     5.09   0.000     .3124391    .7032132
gender_index06 |  -1.416382   1.961742    -0.72   0.470    -5.261326    2.428561
         _cons |   13.66673   1.514102     9.03   0.000     10.69915    16.63432
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry: Identity              |
                   sd(_cons) |   .3021236   .1031259      .1547527    .5898356
-----------------------------+------------------------------------------------
                sd(Residual) |   3.802415   .0817433      3.645529    3.966052
------------------------------------------------------------------------------
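The Stata documentation for -xtmixed-/-mixed- describes pwscale(size) as follows: first-level weights are scaled so that, within each second-level cluster (here, each country), they sum to that cluster's sample size. The relative weights within a country are unchanged. The Python sketch below shows that rescaling with hypothetical weights and cluster labels. It is an illustration of the idea, not Stata's internal code.

```python
from collections import defaultdict

# Sketch of the "size" scaling of level-1 sampling weights: within each
# level-2 cluster, weights are rescaled to sum to the cluster's sample size.
# Weights and cluster labels below are hypothetical.


def scale_weights_to_cluster_size(weights, clusters):
    """Return level-1 weights rescaled to sum to n_j within each cluster j."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, c in zip(weights, clusters):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for w, c in zip(weights, clusters)]


if __name__ == "__main__":
    dweight = [1.0, 3.0, 2.0, 2.0, 2.0]
    cntry = ["A", "A", "B", "B", "B"]
    print(scale_weights_to_cluster_size(dweight, cntry))
```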
Any suggestions?
Regards
Antonio
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Alfonso
Sánchez-Peñalver
Sent: Friday, January 31, 2014 6:24 PM
To: Stata List
Subject: Re: st: gllamm or xtmixed models?
Antonio,
before you do the fixed effects estimation, you have to run -xtset- as
xtset country id
to follow the example you gave. Country is the grouping variable, and id is
the individual observation variable. You have changed the example in your
last email with respect to the previous ones. Going back to the previous
ones, you can run the fixed effects estimation as
xtreg depression x1 x2 x3, fe
Now, in your last email you use -vce(cluster country)-. A cluster-robust
variance estimator accounts for correlation of the errors within clusters (in
your case, the countries), but not across clusters. The question is: once
fixed effects have stripped the country-specific intercepts out of the
errors, why would you still expect the errors to be correlated within
countries? Consider, for example, an unobservable variable that measures the
severity of winter. We expect more severe winters to bring more cases of
depression, so, holding everything else constant, it would make sense for
the number of patients with depression in Sweden or Norway to be larger than
in Spain or Italy. Since we have no measure of winter severity, we cannot
control for it, so the errors capture its effect on the dependent variable.
You would then expect the errors for Sweden to be larger than the errors for
Spain, which creates correlation among the errors within Sweden and among
the errors within Spain, but not between the errors of Spain and Sweden.
Fixed effects estimation, however, controls for the average effect of all
unobserved country characteristics, including the severity of winter (among
other unobservables). Therefore, unless you think something else is causing
the errors to be correlated within countries, you don't need the
-vce(cluster country)- option.
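The claim that fixed effects absorb country-level unobservables can be checked numerically. This small Python sketch applies the within transformation that fixed-effects estimation is built on, which subtracts each country's mean. It shows that an additive country-level shock disappears entirely. The data are hypothetical, and "winter" stands in for the unobserved winter severity. The transformation only removes effects that are constant within a country and enter additively.

```python
from collections import defaultdict

# Hypothetical illustration: the within transformation removes any additive
# country-level term, observed or not.


def within_demean(values, groups):
    """Subtract each group's mean from the values in that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: an individual component plus an unobserved
# country-level shock standing in for winter severity.
countries = ["SE", "SE", "SE", "ES", "ES", "ES"]
individual = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
winter = {"SE": 5.0, "ES": 0.5}
score = [v + winter[c] for v, c in zip(individual, countries)]

# Both lines print the same list: the winter shock has been removed.
print(within_demean(score, countries))
print(within_demean(individual, countries))
```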
Best,
Alfonso Sánchez-Peñalver, PhD
Visiting Assistant Professor
Suffolk University
Senior Instructor
UMass Boston
On Jan 31, 2014, at 4:26 AM, Antonio Rodriguez Andres
<[email protected]> wrote:
> Alfonso
>
> Thank you for your answer. As I understand it, because the observations
> are clustered within countries, I have to account for this in my model and
> use a two-level multilevel model. What I can try is a fixed effects model
> with clustering at the country level:
>
> xtreg dv iv, fe vce(cluster country)
>
> I should also use the xtset command, but I do not have a real panel.
> Usually we declare the data with xtset id year (both dimensions of the
> panel), but here it is only a cross section.
>
> Can I type
>
> xtset id country (1 level and second level)?
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Alfonso
> Sánchez-Peñalver
> Sent: Thursday, January 30, 2014 10:31 PM
> To: Stata List
> Subject: Re: st: gllamm or xtmixed models?
>
> Hi again Antonio,
>
> I haven't used -gllamm- (SSC) but my understanding is that you will
> also be able to estimate the random effects with it. The fixed effects
> can be estimated in two different ways:
>
> 1. Pooled OLS (-regress-) with a dummy variable for each country and
>    no constant (-nocons- option)
> 2. -xtreg- with the fe option
>
> For the second option you will have to first use -xtset- to identify
> which is the level 2 (cluster) variable (country) and the level 1
> variable (the individuals).
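Both routes give the same slope estimates, and the country fixed effects themselves can be recovered afterwards. The Python sketch below handles a single regressor, with hypothetical data and names. The within estimator gives the slope b, and each country's intercept is mean(y_j) - b * mean(x_j). By the Frisch-Waugh-Lovell result, these intercepts equal the dummy coefficients from OLS with one dummy per country and no constant.

```python
from collections import defaultdict

# Hypothetical fixed-effects sketch with one regressor.


def fixed_effects_fit(y, x, groups):
    """Return (b, alpha): the within-estimator slope b and the country
    intercepts alpha_j = mean(y_j) - b * mean(x_j)."""
    n = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for yi, xi, g in zip(y, x, groups):
        n[g] += 1
        sum_x[g] += xi
        sum_y[g] += yi
    mean_x = {g: sum_x[g] / n[g] for g in n}
    mean_y = {g: sum_y[g] / n[g] for g in n}
    # Slope from the demeaned (within) data
    num = sum((xi - mean_x[g]) * (yi - mean_y[g]) for yi, xi, g in zip(y, x, groups))
    den = sum((xi - mean_x[g]) ** 2 for xi, g in zip(x, groups))
    b = num / den
    # Country fixed effects implied by b
    alpha = {g: mean_y[g] - b * mean_x[g] for g in n}
    return b, alpha


if __name__ == "__main__":
    # Hypothetical data generated as y = alpha_j + 2 * x with no noise
    groups = ["A", "A", "A", "B", "B", "B"]
    x = [1.0, 2.0, 3.0, 0.0, 1.0, 2.0]
    y = [12.0, 14.0, 16.0, -1.0, 1.0, 3.0]
    print(fixed_effects_fit(y, x, groups))
```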
>
> As for random slopes, consider the random effects model. The random
> effects model assumes that the intercept is a random variable across
> countries. What if the intercept is not the only thing that varies
> across countries? What if the effect (slope) of a certain variable
> (age, let's say) also varies across countries? You can include that
> variable in the random part of the command to let its slope be a
> random variable as well. So, for example, going back to your syntax,
> if you believe the coefficient on x2 to be random as well, you can type:
>
> xtmixed depression x1 x2 x3 || country: x2
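To make the random-slope idea concrete, here is a hypothetical Python simulation of the model behind a random slope on x2. It has no noise, two countries and invented numbers. Each country j has its own intercept b0 + u0_j and its own slope b1 + u1_j. Estimating the slope separately within each country recovers those country-specific slopes, which scatter around the average slope b1 from the fixed part of the model. The simulation only illustrates the model; it is not how -xtmixed- estimates it.

```python
from collections import defaultdict

# Hypothetical random-slope model with no noise:
#   y_ij = (b0 + u0_j) + (b1 + u1_j) * x_ij
# All numbers are invented for illustration.
b0, b1 = 10.0, 0.5
deviations = {"A": (1.0, 0.25), "B": (-1.0, -0.25)}  # (u0_j, u1_j)

groups, x, y = [], [], []
for g, (u0, u1) in deviations.items():
    for xi in (1.0, 2.0, 3.0):
        groups.append(g)
        x.append(xi)
        y.append((b0 + u0) + (b1 + u1) * xi)


def slope_by_group(ys, xs, labels):
    """OLS slope of ys on xs, estimated separately within each group."""
    data = defaultdict(list)
    for yi, xi, g in zip(ys, xs, labels):
        data[g].append((xi, yi))
    slopes = {}
    for g, pairs in data.items():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
        sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
        slopes[g] = sxy / sxx
    return slopes


slopes = slope_by_group(y, x, groups)
print(slopes)                              # country-specific slopes
print(sum(slopes.values()) / len(slopes))  # their average equals b1 here
```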
>
> Best,
>
> Alfonso Sánchez-Peñalver, PhD
>
> Visiting Assistant Professor
> Suffolk University
> Senior Instructor
> UMass Boston
>
>
>
> On Jan 30, 2014, at 3:09 PM, Antonio Rodriguez Andres
> <[email protected]> wrote:
>
>> Alfonso
>>
>> Thank you for your answer. In this way, can I estimate the fixed
>> effects for each country? What do they mean by random slopes for all data?
>> Can this be done using the xtmixed or gllamm command?
>>
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Alfonso
>> Sánchez-Peñalver
>> Sent: Thursday, January 30, 2014 9:58 PM
>> To: Stata List
>> Subject: Re: st: gllamm or xtmixed models?
>>
>> Hola Antonio,
>>
>> I believe the correct syntax for the random effects model estimated
>> via maximum likelihood would be
>>
>> xtmixed depression x1 x2 x3 || country:
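Written out in a standard formulation (notation chosen here for illustration), the random-intercept model this command fits for person i in country j is shown below. The sd(_cons) and sd(Residual) rows of the -xtmixed- output are the estimates of sigma_u and sigma_epsilon.

```latex
\text{depression}_{ij} = \beta_0 + \beta_1 x1_{ij} + \beta_2 x2_{ij} + \beta_3 x3_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2),
\quad u_j \perp \varepsilon_{ij}
```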
>>
>> Alfonso Sánchez-Peñalver, PhD
>>
>> Visiting Assistant Professor
>> Suffolk University
>> Senior Instructor
>> UMass Boston
>>
>>
>>
>> On Jan 30, 2014, at 2:52 PM, Antonio Rodriguez Andres
>> <[email protected]> wrote:
>>
>>> Dear stata users
>>>
>>> I want to estimate multilevel models, as I have observations for
>>> individuals across countries. My dependent variable is a measure of
>>> mental health ranging from 0 to 24. I want to use hierarchical
>>> linear models with fixed effects and random effects for countries.
>>> Is the correct syntax:
>>>
>>> xtmixed depression x1 x2 x3 || i(country)
>>>
>>> Any clues?
>>>
>>> Regards
>>>
>>> Antonio
>>>
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>>
>>
>
>
>
>
>