Following this helpful thread, I have a similar question and would appreciate any help with it.
Our data consist of the combined responses to a survey administered across countries, where a multi-stage sampling design was used in each country. Countries were not selected randomly. The survey data for each country thus have stratum and psu variables, corresponding to school and class, respectively, and an individual-level weight variable, pweight. Country-level random effects and individual-level fixed effects will be examined in the modeling procedure. I understand from previous advice and posts that gllamm might be the most suitable command to use in Stata.
Could anyone advise with regard to the following two questions?
How should grouping and/or clustering be specified (e.g., something like i(psu stratum, country))?
How should weights be applied (e.g., pweight for individuals, the average of pweight per psu and/or stratum, and 1 for country, along the lines of the previous thread)?
Many thanks for any help.
Hillel
[email protected]
From "Kanter, Rebecca" <[email protected]>
To "[email protected]" <[email protected]>
Subject RE: st: gllamm with pweights
Date Thu, 23 Jul 2009 15:00:39 -0400
Hi Stas and Steve and the rest of statalist,
I spoke with a statistician yesterday (who assisted with constructing some of the original pweights), and we agreed that my first level should be individuals, level 2 census tracts (codeupm), and level 3, if I want it, state.
And that the L1 weights should be the original individual pweights
And that the L2 census tract weights should be the average of the pweights for each specific census tract
And that L3, state, should be given a constant weight of 1
So I constructed the gllamm pweights as follows, and yet I still cannot get a basic random-intercept-only model to converge. I also tried including state. Thoughts? Thanks again for all your help, I really appreciate it!
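* MLM-level pweights
* level 1: original individual pweight
generate pwadulsr1=adul_sr
* level 2: census tract average of the pweight adul_sr
* (the mean is taken over all observations in the tract, not only diettag==1 & exwt==1)
by codeupm, sort: egen pwadulsr2=mean(adul_sr)
* level 3: constant weight of 1 for state
generate pwadulsr3=1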
gllamm bmi2 if diettag==1 & exwt==1, i(codeupm) pweight(pwadulsr) adapt nip(20)
Running adaptive quadrature
Convergence not achieved: try with more quadrature points
gllamm bmi2 if diettag==1 & exwt==1, i(codeupm) pweight(pwadulsr) adapt nip(15) cluster(state)
Running adaptive quadrature
Convergence not achieved: try with more quadrature points
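A note on the calls above: the pweight() option of gllamm takes a variable-name stub and looks for stub1, stub2, ..., one variable per level, so pweight(pwadulsr) picks up pwadulsr1 and pwadulsr2 in the two-level model. Below is a minimal sketch of how state could enter as a third level instead of through cluster(), using only the variables already defined in this message. It assumes census tract identifiers (codeupm) are unique across states, and it makes no promise that the model will converge on these data.

```stata
* Sketch only: three-level random-intercept model with
* individuals (level 1) within census tracts (level 2) within states (level 3).
* i() lists the cluster identifiers from the lowest level to the highest.
* pweight(pwadulsr) is a stub: gllamm then needs pwadulsr1, pwadulsr2 and pwadulsr3.
* No family()/link() is given, so gllamm's default Gaussian/identity model is used.
gllamm bmi2 if diettag==1 & exwt==1, i(codeupm state) pweight(pwadulsr) adapt nip(8)
```

Starting with a modest number of quadrature points and raising nip() only if the estimates are unstable keeps run times manageable, since the cost grows quickly with the number of levels.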
From [email protected]
To [email protected]
Subject Re: st: gllamm with pweights
Date Fri, 17 Jul 2009 18:10:30 -0400
No, I mean make "state," not "urstate", your level 3 unit (with weight
1). The "census tracts" within state should be classified as urban
and rural. States are natural units; census tracts are natural units;
the "urstates" agglomerations of census tracts are artificial. I
cannot see a justification for designating them as level 3 units.
With state as a random effect you can estimate a between-state
standard deviation. You can also estimate the extent to which
urban/rural differences vary across states if you add a random
state/urban-rural term to your model.
Steve
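For illustration, a random state-level urban/rural term of the kind described above might be set up in gllamm as follows. This is a sketch under assumptions, not a tested specification for these data: "urban" is a hypothetical 0/1 tract-level indicator (1 = urban) that does not appear in the thread, while bmi2, codeupm, state and the pwadulsr weight stub are taken from Rebecca's messages.

```stata
* Sketch only. "urban" is a hypothetical 0/1 indicator (1 = urban tract).
generate byte cons = 1

* Equations naming the variable that multiplies each random effect.
eq tr_int: cons      // random intercept for census tracts (level 2)
eq st_int: cons      // random intercept for states (level 3)
eq st_urb: urban     // random urban/rural coefficient for states (level 3)

* nrf(1 2): one random effect at level 2 and two at level 3;
* eqs() lists the equations in that same order.
* pweight(pwadulsr) expects pwadulsr1, pwadulsr2 and pwadulsr3.
gllamm bmi2 urban, i(codeupm state) nrf(1 2) eqs(tr_int st_int st_urb) ///
    pweight(pwadulsr) adapt
```

The fixed coefficient on urban gives the average urban/rural difference; the variance of the st_urb random effect describes how much that difference varies from state to state.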
On Fri, Jul 17, 2009 at 5:12 PM, Kanter, Rebecca<[email protected]> wrote:
> To follow up: I am not using state as a level. What was my level 2 (now level 3) is the urban or rural part of each state, so 32 states = 64 urstate units. Do you mean that I should give each of these a weight of 1, and then use the method outlined on p. 814 for the census tract (which I guess is my new level 2), or the -pwigls- program, for the (new) level 1 and level 2 weights, and not use the original L1 pweights at all?
>
________________________________________
> From: [email protected] [[email protected]] On Behalf Of [email protected] [[email protected]]
> Sent: Friday, July 17, 2009 1:53 PM
> To: [email protected]
> Subject: Re: st: gllamm with pweights
>
> --
>
> It looks like a three-level model is the most appropriate for you.
> However, compute scaled sampling weights only for the "census tract"
> level; "state", the highest level, gets a weight of 1. You can
> use Korn and Graubard's method D, referenced on page 814 of
> Rabe-Hesketh and Skrondal (2006), or one of the others. (Be
> sure to cite the original sources, not just Chantala.) Scaling the
> weight for "tracts" is needed to properly estimate the between-tract
> component of variance.
>
> You do not need the -cluster()- option in -gllamm-.
>
> Good luck!
>
> -Steve
>
>
> Rabe-Hesketh, S. & Skrondal, A. (2006). Multilevel modelling of
> complex survey data. Journal of the Royal Statistical Society: Series
> A (Statistics in Society), 169(4), 805-827.
>
> On Fri, Jul 17, 2009 at 12:38 PM, Kanter, Rebecca<[email protected]> wrote:
>> Hi Steve and list,
>>
>> The original survey design is a multi-stage stratified design. The PSU is essentially the equivalent of a U.S. census tract (the probability that one of these tracts was selected was proportional to the number of households within it and the number of tracts selected corresponded to the sample size in the strata within the state) ...from which households are selected (with probability proportional to size). For each census tract selected six "blocks" are selected with probability proportional to the number of houses in each block; within each chosen block 6 households are selected via systematic random sampling and then individuals within the household via simple random sampling.
>>
>> I would just use the original individual survey pweights for gllamm, but the pweight() option of gllamm does not work unless weights for all levels are specified.
>>
>> Thus, I go back to a previous suggestion on this thread: should I just set the pweight for L2 equal to a constant 1?
>>
>> Or do I need to use the method by Chantala (as my advisor reminded me for example that while I am taking into account the urban and rural area of each state within the country that I only have a "sample" of x number of rural or urban tracts of the total number of rural or urban tracts within each state)?
>>
>> Or what?
>>
>> Thanks so much!
> ________________________________________
>> From: [email protected] [[email protected]] On Behalf Of [email protected] [[email protected]]
>> Sent: Friday, July 17, 2009 10:27 AM
>> To: [email protected]
>> Subject: Re: st: gllamm with pweights
>>
>> Rebecca, I didn't follow the original thread, so I apologize. As
>> there was no sampling of your level-two units, you do not need
>> sampling weights for them, nor, therefore, the weights computed by
>> Chantala's code. We could have been more helpful if you had
>> described the original design. What was it, and what were the PSUs?
>> The PSUs are the units which should be designated as clusters in
>> -gllamm-. They need not be part of the two-level model, but might be
>> interesting as units in a three-level model.
>>
>> You can use the original sampling weights, but perhaps you have
>> enough information to post-stratify the weights for individuals, for
>> example by gender and state. This is less necessary if gender is a
>> predictor for your multi-level model.
>>
>>
>> On Thu, Jul 16, 2009 at 3:08 PM, Kanter, Rebecca<[email protected]> wrote:
>>> Thanks Steven, these resources are a big help.
>>>
>>> I am now trying to apply this method to my 2 level model (L1 = individual L2 = urban or rural part of state they live in; 64 units based on 32 states).
>>>
>>> In the method by Chantala et al, if I am interpreting this correctly...the PSU takes on a new meaning here (from the original complex survey design)...
>>>
>>> whereby PSU_wt_j = 1 / Pr(urstate j selected). So if I am including all urban and rural parts of states (i.e. all 64 units that in turn make up the 32 states in the country), is PSU_wt_j then 1 for every urstate?
>>>
>>> Furthermore, if FSU_wt_i|j = 1 / Pr(person i selected | urstate j selected), is FSU_wt_i|j then 1 / (1 / total number of people in urstate j), as in their example with schools = j, where each "student selected from school j will have a sampling weight equal to the number of students within school j represented by that student"?
>>>
>>> And in the end the original survey individual pweight is not used?
>>>
>>> Thanks so much for all your help,
>>>
> ________________________________________
>>> From: [email protected] [[email protected]] On Behalf Of [email protected] [[email protected]]
>>> Sent: Thursday, July 16, 2009 12:24 PM
>>> To: [email protected]
>>> Subject: Re: st: gllamm with pweights
>>>
>>> --
>>>
>>> Also, see: http://www.stata.com/meeting/4nasug/Chantala.ppt and
>>> http://www.cpc.unc.edu/restools/data_analysis/ml_sampling_weights.
>>> These contain links to the Stata program -pwigls- which will scale the
>>> weights. Rabe-Hesketh and Skrondal (2006), the second citation that
>>> Stas listed, compute the "Method 1" weights by hand and illustrate an
>>> analysis in GLLAMM.
>>>
>>> Rabe-Hesketh, S. & Skrondal, A. (2006). Multilevel modelling of
>>> complex survey data. Journal of the Royal Statistical Society: Series
>>> A (Statistics in Society), 169(4), 805-827.
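As a rough illustration of computing scaled level-1 weights by hand: as I understand "Method 1" in Rabe-Hesketh and Skrondal (2006), each conditional level-1 weight is multiplied by the cluster's sum of weights divided by its sum of squared weights, so that the scaled weights add up to an "effective" cluster size. The sketch below assumes a hypothetical variable w_cond holding the conditional weight 1 / Pr(person selected | tract selected); it is not a variable from this thread, and the formula should be checked against the paper before use.

```stata
* Sketch only. w_cond is a hypothetical conditional level-1 weight;
* codeupm is the census tract (cluster) identifier used in the thread.
bysort codeupm: egen double sum_w = total(w_cond)
generate double w_cond_sq = w_cond^2
bysort codeupm: egen double sum_w_sq = total(w_cond_sq)

* Within each tract the scaled weights sum to (sum of w)^2 / (sum of w^2).
generate double pw_m1 = w_cond * sum_w / sum_w_sq
```

The result would then be stored under the level-1 name of the pweight() stub (for example pwadulsr1), with the level-2 and higher weights supplied separately.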
>>>
>>> On Thu, Jul 16, 2009 at 12:28 PM, Stas Kolenikov<[email protected]> wrote:
>>>> Oh, I see. With 64 second level units, you are in a much better shape.
>>>> I would probably have an urban/rural dummy as an explanatory variable
>>>> for those second levels with -feq- option.
>>>>
>>>> If you sum up the weights, you are using the weights twice. And that's
>>>> hardly a great idea: you are overcompensating for unequal
>>>> probabilities of selection, if there were any. Were these
>>>> states/rural/urban areas selected via a sampling procedure? Or is
>>>> what you have a complete list? In the latter case, you surely would need
>>>> to specify unit weights at the second level.
>>>>
>>>> On the issue of weights in multilevel models, see:
>>>> http://www.citeulike.org/user/ctacmo/article/711637,
>>>> http://www.citeulike.org/user/ctacmo/article/850244,
>>>> http://www.citeulike.org/user/ctacmo/article/3158754. There's probably
>>>> more by now, but I am not tracking this literature very closely.
>>>>
>>>> On Thu, Jul 16, 2009 at 11:18 AM, Kanter, Rebecca<[email protected]> wrote:
>>>>> Hi Stas and statalist,
>>>>>
>>>>> My second level has more than 2 values. There are 32 states in the country, and dividing each state into its urban and rural areas gives 64 values (areas/clusters), which I capture in one variable called urstate: e.g. urstate=1 is the urban area of the 1st state, urstate=33 is the rural area of the 1st state, and so on. I want each one to have its own intercept, slopes, etc., to better account for and visualize the urban and rural differences in the country.
>>>>>
>>>>> Thus, is it better to sum the individual weights per urstate (1-64), or to let all weights for this second level equal one and keep my individual pweights as they are for the individual level (level 1)?
>>>>>
>>> ________________
>>>>> From: [email protected] [[email protected]]
>>>>> On Wed, Jul 15, 2009 at 5:37 PM, Kanter, Rebecca<[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am running 2 level multi-level models using gllamm. Level one is individuals and Level two is either the urban or rural part of the country's state (i.e. urstate).
>>>>>>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/