I'm struggling with a question of how to efficiently set up a complex
survey analysis. After collecting the data (with simple random
sampling, kind of) it is clear that two variables (simplifying here)
matter for the kinds of outcomes I'm examining: the % low English
proficiency (lep) in a school and the gender of the respondent. I
have auxiliary data that tells me, for all schools in the population,
what the school size is and what its lep and gender numbers are.
To reweight my sample so that it (hopefully) looks somewhat more like the
population, I could create a pweight that indicates, for each person
in my data, how many people in the population they represent that are
of the same gender and in a school of the same (median split) category
of lep. I can then use the svy commands for estimation. The problem,
however, is that I have a fair number of partially complete surveys.
Thus, depending on what variables go into a particular analysis, my N
varies. Consequently, the pweights would have to be recalculated for
almost every analysis. Very time consuming.
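
To make the recalculation concrete, here is a rough sketch in Python (not Stata) of what I mean by rebuilding the weights for each analysis. It assumes each weight is simply the population count in a gender-by-lep cell divided by the number of complete cases in that cell. The function name poststrat_weights, the variables y1 and y2, and all the counts are made up for illustration.

```python
from collections import Counter


def poststrat_weights(rows, pop_counts, analysis_vars,
                      cell_keys=("gender", "lep_cat")):
    """Return one weight per row: N_cell / n_cell for rows that are
    complete on analysis_vars, and None for rows dropped from the analysis."""

    def complete(row):
        # A row is usable only if every analysis variable is non-missing.
        return all(row.get(v) is not None for v in analysis_vars)

    def cell(row):
        # The weighting cell is the (gender, lep category) combination.
        return tuple(row[k] for k in cell_keys)

    # Sample size per cell, counted among complete cases only.
    n_cell = Counter(cell(r) for r in rows if complete(r))

    return [
        pop_counts[cell(r)] / n_cell[cell(r)] if complete(r) else None
        for r in rows
    ]


# Hypothetical population counts per cell (from the auxiliary school data).
pop_counts = {
    ("F", "low"): 400, ("F", "high"): 600,
    ("M", "low"): 500, ("M", "high"): 500,
}

# Hypothetical respondents; None marks an unanswered item.
rows = [
    {"gender": "F", "lep_cat": "low",  "y1": 3, "y2": None},
    {"gender": "F", "lep_cat": "low",  "y1": 4, "y2": 2},
    {"gender": "F", "lep_cat": "high", "y1": 5, "y2": 1},
    {"gender": "M", "lep_cat": "low",  "y1": 2, "y2": 4},
    {"gender": "M", "lep_cat": "high", "y1": 1, "y2": None},
    {"gender": "M", "lep_cat": "high", "y1": 3, "y2": 3},
]

# The weights differ depending on which variables the analysis uses.
w_y1 = poststrat_weights(rows, pop_counts, ["y1"])
w_y1_y2 = poststrat_weights(rows, pop_counts, ["y1", "y2"])
```

In both cases the weights of the retained rows add up to the population total, but the individual weights shift as soon as the set of complete cases changes, which is why every analysis would need its own weight variable.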
An alternative I've considered is to define strata that identify
unique combinations of lep and gender and then feed this
information to the poststratification options in svyset. The problem here
is that each PSU (a school) now overlaps two strata, one for each gender
in that school, and it's not clear what the FPC numbers should be for
each stratum. I'm guessing this arrangement would probably violate the
assumptions behind svy.
Does anyone know of a better way to address this problem?
Peter