Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | melissa daniels <melisdaniels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: DHS Womens Data Survey Setup |
Date | Sat, 16 Jul 2011 00:23:09 -0500 |
Hello fellow Stata users,

I am working on an analysis of DHS women's data (Ghana, 2008) using Stata 11.2. My sample includes only women with infants in the 0-23 month age range. DHS data are collected as a two-stage stratified sample of households. I want to identify all the survey variables I may need and construct the dataset properly for a survey analysis.

I am still constructing the dataset, but am planning to use the following variables (as defined in DHS recode 5) and svyset statement:

    * enumeration areas for the survey
    gen psu = v021
    * pairings or groupings of primary sampling units used in Taylor series expansion
    gen strata1 = v022
    * sample domain: the basic geographic units within which the sample was self-weighted
    gen strata2 = v023
    * probability weights for the sample (decimal correction as directed by DHS)
    gen m_weight = v005/10^6

    svyset psu [pweight=m_weight], strata(strata1)

I have a couple of questions:

1) I understand variance estimation is based on the Taylor series expansion method, so I assume v022 (strata1 above) is the strata variable I am most interested in. In what cases would the sample domain variable v023 be of use to me? Is it important for survey estimation?

2) I believe I need data on the full sample of women in order to estimate corrected variances on the subset of women I am interested in. Does that mean I need to create my dataset with all women, or with all individuals in the larger dataset? Or is my dataset complete since the subsample should be evenly dispersed throughout regions? If I need a larger dataset, do I just use a variable to flag women with children of the correct age for my subsample and then restrict all estimation commands to the subsample using an if statement?

3) I am interested in looking at biomarkers on a separate subsample who consented to a blood draw. However, there are no weights that I can locate for this subsample.
Do I use the same weights as above, or do I need to create some sort of weight using the rate of consent?

4) I haven't been able to find any variables related to the finite population correction (FPC), likely because the sampling fraction is small for DHS. According to my understanding, FPC is not a concern for this analysis. Please correct me if I'm wrong.

Thank you sincerely,
Melissa Daniels
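
For concreteness, the flag-and-restrict idea in question 2 could be sketched roughly as follows. This is only an illustration on simulated toy data, not the actual DHS file: child_agemo, outcome, and infant023 are hypothetical names, and the weights are random placeholders for v005/10^6. The sketch uses the subpop() option of the svy prefix rather than an if qualifier, since Stata's survey documentation recommends subpop() for subpopulation estimates so that all PSUs and strata stay in the variance calculation.

```stata
* Toy data standing in for the full women's file (all values simulated)
clear
set seed 12345
set obs 400

* 4 strata of 100 women each, with 5 PSUs of 20 women nested in each stratum
gen strata1 = ceil(_n/100)
gen psu = ceil(_n/20)

* placeholder weights (in real data: v005/10^6)
gen m_weight = 0.5 + runiform()

* hypothetical variables: age of youngest child in months, and an outcome
gen child_agemo = floor(60*runiform())
gen outcome = rnormal()

* flag the subsample instead of dropping the other women
gen byte infant023 = (child_agemo <= 23)

svyset psu [pweight=m_weight], strata(strata1)

* estimate on the flagged subpopulation while keeping the full design
svy, subpop(infant023): mean outcome
```

With real data, the flag would be built from whichever child-age variable the recode provides, and the same subpop() option can be attached to other svy estimation commands.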