From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Pooled Datasets (DHS) - use of syvset & regional controls
Date: Fri, 3 Jan 2014 18:56:33 -0500

Jörg:

It looks like you are analyzing Demographic and Health Survey (DHS) data.

1. It is quite possible to pool surveys even if the strata and PSUs were not defined in the same way each year. The key is to make year part of the stratum definition:

    . egen stratid = group(year v024 v025)

Correspondence cited at http://www.stata.com/statalist/archive/2012-07/msg00878.html raised doubts about whether, in one instance, v024 was the correct variable to use in defining strata. You should check whether this issue applies to your data.

2. In order to do a multi-year analysis that assesses and controls for regional differences, you will have to examine how regions are defined in each survey and, if necessary, create new "regions" that are as identical as possible over the survey years. Some of these new regions might be combinations of the original regions. If a region was not represented in some surveys, because of civil war, for example, it can still be analyzed for the years in which it was represented.

Confining yourself to analyses of individual years would be a mistake, in my opinion, because you would lose the ability to look at temporal changes.

See Emma Slaymaker's advice at:
www.stata.com/statalist/archive/2009-07/msg00906.html
and this from Stas Kolenikov:
http://www.stata.com/statalist/archive/2007-08/msg00032.html

Steve

Steven J. Samuels
18 Cantine's Island
Saugerties NY 12477
USA
Tel: 845-246-0774

On Jan 3, 2014, at 6:04 PM, Jörg Kattner wrote:

Dear Stata list serve members,

For a research paper we would like to pool household surveys (DHS) from different years into a single dataset. We experience the following two challenges:

1) In order to account for the complex survey design we think we have to correctly specify the weights, stratification and clusters for each survey. Even though each survey is from the same country, they can differ slightly depending on the year. Thus when we pool them, we still want to correctly specify the survey design.
However, the question now is how to do this. Previously, when analyzing each year on its own, we used code along the following lines:

    gen weight = v005 / 1000000
    egen stratid = group(v024 v025), label
    svyset [pweight=weight], psu(v021) strata(stratid)

The main thing that differs between the surveys is the stratification variables. Sometimes a stratification variable already exists; sometimes we had to create one as above. Also, the variable v024 (region), for example, sometimes has 6 values in one year and 10 in the next. Is it even possible to correctly stratify our dataset when we pool different surveys?

2) Since we also want to control for regional / community effects later on in our regression models (using svy: reg or svy: logit/clogit), it can be problematic if the defined regions and clusters differ between the surveys. The only solution we see is to run a separate regression for each year/survey. The drawback is that one cannot directly see whether differences between years/surveys in the constant term or in the coefficient of maternal education are significant. Is there any other statistical method that could deal with this dilemma?

Any help is much appreciated. Thanks a lot in advance!

Best regards,
Lukas & Jörg

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
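
For readers who want to try the pooled setup from point 1 of the reply, here is a minimal sketch on made-up data. Everything in it beyond the `egen stratid = group(year v024 v025)` line and the DHS variable names used in the thread is an assumption: the toy data, the outcome `y`, and in particular the `psuid` variable, which is an extra precaution (not part of the reply) for surveys whose cluster numbers restart in each round. It uses factor-variable notation, so it assumes Stata 11 or later.

```stata
* Toy data standing in for two appended DHS rounds (all values are made up)
clear
set seed 2014
set obs 400
gen year = cond(_n <= 200, 2005, 2010)
gen id   = mod(_n - 1, 200)              // position within survey year
gen v024 = 1 + floor(id / 100)           // region (2 per year here)
gen v025 = 1 + mod(floor(id / 50), 2)    // urban/rural
gen v021 = 1 + floor(id / 10)            // cluster number, restarts each year
gen v005 = 1000000 * (0.5 + runiform())  // DHS-style weight (scaled by 1e6)
gen y    = rnormal() + 0.2 * (year == 2010)   // hypothetical outcome

gen weight = v005 / 1000000

* Year is part of the stratum definition, as suggested in the reply
egen stratid = group(year v024 v025)

* Assumption (not in the reply): also make PSU ids unique across years
egen psuid = group(year v021)

svyset psuid [pweight = weight], strata(stratid)
svydescribe

* Pooled model: the year coefficient estimates the shift between surveys
svy: regress y i.year i.v024
```

With real data, `v024` in the model would be replaced by a harmonized region variable built as described in point 2, and the stratum variables should be checked against each survey's documentation before grouping.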