Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Multiple imputation with survey replicate weights
From: Stas Kolenikov <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Multiple imputation with survey replicate weights
Date: Thu, 20 Feb 2014 09:33:52 -0600
Stata is doing the right thing in preventing you from doing dubious
things. The interface between complex survey data inference and multiple
imputation is surprisingly poorly studied given its ubiquity. The
statistically appropriate way to combine imputation and replicate
weights that I am aware of is to use the bootstrap or BRR approach:
create a single imputation within each bootstrap/BRR replicate, and
re-estimate your model with that replicate weight on the imputed
data. See Shao and Sitter (1996;
http://www.citeulike.org/user/ctacmo/article/1269394). At the moment,
this requires custom programming of an estimation command that
combines one imputation iteration with the command of interest. I am
vaguely planning to develop a Stata Journal paper to describe the
process, but it is only at the conceptualization stage now. Here's an
example (not particularly stable: the combinations of -mi- and -svy-
are still tricky, as they have conflicting expectations of what is
known about the data, and I have to force each one to ignore the
other):
webuse nhanes2brr, clear
gen age2 = age*age
cap pro drop mymireg
program define mymireg, properties( svyb )
    syntax [varlist] [if] [in] [pw iw /] , [*]
    * local macro `weight' contains the weight type
    * local macro `exp' contains the weight variable
    * local macro `varlist' contains the list of explanatory variables for
    * the final regression
    * it is used to keep Stata from thinking that estimation has
    * already been done
    preserve
    mi set wide
    mi register regular region1 region2 region3 rural black orace age ///
        age2 tibc tcresult
    mi register imputed lead zinc copper vitaminc albumin tgresult
    * one imputation per replicate, weighted by the replicate weight
    mi impute chained (pmm) lead zinc copper vitaminc albumin tgresult = ///
        region1 region2 region3 ///
        rural black orace age age2 tibc tcresult [pw=`exp'], add(1)
    mi extract 1, clear
    logistic highbp lead `varlist' [pw=`exp']
    restore
end
svy brr, saving( lead_imputed_logit, replace ) : mymireg height weight age female
use lead_imputed_logit, clear
sum
Use at your own risk. Let me repeat: USE AT YOUR OWN RISK. Maybe like this:
use at_your_own_risk, clear?
A few caveats:
1. -svy brr- will report point estimates based on a single imputation;
these are useless and need to be discarded.
2. The right coefficients and standard errors come out of the
-summarize- at the end. I used to be able to produce them with -bs4rw-
followed by -estat bootstrap-, but for whatever reason it stopped
working (it used to work in 2010), probably because the internal format
that -bootstrap- expects has changed, and what -bs4rw- supplies is no
longer compatible with it.
3. I used the equivalence between the bootstrap and BRR; things will
not work appropriately with jackknife, as it does not provide enough
sampling variability, and the imputation model will be too close to
that based on the full data. Hence, sampling variability in the
imputation model will be insufficient, and the standard errors will be
underestimated. Likewise, the compressed replicate weight variability
methods (BRR with Fay's adjustment; mean bootstrap) may not be able to
generate enough sampling variability in the imputation process,
either.
4. As you can clearly see, the code is cumbersome, and probably not
particularly efficient. I could probably have handled -mi extract-
better, for instance, and all these -preserve-s are obviously going
to eat up a good fraction of computing time with large data sets.
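
A minimal sketch of caveat 2, assuming the file lead_imputed_logit saved
by -svy brr- above holds one observation per BRR replicate and one
variable per coefficient (the variable names are not assumed, so the loop
simply runs over everything in the file). The mean across replicates
serves as the point estimate and the standard deviation across replicates
as the standard error; since -logistic- stores log-odds coefficients in
e(b), both should be on the log-odds scale:

```stata
* Assumption: lead_imputed_logit.dta was created by the -svy brr, saving()- call above
use lead_imputed_logit, clear

display as text "variable" _col(25) "mean (est.)" _col(40) "sd (s.e.)"
foreach v of varlist _all {
    * mean over replicates = point estimate; SD over replicates = standard error
    quietly summarize `v'
    display as text "`v'" _col(25) as result %9.4f r(mean) _col(40) %9.4f r(sd)
}
```

Note that -summarize- divides by R-1 rather than R when computing the
standard deviation, which matters little with a reasonable number of
replicates.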
-- Stas Kolenikov, PhD, PStat (ASA, SSC)
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name
On Wed, Feb 19, 2014 at 4:41 PM, Joshua Mitts <[email protected]> wrote:
> Has anyone found a way to use survey replicate weights with multiply
> imputed data? The svy manual states:
>
> mi estimate may be used with svy linearized if the estimation command
> allows mi estimate; it may not be used with svy bootstrap, svy brr,
> svy jackknife, or svy sdr.
>
> And I receive this error when trying to fit a logit model:
>
> vce(brr) previously set by mi svyset is not allowed with mi estimate
>
> Thanks very much,
> Josh
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/