I think the best approach to the bootstrap in complex surveys is to
push everything into the weights and modifications of them, as
described in the Rao, Wu and Yue (1992) paper, which I can send. At
first it looks very different from the usual bootstrap, but it is
indeed a bootstrap. It also requires entirely different machinery on
the computing side, and it is actually easier to accomplish through
-brr-, as Jeff Pitblado suggested.
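
To make the "everything goes into the weights" idea concrete, here is a minimal sketch in Python of the rescaling step commonly attributed to Rao and Wu. It is an illustration only, not what -brr- or any other Stata command does internally. The function names, the toy data layout (parallel lists of strata, PSU ids and weights) and the default of drawing n_h - 1 PSUs per stratum are assumptions made for the example. In each stratum, PSUs are resampled with replacement, and each observation's weight is multiplied by a factor that depends on how many times its PSU was drawn.

```python
import random
from collections import Counter, defaultdict
from math import sqrt


def rao_wu_weights(strata, psus, weights, rng, m=None):
    """Return one replicate of rescaled bootstrap weights.

    strata, psus, weights: parallel lists, one entry per observation.
    m: optional dict {stratum: number of PSUs to draw}; the default
       n_h - 1 is an assumption of this sketch.
    """
    psus_in = defaultdict(set)
    for h, i in zip(strata, psus):
        psus_in[h].add(i)

    factor = {}
    for h, ids in psus_in.items():
        ids = sorted(ids)
        n_h = len(ids)
        if n_h < 2:
            raise ValueError("need at least 2 PSUs in stratum %r" % (h,))
        m_h = (m or {}).get(h, n_h - 1)
        draws = Counter(rng.choices(ids, k=m_h))  # with replacement
        lam = sqrt(m_h / (n_h - 1))
        for i in ids:
            # draws[i] is 0 for PSUs that were not selected
            factor[(h, i)] = 1 - lam + lam * (n_h / m_h) * draws[i]

    return [w * factor[(h, i)] for h, i, w in zip(strata, psus, weights)]


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def bootstrap_variance(y, strata, psus, weights, reps=500, seed=12345):
    """Mean squared deviation of replicate estimates from the full-sample one."""
    rng = random.Random(seed)
    theta_hat = weighted_mean(y, weights)
    estimates = [
        weighted_mean(y, rao_wu_weights(strata, psus, weights, rng))
        for _ in range(reps)
    ]
    return sum((t - theta_hat) ** 2 for t in estimates) / reps
```

The point of the design is that the estimator itself is never touched: every replicate is just the ordinary weighted estimate computed with a different weight column, which is why replicate-weight machinery of the -brr- kind fits it naturally.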
Doing this properly with -pweights- would require something like
PPS sampling, which is another disaster in the survey statistics world
-- it is far more computationally intensive than it might seem. The
question was raised on the list a few days ago.
On 3/13/07, Ben Jann wrote:
Sabrina asked about using pweights with the bootstrap and Jeff answered:
> Question: Why doesn't Stata allow weights with -bootstrap-?
>
> Besides the book by Shao and Tu (1995), there are papers in the survey
> literature on using the Bootstrap with complex survey data. Unfortunately
> there doesn't appear to be a single satisfactory method for Bootstrapping
> data with sampling weights.
[...]
> Shao, J. and D. Tu. 1995. The Jackknife and Bootstrap. New York: Springer.
Stas gave some more references:
> Shao, J. (1996), 'Resampling methods in sample surveys', Statistics
> 27, 203–254 (with discussion). A big and nice review.
[...]
I am not an expert on the bootstrap or complex survey methods, and I
don't have Shao and Tu (1995) at hand. But from Shao (1996) it appears that
sampling weights per se are not a problem for the naive bootstrap (see
page 222). Problems arise (1) if the units are sampled without
replacement and the sample size is large compared to the population
size and (2) if the number of sampled clusters per stratum is small.
It is clear from this that there is no easy solution for the bootstrap
with complex survey data in general, but the weights do not seem to
matter much. So the question remains: "Why doesn't Stata allow weights
with -bootstrap-?" It may be reasonable to disallow weights, if
cluster() or strata() is specified, but I don't see anything that
speaks against using weights if there is no clustering or
stratification. Or do I misunderstand Shao (1996)?
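
The case without clustering or stratification can be sketched as follows. This is a minimal Python illustration, not Stata's -bootstrap-, and it rests on assumptions: observations are resampled one at a time with replacement, each observation carries its sampling weight along, and the sampling fraction is small enough that no finite population correction is needed. The function names and arguments are hypothetical.

```python
import random


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def naive_weighted_bootstrap_se(y, w, reps=1000, seed=2007):
    """Bootstrap standard error of a weighted mean.

    Resamples (y_i, w_i) pairs with replacement; no strata, no clusters,
    no finite population correction (all assumptions of this sketch).
    """
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_mean([y[i] for i in idx], [w[i] for i in idx])
        )
    center = sum(estimates) / reps
    variance = sum((e - center) ** 2 for e in estimates) / (reps - 1)
    return variance ** 0.5
```

Whether this is adequate for a given survey is exactly the open question here: the sketch ignores both of the problem cases listed above (large sampling fractions without replacement, and few sampled clusters per stratum).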
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/