st: Matching, bootstrapping, sub-sampling
Dear List:
This is both a Stata related and a statistics question.
Short version:
If bootstrapping is invalid for estimating the standard errors of
the ATT after nearest-neighbor matching, does subsampling help, and
if so, how?
Long version:
psmatch2 is rather popular among many of us (according to download
statistics). Although its help file warns that it is "unclear whether
the bootstrap is valid in this context", bootstrapping is also a popular
way to estimate the standard errors of the Average Treatment Effect on
the Treated (ATT). But the times they are (expected to be) a-changin':
in the November 2008 issue of Econometrica, Alberto Abadie and Guido
Imbens published a paper entitled "On the Failure of the Bootstrap for
Matching Estimators", arguing that bootstrap standard errors are not a
valid basis for inference with simple nearest-neighbor matching
estimators that match with replacement on a fixed number of neighbors.
This result is highlighted in a recent survey by Imbens and Jeffrey
Wooldridge ("Recent Developments in the Econometrics of Program
Evaluation", Journal of Economic Literature, March 2009). (For those of
you working in other fields: both journals are among the top journals
in economics/econometrics.)
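To make the setting concrete, here is a minimal Python sketch (not psmatch2's code) of the kind of estimator Abadie and Imbens analyze: one-to-one nearest-neighbor matching with replacement on a single covariate, paired with the standard bootstrap standard error that they show is not valid for it. The function names `att_nn1` and `naive_bootstrap_se`, the number of replications, and the simulated data are hypothetical choices made only for illustration.

```python
import random
import statistics


def att_nn1(y, d, x):
    """ATT from one-to-one nearest-neighbor matching on a scalar x, with replacement."""
    controls = [(xi, yi) for yi, di, xi in zip(y, d, x) if di == 0]
    effects = []
    for yi, di, xi in zip(y, d, x):
        if di == 1:
            # closest control in x; a control may be reused for several treated units
            _, y_match = min(controls, key=lambda c: abs(c[0] - xi))
            effects.append(yi - y_match)
    return statistics.mean(effects)


def naive_bootstrap_se(y, d, x, reps=200, seed=1):
    """Standard bootstrap SE: resample n units WITH replacement and re-estimate.

    This is the procedure Abadie and Imbens (2008) show to be invalid here.
    """
    rng = random.Random(seed)
    n = len(y)
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        dd = [d[i] for i in idx]
        if 0 < sum(dd) < n:  # skip draws without treated or without controls
            draws.append(att_nn1([y[i] for i in idx], dd, [x[i] for i in idx]))
    return statistics.stdev(draws)


# Hypothetical simulated data: treatment more likely for larger x, true effect 1.
rng = random.Random(0)
x = [rng.random() for _ in range(150)]
d = [1 if rng.random() < 0.3 + 0.4 * xi else 0 for xi in x]
y = [2.0 * xi + 1.0 * di + rng.gauss(0.0, 1.0) for xi, di in zip(x, d)]
print(round(att_nn1(y, d, x), 3), round(naive_bootstrap_se(y, d, x), 3))
```

The code runs without complaint; the point of Abadie and Imbens is that the number it prints does not have the usual justification as a standard error.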
What is to be done? One suggestion found in both articles goes like
this (Imbens and Wooldridge, p. 42): "In cases where bootstrapping is
not valid, often subsampling (..) remains valid, but this has not
been applied in practice." The authors refer to Dimitris N. Politis
et al., Subsampling, New York: Springer, 1999. Subsampling means
re-estimating on subsamples that are smaller than the full sample
(say, 75 percent of it) and drawn without replacement, instead of on
full-size bootstrap draws with replacement.
Contrary to what Imbens and Wooldridge say, there are some (working)
papers that use subsampling to compute the standard errors of the
ATT. They use about 75 percent of the sample. Nobody has yet told me
why: the authors either argue that others do so as well, or they do
not reveal the (somewhat secret) formula or rule of thumb they
applied.
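For comparison, a minimal Python sketch of subsampling for the same kind of matching estimator, under stated assumptions: subsamples of size b are drawn without replacement, the ATT is re-estimated on each, and the spread around the full-sample estimate is rescaled by sqrt(b/n), which assumes the estimator converges at the usual root-n rate. The default `frac=0.75` only mirrors the rule of thumb mentioned above; the theory in Politis et al. requires b to grow with n while b/n goes to zero, so a fixed fraction is not something that theory prescribes. The names `att_nn1` and `subsampling_se` and the simulated data are hypothetical.

```python
import math
import random
import statistics


def att_nn1(y, d, x):
    """ATT from one-to-one nearest-neighbor matching on a scalar x, with replacement."""
    controls = [(xi, yi) for yi, di, xi in zip(y, d, x) if di == 0]
    effects = []
    for yi, di, xi in zip(y, d, x):
        if di == 1:
            # closest control in x; a control may be reused for several treated units
            _, y_match = min(controls, key=lambda c: abs(c[0] - xi))
            effects.append(yi - y_match)
    return statistics.mean(effects)


def subsampling_se(y, d, x, frac=0.75, reps=200, seed=1):
    """Subsampling SE: re-estimate on b = frac * n units drawn WITHOUT replacement.

    Assumes a root-n estimator, so sqrt(b) * (est_b - est_n) approximates the
    distribution of sqrt(n) * (est_n - ATT); hence
    SE(est_n) ~ sqrt(b / n) * rms(est_b - est_n).
    """
    rng = random.Random(seed)
    n = len(y)
    b = int(frac * n)
    est_n = att_nn1(y, d, x)
    draws = []
    while len(draws) < reps:
        idx = rng.sample(range(n), b)
        dd = [d[i] for i in idx]
        if 0 < sum(dd) < b:  # skip subsamples without treated or without controls
            draws.append(att_nn1([y[i] for i in idx], dd, [x[i] for i in idx]))
    spread = math.sqrt(statistics.mean([(t - est_n) ** 2 for t in draws]))
    return math.sqrt(b / n) * spread


# Hypothetical simulated data: treatment more likely for larger x, true effect 1.
rng = random.Random(0)
x = [rng.random() for _ in range(150)]
d = [1 if rng.random() < 0.3 + 0.4 * xi else 0 for xi in x]
y = [2.0 * xi + 1.0 * di + rng.gauss(0.0, 1.0) for xi, di in zip(x, d)]
print(round(att_nn1(y, d, x), 3), round(subsampling_se(y, d, x), 3))
```

Setting `frac=1.0` returns 0, because every subsample is then the full sample: subsampling only carries information when b is strictly smaller than n.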
Two questions:
1. Can someone please explain in (more or less) plain English why
subsampling is a solution?
2. How large should the subsamples be, and why?
Many thanks in advance for any comments etc.
Joachim