Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Bootstrapping & clustered standard errors (-xtreg-)
From: "Tobias Pfaff" <[email protected]>
To: <[email protected]>
Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
Date: Thu, 15 Sep 2011 16:54:07 +0200
Dear Cam,
Thanks for the references!
However, I think I will give up on bootstrapping with panel data and clustered
standard errors. It's too much of a black box for me, and it may still be an
"embryonic research field" (http://economics.ca/2008/papers/0985.pdf).
Apart from the previously described error with "insufficient observations",
I also get a warning that collinearity of some of my dummies changes with
bootstrap. And while having only 77,627 obs. in my sample, one bootstrap
iteration shows 86,212 observations?? All things which I cannot understand
easily.
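
One possible explanation for the changing observation count, offered as a sketch under assumptions rather than as something confirmed in this thread: a cluster bootstrap draws whole clusters with replacement, so when clusters differ in size, the number of observations in a replicate varies even though the number of drawn clusters stays fixed. The Python sketch below uses made-up cluster sizes; the names `cluster_sizes` and `resampled_obs_count` are hypothetical, and this is not Stata's implementation.

```python
import random


def resampled_obs_count(cluster_sizes, rng):
    """Draw as many clusters as in the data, with replacement,
    and return the total number of observations in the draw."""
    drawn = rng.choices(cluster_sizes, k=len(cluster_sizes))
    return sum(drawn)


rng = random.Random(1)
# Hypothetical data: 108 clusters of unequal size (not the thread's data).
cluster_sizes = [rng.randint(100, 1500) for _ in range(108)]
original_n = sum(cluster_sizes)

# Observation count in each of 200 bootstrap replicates.
counts = [resampled_obs_count(cluster_sizes, rng) for _ in range(200)]
print("original obs:", original_n)
print("replicate obs range:", min(counts), "to", max(counts))
```

Under these assumptions, a replicate can hold more or fewer observations than the original sample whenever large or small clusters happen to be drawn more than once.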
Anyway, concerning the violation of the assumption of normally distributed
residuals, I found a nice paper (*, in German) and transformation of the
dependent variable helps me to sufficiently attenuate the violation.
Thanks again for all your efforts!
Tobias
(*)
http://www.bwl.uni-kiel.de/bwlinstitute/grad-kolleg/new/typo3conf/ext/naw_securedl/secure.php?u=0&file=/fileadmin/publications/pdf/pdf_03.gif&t=1316188229&hash=7ffb63ce635a228ad74e460767ebc04d
-----Original Message-----
> Date: Mon, 12 Sep 2011 15:05:59 -0400
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Cameron McIntosh <[email protected]>
> To: STATA LIST <[email protected]>
Hi Tobias,
Ok, well your comments below remind me of:
Wang, J., Carpenter, J.R., & Kepler, M.A. (2006). Using SAS to conduct
nonparametric residual bootstrap multilevel modeling with a small number of
groups. Computer Methods and Programs in Biomedicine, 82(2), 130-143.
I don't know if Stata offers a similar procedure. In conjunction with the
above paper, I also strongly recommend taking a look at:
Maas, C.J.M., & Hox, J.J. (2004a). The influence of violations of
assumptions on multilevel parameter estimates and their standard errors.
Computational Statistics & Data Analysis, 46, 427-440.
http://igitur-archive.library.uu.nl/fss/2007-1004-200713/Maas(2004)_influence%20of%20violations.pdf
Maas, C.J.M., & Hox, J.J. (2004b). Robustness issues in multilevel
regression analysis. Statistica Neerlandica, 58, 127-137.
http://joophox.net/publist/sn04.pdf
Cam
> From: [email protected]
> To: [email protected]
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> Date: Mon, 12 Sep 2011 17:51:48 +0200
>
> Dear Stas, Bryan,
>
> I was maybe not clear why I want to bootstrap at all:
>
> My fixed effects regression with clustered SE works fine.
> [-xtreg depvar indepvars, fe vce(cluster region) nonest dfadj-]
>
> However, my predicted residuals (-predict res_ue, ue-) are not normally
> distributed.
> Am I mistaken that I need normally distributed residuals for the
> t-statistics to be unbiased?
>
> If I'm not mistaken then I would like to do a robustness check with
> bootstrapped standard errors (where the normal distribution of residuals
> doesn't matter for the z-statistics to be unbiased) to see if my results
> change or not.
> And I still get the error message of insufficient observations when trying
> to bootstrap with clustered SE. Using -idcluster()- does not help.
> I have 76,000 obs., 8100 individuals, 108 clusters, and 36 regressors. I
> don't think that the bootstrap would produce a sample with fewer cluster
> IDs than regressors.
> So I still don't know why I get the error message after
> -xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1)) cluster(region_svyyear) nonest dfadj-?
>
> WEIGHTS:
> Your arguments regarding the usage of weights were convincing. However,
> -xtreg- only allows for weights that do not change for the individuals over
> the years. Our panel dataset has a variable for the design weight that does
> not change over the years, but this weight does not contain information on
> non-response. Another weight variable in the dataset contains information on
> selection probabilities and non-response, but it obviously changes over the
> years for each individual, and cannot be used with -xtreg-. So I wouldn't
> know how to incorporate information on non-response with -xtreg-?
>
> Earlier in this thread Cameron said that bootstrap only makes sense in my
> case if I would use "custom bootstrap weights computed by a statistical
> agency for a complex sampling frame". It seems that bootstrap cannot be used
> with weights, anyway. I guess that weighted sampling is still not
> implemented in bootstrap, as stated 8 years ago
> (http://www.stata.com/statalist/archive/2003-09/msg00180.html).
>
> Thanks very much for your help,
> Tobias
>
> P.S.: I cited the PNAS paper since it is a rare exception in my field
> (happiness economics) that an empirical paper says something about
> regression diagnostics at all.
>
>
> -----Original Message-----
> > Date: Thu, 08 Sep 2011 17:20:35 -0400
> > Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> > From: Bryan Sayer <[email protected]>
> > To: [email protected]
>
> ... The
> sampling weights control mostly for unequal probabilities of
> selection, and for well-designed and well-conducted surveys,
> non-response adjustments are not that large, while probabilities of
> selection might differ quite notably.
>
>
> I disagree with the part about non-response adjustments not being that
> large. It really depends on the survey. Surveys in the U.S. may have
> response rates as low as 25 to 30%, meaning that the non-response
> adjustments may be pretty large.
>
> However, it is really the difference in response rates for different groups
> that matters. For example a survey I am working with shows a noticeable
> difference in response rates between the land-line phone and the cell phone
> only group.
>
> The design effects for surveys can be broken into pieces for clustering,
> stratification, and weighting. And weighting can be further classified into
> the design weights and the non-response adjustments, if one really wanted to
> pursue the matter.
>
> But more related to the point Stas is making, often the elements of the
> survey design and weights that are incorporated into the survey will reflect
> information that is not available to the user. Simply put, it may not be
> possible to fully condition on the true sample design. This is because some
> of the elements used in the sample design and weighting process cannot be
> disclosed in public files for confidentiality reasons.
>
> Working in sampling, I am obviously biased toward using the weights. But
> fundamentally, I believe that it is often impossible for the user to know
> whether they have fully conditioned on the sample design or not.
>
> Most likely, lots of smart people worked hard on the sample design and
> everything that goes into producing the data that you are using. Accept that
> they (hopefully) did their job well. So if you have the sample design
> information available to you, I don't see any reason to *not* use it.
>
> My impression is that bootstrapping of complex survey design data, while
> possibly past its infancy, is probably still not very fully developed. I
> know lots of very smart people who work on it, but it just does not seem to
> generalize very well, at least not as well as a Taylor series linearization.
>
> Just my 2 cents worth.
>
> Bryan Sayer
> Monday to Friday, 8:30 to 5:00
> Phone: (614) 442-7369
> FAX: (614) 442-7329
> [email protected]
>
>
> On 9/8/2011 4:28 PM, Stas Kolenikov wrote:
>
> Tobias,
>
> I would say that you are worried about exactly the wrong things. The
> sampling weights control mostly for unequal probabilities of
> selection, and for well-designed and well-conducted surveys,
> non-response adjustments are not that large, while probabilities of
> selection might differ quite notably. While it is true that if you can
> fully condition on the design variables and non-response propensity,
> you can ignore the weights, I am yet to see an example where that
> would happen. Believing that your model is perfect is... uhm... naive,
> let's put it mildly; if anything, econometrics moves away from making
> such strong assumptions as "my model is absolutely right" towards
> robust methods of inference that would allow for some minor deviations
> from the "absolutely right" scenario. There are no assumptions of
> normality made anywhere in the process of calculating the standard
> errors. All arguments are asymptotic, and you see z- rather than
> t-statistics in the output. In fact, the arguments justifying the
> bootstrap are asymptotic, as well. You can still entertain the
> bootstrap idea, but basically the only way to check that you've done
> it right is to compare the bootstrap standard errors with the
> clustered standard errors. If they are about the same, any of them is
> usable; if they are wildly different (say by more than 50%), I would
> not use either of them, but I would first check to see that the bootstrap
> was done right.
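
The comparison described above can be sketched in a few lines. This is only an illustration under assumptions: it uses simulated data, a plain sample mean instead of a fixed-effects regression, and hypothetical function names (`cluster_robust_se`, `cluster_bootstrap_se`); it is not what Stata computes internally, and Stata's finite-sample corrections differ slightly.

```python
import math
import random
import statistics


def cluster_robust_se(clusters):
    """Linearization (sandwich) SE of the overall mean, clustering by group."""
    n = sum(len(c) for c in clusters)
    mean = sum(sum(c) for c in clusters) / n
    g = len(clusters)
    # Sum the residuals within each cluster, square, and add across clusters.
    meat = sum((sum(c) - len(c) * mean) ** 2 for c in clusters)
    return math.sqrt(g / (g - 1) * meat) / n


def cluster_bootstrap_se(clusters, reps, rng):
    """SE of the overall mean from resampling whole clusters with replacement."""
    means = []
    for _ in range(reps):
        draw = rng.choices(clusters, k=len(clusters))
        means.append(sum(sum(c) for c in draw) / sum(len(c) for c in draw))
    return statistics.stdev(means)


rng = random.Random(1)
# Simulated data: 60 clusters, each with its own shared random effect.
clusters = []
for _ in range(60):
    effect = rng.gauss(0, 1)
    size = rng.randint(5, 30)
    clusters.append([effect + rng.gauss(0, 1) for _ in range(size)])

se_cluster = cluster_robust_se(clusters)
se_boot = cluster_bootstrap_se(clusters, reps=500, rng=rng)
ratio = se_boot / se_cluster
print(f"cluster-robust SE: {se_cluster:.4f}")
print(f"bootstrap SE:      {se_boot:.4f}")
print(f"ratio:             {ratio:.2f}")
```

A ratio far from 1 (for instance, outside the 50% band mentioned above) would be a reason to re-check how the bootstrap was set up before trusting either number.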
>
> I know that PNAS is a huge impact factor journal in natural sciences,
> but a statistics journal? or an econometrics journal? I mean, it's
> cool to have a paper there on your resume, but I doubt many statalist
> subscribers look at this journal for methodological insights (some
> data miners or bioinformaticians or other statisticians on the margin
> of computer science do publish in PNAS, though). I would not turn to
> an essentially applied psychology paper for advice on clustered
> standard errors.
>
> The error that you report probably comes from the bootstrap producing
> a sample with fewer cluster identifiers than regressors in your model.
> Normally, this would be rectified by specifying -idcluster()- option;
> however in some odd cases, the bootstrap samples may still be
> underidentified. I don't know whether the fixed effects regression
> should be prone to such empirical underidentification. It might be,
> given that not all of the parameters of an arbitrary model are
> identified (the slopes of the time-invariant variables aren't).
>
> On Thu, Sep 8, 2011 at 3:30 AM, Tobias Pfaff
> <[email protected]> wrote:
>
> Dear Stas, Cam,
>
> Thanks for your input!
>
> I want to bootstrap as a robustness check since my residuals of the FE
> regression are not normally distributed.
> I chose bootstrapping for this check because it does not assume normality
> of the residuals
> (e.g., Headey et al. 2010, appendix p. 3,
> http://www.pnas.org/content/107/42/17922.full.pdf?with-ds=yes).
>
> If I do bootstrapping with clustered standard errors as Jeff has explained, I
> get the following error message:
>
> - insufficient observations
> an error occurred when bootstrap executed xtreg, posting missing values -
>
> Cam, you say that I would need custom bootstrap weights. My dataset provides
> individual weights with adjustments for non-response etc. I do not use
> weights for the regression because the possible selection bias is mitigated
> due to the fact that the variables which could cause the bias are included
> as control variables (e.g., income, employment status). Thus, I would argue
> that my model is complete and the unweighted analysis leads to unbiased
> estimators.
>
> 1. Would you still include weights for the bootstrapping?
>
> 2. Does bootstrapping need more degrees of freedom than the normal
> estimation of -xtreg- so that I get the above error message?
>
> 3. If bootstrapping is not a good idea in this case, what can I do to
> counter the violation of the normality assumption of the residuals?
> (I already checked transformation of the variables, but that doesn't help)
>
> Regards,
> Tobias
>
>
> -----Original Message-----
>
> Date: Wed, 7 Sep 2011 10:24:33 -0400
> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: Cameron McIntosh <[email protected]>
> To: [email protected]
>
> Stas, Tobias
> I agree with Stas that there is not much point in using the bootstrap in
> this case, unless you have custom bootstrap weights computed by a
> statistical agency for a complex sampling frame, which would incorporate
> adjustments for non-response and calibration to known totals, etc. I don't
> think that is the case here, so I would go with the -cluster- SEs too.
> My two cents,
> Cam
>
>
> Date: Wed, 7 Sep 2011 09:03:27 -0500
> Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
> From: [email protected]
> To: [email protected]
>
> Tobias,
>
> can you please explain why you need the bootstrap at all? The
> bootstrap standard errors are equivalent to the regular -cluster-
> standard errors asymptotically (in this case, with the number of
> clusters going off to infinity), and, if anything, it is easier to get
> the bootstrap wrong than right with difficult problems. If the -cluster-
> option works at all with -xtreg-, I see little reason to use the
> bootstrap. (Very technically speaking, in my simulations, I've seen
> the bootstrap standard errors to be more stable than -robust- standard
> errors with a large number of bootstrap repetitions that have to be
> in an appropriate relation to the sample size; whether that carries
> over to the cluster standard errors, I don't know.)
>
> On Tue, Sep 6, 2011 at 12:25 PM, Tobias Pfaff
> <[email protected]> wrote:
>
> Dear Statalisters,
>
> I do the following fixed effects regression:
>
> xtreg depvar indepvars, fe vce(cluster region) nonest dfadj
>
> Individuals in the panel are identified by the variable "pid". The
> time variable is "svyyear". Data were previously declared as panel
> data with -xtset pid svyyear-.
> Since one of my independent variables is clustered at the regional
> level (not at the individual level), I use the option
> -vce(cluster region)-.
>
> Now, I would like to do the same thing with bootstrapped standard errors.
>
> I tried several commands, however, none of them works so far. For example:
>
> xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1) cluster(region)) nonest dfadj
>
> where I get the error message "option cluster() not allowed".
>
> None of the hints in the manual (e.g., -idcluster()-, -xtset, clear-,
> -i()- in the main command) were helpful so far.
>
> How can I tell the bootstrapping command that the standard errors should be
> clustered at the regional level while using "pid" for panel individuals?
>
> Any comments are appreciated!
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/