Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Bootstrap sampling for evaluating hypothesis tests
From: Margaret MacDougall <[email protected]>
To: [email protected]
Subject: Re: st: Bootstrap sampling for evaluating hypothesis tests
Date: Mon, 10 Jun 2013 17:13:42 +0100
Hello
I refer to the recommendation from Maarten below, which was sent in
response to a query I raised about testing the robustness of a new
hypothesis test to Type I errors.
As I indicated in my previous email, I have a very large sample to work
from. However, the idea is, more precisely, to take bootstrap samples of
varying sizes (e.g. 50, 100, ...) from the available data to see how
robust the Type I error rate is to variations in sample size. I
appreciate that when using bootstrapping techniques for estimating
standard errors and confidence intervals for an effect size, it is
recommended that these samples should be the same size as the original.
However, as you will appreciate, I have a vested interest in choosing
much smaller sample sizes. Also, as I am evaluating a hypothesis test, I
shall need to adapt the original sample so that it satisfies the null
hypothesis.
Below I present two possible approaches to obtaining, for a given sample
size, the proportion of bootstrap samples in which the test statistic is
more extreme than the one observed in the data from which the bootstrap
replications were taken. I would be most grateful for advice on which is
more statistically sound.
Method 1
Generate bootstrap samples of size 50, 100, ... from the original, much
larger sample once it has been adjusted to meet the null hypothesis.
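A minimal Stata sketch of Method 1, adapted from the second of Maarten's examples quoted below. Everything specific here is an assumption for illustration: the auto data stand in for the real data (which is why the sample sizes are 30 and 50 rather than 50, 100, ...), the rep78 dummies are dropped to keep the model small, the regression F test of weight and lnprice stands in for the new test, and the program name method1, the 500 replications and the 5% level are hypothetical choices. The null is imposed once on the full data, and -bsample n- then draws n observations with replacement from the adjusted data.

```stata
* Method 1 sketch (hypothetical names): impose the null once on the
* full data, then draw bootstrap samples of size n from it
clear all
set seed 2013
sysuse auto
gen lnprice = ln(price)
* null-model fit plus full-model residuals: in these data the
* coefficients of weight and lnprice are zero, so the null holds
quietly reg turn mpg foreign
predict double mu1
quietly reg turn mpg foreign weight lnprice
predict double mu2
gen double ysim = turn - mu2 + mu1
keep ysim mpg foreign weight lnprice
tempfile nulldata
save "`nulldata'"

* one replicate: bootstrap sample of n observations, then the test
program define method1, rclass
    args file n
    use "`file'", clear
    bsample `n'
    quietly reg ysim mpg foreign weight lnprice
    quietly test weight lnprice
    local p = r(p)
    return scalar p = `p'
    * 1 if the (true) null is rejected at the 5% level
    return scalar reject = (`p' < .05)
end

foreach n in 30 50 {
    simulate p=r(p) reject=r(reject), reps(500): method1 "`nulldata'" `n'
    quietly summarize reject
    display "n = `n': estimated Type I error rate = " %5.3f r(mean)
}
```

If the test holds its nominal size, each displayed rate should be close to .05 up to Monte Carlo error; the results left in memory can also be passed to -simpplot p- as in Maarten's examples.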
Method 2
Construct random samples of size 50, 100, ... from the raw data. Next,
adapt these samples to meet the null hypothesis. Now take bootstrap
samples of size 50, 100, ..., respectively, from these adapted samples.
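A minimal Stata sketch of Method 2, under hypothetical assumptions: the auto data stand in for the raw data (so n is 30 and 50), the regression F test of weight and lnprice stands in for the new test, the null is imposed by the same fit-plus-residuals device as in Maarten's second example quoted below, and the program name method2, the 500 replications and the 5% level are illustrative. For each n, -sample n, count- draws n observations without replacement, the null is imposed within that subsample only, and -bsample- without an argument then draws n observations with replacement from the adapted subsample. Only one subsample is drawn per sample size here, which is one reading of Method 2.

```stata
* Method 2 sketch (hypothetical names): subsample of size n, impose the
* null within it, then bootstrap samples of size n from that subsample
clear all
set seed 2013
sysuse auto
gen lnprice = ln(price)
keep turn mpg foreign weight lnprice
tempfile raw sub
save "`raw'"

* one replicate: bootstrap sample from the adapted subsample
program define method2, rclass
    args file
    use "`file'", clear
    * without an argument, bsample draws _N (here n) observations
    bsample
    quietly reg ysim mpg foreign weight lnprice
    quietly test weight lnprice
    local p = r(p)
    return scalar p = `p'
    * 1 if the (true) null is rejected at the 5% level
    return scalar reject = (`p' < .05)
end

foreach n in 30 50 {
    * step 1: random subsample of n observations, without replacement
    use "`raw'", clear
    sample `n', count
    * step 2: impose the null within this subsample
    quietly reg turn mpg foreign
    predict double mu1
    quietly reg turn mpg foreign weight lnprice
    predict double mu2
    gen double ysim = turn - mu2 + mu1
    save "`sub'", replace
    * step 3: bootstrap samples of size n from the adapted subsample
    simulate p=r(p) reject=r(reject), reps(500): method2 "`sub'"
    quietly summarize reject
    display "n = `n': estimated Type I error rate = " %5.3f r(mean)
}
```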
Many thanks
Best wishes
Margaret
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr Margaret MacDougall
Medical Statistician and Researcher in Education
Centre for Population Health Sciences
University of Edinburgh Medical School
Teviot Place
Edinburgh EH8 9AG
Tel: +44 (0) 131 650 3211
Fax: +44 (0) 131 650 6909
E-mail: [email protected]
http://www.chs.med.ed.ac.uk/cphs/people/staffProfile.php?profile=mmacdoug
On 13/03/2013 15:45, Maarten Buis wrote:
On Wed, Mar 13, 2013 at 4:04 PM, Margaret MacDougall wrote:
I would value receiving recommendations on literature explaining the
application of bootstrap sampling to assess the robustness of a proposed
new hypothesis test to Type I errors. Better still if the recommended
references contain corresponding computer syntax!
Rich Williams and I are currently working on such a project. In
general I would not say that a test is "robust" against Type I errors,
but rather that its Type I error rate corresponds to your prespecified
level of significance. Type I errors will occur, but the chance of one
occurring should be the same as the level of significance you have
chosen. This means that if we change the data such that the null
hypothesis is true and bootstrap from that changed dataset, the
p-values should follow a uniform distribution. This change to the data
is unavoidable when assessing the Type I error rate: in order to assess
the probability of rejecting a true null hypothesis, you first need to
make sure that the null hypothesis is true.
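Stated formally, for a test that rejects H0 when the p-value is at most alpha, uniform p-values under H0 give a rejection probability equal to alpha at every level alpha, which is why a uniformity check covers all significance levels at once. The second line is the standard binomial Monte Carlo standard error of the estimated rejection rate from R simulated data sets, evaluated for reps(1000) as used in the examples below.

```latex
\begin{aligned}
p \mid H_0 \sim \mathrm{U}(0,1) \;&\Longrightarrow\; \Pr(p \le \alpha \mid H_0) = \alpha
  \quad \text{for every } \alpha \in (0,1),\\
\mathrm{SE}(\hat{\alpha}) &= \sqrt{\alpha(1-\alpha)/R} \approx 0.0069
  \quad \text{for } \alpha = 0.05,\ R = 1000,
\end{aligned}
```

where the estimated rate is the proportion of the R simulated p-values at or below alpha. So with 1000 replications, an observed rejection rate between roughly 0.036 and 0.064 (about two standard errors either side of 0.05) is consistent with a correctly sized test.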
Here are two examples of how to do this in Stata:
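*------------------ begin example ------------------
clear all
sysuse auto
* merge the small categories 1 and 2 of rep78 into category 3
recode rep78 1/2=3
* null model: rep78 has no effect on foreign
logit foreign price
predict double pr
gen byte ysim = .
keep foreign price rep78 pr ysim
keep if !missing(foreign,price,rep78)
program define sim
    * draw a new outcome from the null model
    replace ysim = runiform() < pr
    * fit the full model and test the (true) null hypothesis
    logit ysim price ib3.rep78
    test 4.rep78 = 5.rep78 = 0
end
simulate chi2=r(chi2) p=r(p), reps(1000) : sim
* p-values should be uniform (simpplot is user-written: ssc install simpplot)
simpplot p
* the test statistic should follow a chi-squared distribution with 2 df
qchi chi2, df(2) name(q)
*------------------- end example -------------------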
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )
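*------------------ begin example ------------------
clear all
sysuse auto
gen lnprice = ln(price)
* null model: weight and lnprice have no effect on turn
reg turn mpg i.rep78 foreign
predict double mu1
* full model
reg turn mpg i.rep78 foreign weight lnprice
predict double mu2
* null-model fit plus full-model residuals: in these data the
* coefficients of weight and lnprice are now zero
gen double ysim = turn - mu2 + mu1
keep ysim mpg rep78 foreign weight lnprice
keep if !missing(ysim, lnprice, mpg, rep78, foreign, weight)
tempfile temp
save "`temp'"
program define qenv_sim_F
    * the first argument is the file holding the adjusted data
    use "`1'", clear
    * bootstrap sample of the same size as the adjusted data
    bsample
    reg ysim mpg i.rep78 foreign weight lnprice
    test weight lnprice
end
simulate F=r(F) p=r(p), reps(1000): qenv_sim_F "`temp'"
* p-values should be uniform (simpplot is user-written: ssc install simpplot)
simpplot p
*------------------- end example -------------------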
Hope this helps,
Maarten
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany
http://www.maartenbuis.nl
---------------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.