Paul Seed wrote:
As I don't have access to a decent statistics library here, I tried to obtain
the recommended paper (Brown, Cai, and DasGupta, "Interval Estimation for a
Binomial Proportion", Statistical Science, 2001, 16, pp. 101-133) over the
internet, but it is currently behind a "rolling firewall" until 2005.
Would anyone who has seen it hazard a comment on which of the
new methods (Wilson, Jeffreys or Agresti) they would prefer for
small samples?
----------------------------------------------------------------------------
In the absence of access to the article, you can run -simulate- on a
program such as the -exbinci- ditty in the do-file below and choose a method
suited to your circumstances from the results. I wrote the do-file below in
an attempt to illustrate Bobby Gutierrez's point to the list. To run it,
you'll need to install Joseph Hilbe's -rnd- suite from SSC.
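For reference, here is a summary of the four intervals that the do-file compares (the exact, wilson, jeffreys and agresti options of -ci-). I wrote it myself in the usual textbook notation. It is not quoted from the paper or from the Stata manual, so please check the details against [R] ci. The notation is as follows: k is the number of successes in n trials, \hat p = k/n, z = z_{1-\alpha/2} (1.96 for a 95% interval), and B(q; a, b) is the q quantile of a Beta(a, b) distribution.

```latex
\begin{aligned}
\text{Clopper--Pearson (exact):}\quad & \bigl[\,B(\alpha/2;\,k,\,n-k+1),\; B(1-\alpha/2;\,k+1,\,n-k)\,\bigr],
  \quad \text{lower} = 0 \text{ if } k = 0,\ \text{upper} = 1 \text{ if } k = n\\[4pt]
\text{Wilson:}\quad & \frac{\hat p + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}{1 + z^2/n}\\[4pt]
\text{Jeffreys:}\quad & \bigl[\,B(\alpha/2;\,k+\tfrac12,\,n-k+\tfrac12),\; B(1-\alpha/2;\,k+\tfrac12,\,n-k+\tfrac12)\,\bigr]\\[4pt]
\text{Agresti--Coull:}\quad & \tilde p \pm z\sqrt{\frac{\tilde p(1-\tilde p)}{\tilde n}},
  \qquad \tilde n = n + z^2,\quad \tilde p = \frac{k + z^2/2}{\tilde n}
\end{aligned}
```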
In the do-file below, with 10 trials and a population proportion of 50%
(both are options of the program that you can change to suit your
circumstances), the true parameter lies within the 95% confidence interval
in 9797 of 10000 simulated experiments, for each of the four methods. A 95%
confidence interval is supposed to contain the population parameter 95% of
the time over the long run, that is, in about 9500 of the 10000 experiments,
so at this sample size all four methods are conservative.
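You can also see why all four methods agree at n = 10 by computing the coverage exactly instead of simulating it. The coverage is the sum of the binomial probabilities of those success counts k whose interval contains p. The arithmetic below rests on one assumption: each of the four intervals contains 0.5 for 2 <= k <= 8 and excludes it for k = 0, 1, 9 and 10. The identical simulated counts are consistent with that assumption.

```latex
C_n(p) = \sum_{k=0}^{n} \mathbf{1}\{L(k) \le p \le U(k)\}\binom{n}{k}p^k(1-p)^{n-k}

C_{10}(0.5) = 1 - \frac{2\left[\binom{10}{0} + \binom{10}{1}\right]}{2^{10}}
            = 1 - \frac{22}{1024} = \frac{1002}{1024} \approx 0.9785
```

That works out to about 9785 per 10000. The simulated 9797 lies within one Monte Carlo standard error of it, since \sqrt{0.9785 \times 0.0215 / 10000} \approx 0.0015. With only 11 possible values of k, coverage at n = 10 moves in discrete jumps, and all four intervals happen to land on the same step.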
With more trials (100) per experiment, the 95% confidence intervals from the
Jeffreys, Wilson and Agresti methods perform reasonably well: each contains
the parameter in 9452 of the 10000 experiments. At 9652 of 10000, the
Clopper-Pearson (exact) method is still a little conservative in its
probability of coverage.
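A similar check works for the Wilson interval at n = 100. The Wilson interval is the set of values of p that the score test does not reject, so it contains p exactly when |k - np| <= z sqrt(np(1 - p)). The normal approximation in the second line is my own addition. The last line gives the Monte Carlo standard error, which shows how precisely 10000 replications can estimate a coverage near 0.95.

```latex
p \in \text{Wilson}(k) \iff |k - np| \le z\sqrt{np(1-p)};
\quad n = 100,\ p = 0.5:\ |k - 50| \le 1.96 \times 5 = 9.8 \iff 41 \le k \le 59

C_{100}(0.5) = \sum_{k=41}^{59}\binom{100}{k}2^{-100}
  \approx 2\,\Phi\!\left(\frac{9.5}{5}\right) - 1 = 2\,\Phi(1.9) - 1 \approx 0.943

\mathrm{SE}_{\text{MC}} = \sqrt{\frac{0.95 \times 0.05}{10000}} \approx 0.0022
```

So a simulated 9452 is consistent with an exact Wilson coverage a little below 0.95. Differences of a few dozen per 10000 between runs are within simulation noise.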
Joseph Coveney
----------------------------------------------------------------------------
clear
set more off
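* seed the random-number generators from a fixed date, for reproducibility
local seed = date("2004-09-08", "ymd")
set seed `seed'
set seed0 `seed'
* clear the local macro; -macro drop seed- would look for a global named seed
local seed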
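capture program drop exbinci
program define exbinci, rclass
version 8.2
syntax , N(integer) Pi(real)
// -rndbin- (Hilbe's -rnd- suite) creates variable xb:
// n Bernoulli draws (1 trial each) with success probability pi
rndbin `n' `pi' 1
foreach method in exact wilson jeffreys agresti {
ci xb, binomial `method'
// Stata treats a missing r(lb) or r(ub) as larger than any number,
// so you may want to trap for a missing limit here
return scalar `method'_covered = (`pi' >= r(lb)) & (`pi' <= r(ub))
}
end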
* population (true) parameter = 0.5; 10 Bernoulli trials per experiment
simulate "exbinci, n(10) pi(0.5)" ///
exact_covered = r(exact_covered) ///
wilson_covered = r(wilson_covered) ///
jeffreys_covered = r(jeffreys_covered) ///
agresti_covered = r(agresti_covered), reps(10000)
summarize
drop _all
* population (true) parameter = 0.5; 100 Bernoulli trials per experiment
simulate "exbinci, n(100) pi(0.5)" ///
exact_covered = r(exact_covered) ///
wilson_covered = r(wilson_covered) ///
jeffreys_covered = r(jeffreys_covered) ///
agresti_covered = r(agresti_covered), reps(10000)
summarize
exit