To answer my own question with a simulation: whether we use the t or the
Gaussian distribution matters only in very small samples (around 20
observations), and the coverage rate isn't that good anyhow for extreme
proportions such as the .95 used below.
*----------------- begin example ------------------
set more off
capture program drop sim
program define sim, rclass
    * one replication: draw n() observations from a Bernoulli(p())
    * population and check whether each 95% CI covers the true p()
    syntax, n(integer) p(real)
    drop _all
    set obs `n'
    local df = `n'-1
    gen x = runiform() < `p'
    sum x, meanonly
    tempname m se
    scalar `m' = r(mean)
    * standard error with N-1 in the denominator, as in -proportion-
    scalar `se' = sqrt(`m'*(1-`m')/`df')
    * 1 if the Gaussian-based interval covers the true proportion
    return scalar true_z = ///
        `m' - invnormal(0.975)*`se' < `p' & ///
        `m' + invnormal(0.975)*`se' > `p'
    * 1 if the t-based interval (n-1 df) covers the true proportion
    return scalar true_t = ///
        `m' - invttail(`df',0.025)*`se' < `p' & ///
        `m' + invttail(`df',0.025)*`se' > `p'
end
* the means of true_z and true_t are the coverage rates
simulate true_z=r(true_z) true_t=r(true_t), ///
    reps(10000) nodots: sim, n(100) p(.95)
sum true*
simulate true_z=r(true_z) true_t=r(true_t), ///
    reps(10000) nodots: sim, n(50) p(.95)
sum true*
simulate true_z=r(true_z) true_t=r(true_t), ///
    reps(10000) nodots: sim, n(20) p(.95)
sum true*
*--------------------- end example ---------------------------
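One way to see why the choice only matters in small samples: the t
critical value that -invttail()- returns for n-1 degrees of freedom
shrinks towards the Gaussian value from -invnormal()- as n grows, so the
t-based interval is only noticeably wider for small n (at n = 20 the
multiplier is roughly 2.09 rather than 1.96). A small sketch that prints
both multipliers for the sample sizes used in the simulation above:

```stata
*----------------- begin example ------------------
* 97.5th percentiles of the Gaussian and of Student's t with n-1 df,
* i.e. the multipliers for a two-sided 95% confidence interval
foreach n of numlist 20 50 100 {
    local df = `n' - 1
    di "n = `n'" _col(12) "z = " %6.4f invnormal(0.975) ///
        _col(26) "t = " %6.4f invttail(`df', 0.025) ///
        _col(40) "t/z = " %6.4f invttail(`df', 0.025)/invnormal(0.975)
}
*--------------------- end example ---------------------------
```

The ratio t/z is the factor by which the t-based interval is wider than
the Gaussian one; it is largest at n = 20, which is where the simulation
shows the two coverage rates differing.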
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
-- Maarten
--- Maarten buis <[email protected]> wrote:
> Martin may have a point, though I am not sure: I have always taught
> that the reason we compare the test-statistic to the t-distribution and
> not the Gaussian distribution is that we have additional uncertainty
> due to the fact that we estimate not only the mean but also the
> standard deviation (to get to the standard error). In the case of a
> proportion we know that the standard deviation is a deterministic
> function of the mean, so why should we compare the test-statistic to
> the t-distribution instead of the Gaussian distribution?
>
> -- Maarten
>
> --- Martin Weiss <[email protected]> wrote:
>
> > Jeff,
> >
> > thanks for the reply, but am I still missing something here? I did
> > experiment with "r(N)-1", but discarded the possibility as it did not
> > provide the correct lower and upper bounds... Indeed,
> >
> > ************************
> > sysuse auto, clear
> > proportion rep78
> > matrix define A=e(b)
> > count if rep78!=.
> > *Std error
> > local stderr= sqrt(A[1,1]*(1-A[1,1])/`=`r(N)'-1')
> > *Upper/Lower Bound for proportion of "1"
> > di A[1,1]+invnormal(1-0.05/2)*`stderr'
> > di A[1,1]-invnormal(1-0.05/2)*`stderr'
> > ************************
> >
> > still gives the wrong numbers. Have you told us the whole story?
> >
> > Martin Weiss
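Putting Jeff's formula together with Maarten's point about the
t-distribution, here is a sketch that reproduces the numbers for the
proportion of "1". It assumes, as the discussion implies, that
-proportion- uses the standard error sqrt(p(1-p)/(N-1)) together with a
t distribution with N-1 degrees of freedom rather than the Gaussian
(-invttail()- instead of -invnormal()-). Newer Stata releases may compute
the interval differently (for example on the logit scale), so the bounds
are only expected to match in releases that report this t-based interval.

```stata
*----------------- begin example ------------------
sysuse auto, clear
proportion rep78
matrix define A = e(b)
matrix define B = e(V)
quietly count if rep78 != .
local N = r(N)
* proportion of "1" and its standard error with N-1 in the denominator
local p = A[1,1]
local se = sqrt(`p'*(1-`p')/(`N'-1))
* this should agree with the standard error taken from e(V)
di "SE by hand:   " `se'
di "SE from e(V): " sqrt(B[1,1])
* t with N-1 degrees of freedom (invttail) instead of invnormal
di "lower bound:  " `p' - invttail(`N'-1, 0.025)*`se'
di "upper bound:  " `p' + invttail(`N'-1, 0.025)*`se'
*--------------------- end example ---------------------------
```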
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Jeff Pitblado, StataCorp LP
> > Sent: Tuesday, March 11, 2008 7:08 PM
> > To: [email protected]
> > Subject: Re: st: Confidence Interval for Proportion
> >
> > Martin Weiss <[email protected]> is using the -proportion- command
> > and has a question about how standard errors are computed:
> >
> > > Dear Statalisters,
> > >
> > > try this in Stata:
> > >
> > > ************************
> > > sysuse auto, clear
> > > proportion rep78
> > > matrix define A=e(b)
> > > matrix define B=e(V)
> > > count if rep78!=.
> > > *Upper/Lower Bound for proportion of "1"
> > > di A[1,1]+invnormal(1-0.05/2)*sqrt(A[1,1]*(1-A[1,1])/`r(N)')
> > > di A[1,1]-invnormal(1-0.05/2)*sqrt(A[1,1]*(1-A[1,1])/`r(N)')
> > > *Standard Error for "1"
> > > *Mistake obviously there...
> > > di sqrt(A[1,1]*(1-A[1,1])/`r(N)')
> > > ************************
> > >
> > > Then let me know: why do I not hit the correct CI for the
> > > proportion of "1" in the repair record? Something's wrong with the
> > > standard error, I do not know what, though...
> >
> > Using Martin's example Stata code, -proportion- effectively computes
> > the standard error via
> >
> > sqrt(A[1,1]*(1-A[1,1])/(r(N)-1))
> >
> > This is explained (rather tersely, I'll admit) in the 'Methods and
> > Formulas' section of -[R] proportion-.
> >
> > "Proportions are means of indicator variables; see -[R] mean-."
> >
> > From the 'Methods and Formulas' section of -[R] mean-, the variance
> > is calculated as
> >
> > V(ybar) = (1/(N*(N-1))) Sum_{j=1}^N (y_j - ybar)^2
> >
> > If the y_j are observations of an indicator variable, this is
> > algebraically equivalent to
> >
> > V(ybar) = ybar(1-ybar)/(N-1)
> >
> > --Jeff
> > [email protected]
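The algebraic equivalence Jeff mentions follows from the fact that an
indicator variable satisfies y_j^2 = y_j, so Sum_{j=1}^N y_j^2 = N*ybar.
Expanding the sum of squares in the -[R] mean- formula:

    Sum_{j=1}^N (y_j - ybar)^2 = Sum_{j=1}^N y_j^2 - N*ybar^2
                               = N*ybar - N*ybar^2
                               = N*ybar(1-ybar)

and therefore

    V(ybar) = (1/(N*(N-1))) * N*ybar(1-ybar) = ybar(1-ybar)/(N-1)

which is the N-1 denominator in the standard-error formula
sqrt(A[1,1]*(1-A[1,1])/(r(N)-1)).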
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------