Susan Cochran asks why -svy: mean- is not reproducing a hand-calculated value:
> I am trying to get Stata 9 to reproduce the analysis of Table 9.1 in
> Scheaffer et al., 6th edition, p. 307.
>
> There are 90 plants. Ten are sampled by SRS without replacement at the first
> stage, and within each sampled plant, machines (m) are sampled by SRS without
> replacement and measured for the number of hours they were broken. The known
> population size is about 4500 machines. M = number of machines in the sampled
> plant.
>
> Hand calculation gives a mean of 4.8 hours.
>
> I created a raw data file with the following structure:
>
> (removed for brevity)
>
> When I specified the following setup:
>
> svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(M)
>
> The total was calculated correctly, as was its SE, but the mean is incorrect:
> it shows 4.6, the simple mean of the dataset, instead of the correct mean of
> 4.8. Is this because the population size is taken to be 4698 (the sum of the
> weights) rather than 4500, so that total hours / population size comes out as
> 4.6?
>
> What should the correct design setup in Stata be?

The crux of the issue is that Susan wants to get back

    21601.49 / 4500 = 4.8

but -svy: mean- is computing

    21601.49 / 4697.97 = 4.598

Here 4697.97 is the sum of the sampling weights, the same weights used to
produce 21601.49, the estimated population total. The sum of the sampling
weights is itself an estimate of the population size.

These are two different estimators of the population mean: the one Susan wants
assumes the population size is known, whereas -svy: mean- estimates it from the
weights. -svy: mean- does not implement the known-size estimator.
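
The two competing divisions can be checked directly. The sketch below is plain
Python used only as a calculator (it does not call Stata); the three figures
are the ones quoted above, and the variable names are my own.

```python
# Figures quoted in this thread
Y_hat = 21601.49   # estimated population total, as from -svy: total-
N_hat = 4697.97    # sum of the sampling weights (estimated population size)
N = 4500           # known population size ("about 4500" machines)

svy_mean = Y_hat / N_hat      # the ratio estimate that -svy: mean- reports
known_n_mean = Y_hat / N      # the known-size estimate from the hand calculation

print(f"Yhat / Nhat = {svy_mean:.3f}")      # -> 4.598
print(f"Yhat / N    = {known_n_mean:.3f}")  # -> 4.800
```

Both runs use the same numerator; only the denominator differs.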
*** Some background information:
The population mean estimator is a special case of the population ratio
estimator. By definition, the population mean Ybar is

    Ybar = Y / N

where Y is the population total and N is the population size. -svy: mean-
estimates Ybar using

    Ybarhat = Yhat / Nhat

where Yhat estimates the population total and Nhat estimates the population
size.

If you know the value of N, you can simply compute Yhat using -svy: total- and
divide it by N. The point here is that -svy: mean- does not compute

    Yhat / N

because it always divides by the estimate Nhat instead.
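
To make the distinction concrete, here is a minimal sketch of the two point
estimators, written in Python purely for illustration. The function names and
the toy values of y and w are hypothetical; the sketch reproduces only the
point estimates, not the linearized standard errors that -svy- reports.

```python
from typing import Sequence


def weighted_total(y: Sequence[float], w: Sequence[float]) -> float:
    """Yhat: sum of w_i * y_i, the estimated population total."""
    return sum(wi * yi for wi, yi in zip(w, y))


def estimated_size(w: Sequence[float]) -> float:
    """Nhat: sum of the sampling weights, the estimated population size."""
    return sum(w)


def ratio_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Ybarhat = Yhat / Nhat, the point estimate -svy: mean- reports."""
    return weighted_total(y, w) / estimated_size(w)


def known_size_mean(y: Sequence[float], w: Sequence[float], n_pop: float) -> float:
    """Yhat / N, which treats the population size N as known."""
    return weighted_total(y, w) / n_pop


# Hypothetical toy data: three observations and their sampling weights.
y = [2.0, 4.0, 6.0]
w = [10.0, 20.0, 30.0]

print(ratio_mean(y, w))           # Yhat / Nhat = 280 / 60
print(known_size_mean(y, w, 50))  # Yhat / N    = 280 / 50
```

When the weights happen to sum to exactly N, the two functions agree; Susan's
weights sum to 4697.97 rather than 4500, which is why her results differ.
Because N is a fixed constant, the standard error of Yhat / N is simply the
standard error from -svy: total- divided by N.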
--Jeff
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/