Re: st: svy:tab why standard error and confidence intervals for count are different than for
From: [email protected] (Jeff Pitblado, StataCorp LP)
To: [email protected]
Subject: Re: st: svy:tab why standard error and confidence intervals for count are different than for
Date: Thu, 04 Feb 2010 14:01:02 -0600
Pramod Adhikari <[email protected]> asks why the standard errors of
weighted counts and weighted percentages do not follow the same relationship
as the corresponding point estimates:
> I am using svy:tab to generate estimate for a variable; one in terms of
> percentages and another in terms of weighted numbers. The results in the
> first table show that 95.3% said yes to H1, with a standard error of
> 0.51896%. Given the size of the population (weighted population size of
> 9577.258); I should be able to estimated the weighted population with
> the estimated percentage. In terms of weighted counts, 95.3196% of
> 9577.258=9129.0 said yes to H1. This weighted number is available in the
> second table.
> Since the standard error of the estimated prevalence is 0.51896%, I
> would have thought that I can convert this percent to count. In terms of
> weighted count it should be 0.51896%*9577.258=49.70. However, the
> results in the second table show that the standard error of the weighted
> count is whooping 390.4 compared to 49.7.
> Are the variance estimation methods different for counts and
> percentages? I would appreciate any pointer to the literature or any
> explanation to this anomaly.
> Thanks in advance.
>
> (Stata output omitted)
The estimated percentages are really -mean- estimators and Pramod's "weighted
numbers" are really -total- estimators. -svy: tabulate- uses -svy: mean- and
-svy: total- to perform most of its point and variance estimation. So this
boils down to the following discussion.
For simple random sampling (SRS), we have the following relationship between
the mean and total estimators:

    mean = total/N

where it is understood that 'N' is the sample size. Furthermore, this
relationship is supported in their standard errors:

    SE(mean) = SE(total)/N

This happens because 'N' is fixed before sampling occurs, so

    Var(mean) = Var(total/N) = Var(total)/N^2
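
The SRS relationship can be checked numerically. The following is only a
minimal Python sketch, not Stata output: the data are made up, every
observation has weight 1, and the variance estimates assume sampling with
replacement and no finite population correction.

```python
import math
import statistics

# Hypothetical 0/1 responses from a simple random sample (made-up data).
y = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
n = len(y)                    # sample size, fixed before sampling

s2 = statistics.variance(y)   # sample variance (n - 1 denominator)

total = sum(y)
mean = total / n

# Variance estimates of the two estimators under the assumptions above.
se_total = math.sqrt(n * s2)  # Var(total) = n * s2
se_mean = math.sqrt(s2 / n)   # Var(mean)  = Var(total) / n^2 = s2 / n

print("mean          =", mean)
print("SE(mean)      =", se_mean)
print("SE(total) / n =", se_total / n)
```

Because n is a constant here, dividing SE(total) by n reproduces SE(mean) up
to floating-point rounding.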
So why doesn't this relationship hold for complex survey data?
The answer to the question is in -[SVY] variance estimation-.
The mean estimator is

    mean = total/W

where 'W' is the sum of the sampling weights, which is rarely a known or fixed
quantity (prior to sampling). Thus 'W' itself is a total estimator and 'mean'
is the ratio of two total estimators. The variance of 'mean' is then

    Var(mean) = { Var(total) - 2*mean*Cov(total,W) + mean^2*Var(W) } / W^2

which is at the bottom of page 160 in [SVY] Stata Survey Data Reference Manual
Release 11.
Note that when the sampling weights are all constant, then

    Var(W) = 0
    Cov(total,W) = 0

and we are back to the SRS relationship between 'mean' and 'total'.
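
To make the ratio formula concrete, here is a small Python sketch that
evaluates it for made-up data, once with unequal weights and once with
constant weights. It is an illustration under simplifying assumptions, not
the code Stata runs: a single stratum, each observation treated as its own
PSU, with-replacement variance estimates, and no finite population
correction. The names wr_cov and mean_and_total_se are hypothetical helpers
introduced only for this example.

```python
import math

def wr_cov(a, b):
    """With-replacement covariance estimate of the totals sum(a) and sum(b),
    treating each observation as its own PSU in a single stratum."""
    n = len(a)
    abar = sum(a) / n
    bbar = sum(b) / n
    return n / (n - 1) * sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))

def mean_and_total_se(y, w):
    """Evaluate the linearized variance of mean = total/W and compare its
    standard error with the naive conversion SE(total)/W."""
    wy = [wi * yi for wi, yi in zip(w, y)]
    total = sum(wy)            # weighted count
    W = sum(w)                 # sum of weights, itself a total estimator
    mean = total / W

    var_total = wr_cov(wy, wy)
    var_W = wr_cov(w, w)
    cov_total_W = wr_cov(wy, w)

    # Var(mean) = { Var(total) - 2*mean*Cov(total,W) + mean^2*Var(W) } / W^2
    var_mean = (var_total - 2 * mean * cov_total_W + mean ** 2 * var_W) / W ** 2

    return {
        "se_mean": math.sqrt(var_mean),
        "se_total_over_W": math.sqrt(var_total) / W,
        "var_W": var_W,
        "cov_total_W": cov_total_W,
    }

# Hypothetical 0/1 responses (made-up data).
y = [1, 1, 0, 1, 1, 1, 0, 1]

# Unequal weights: W varies from sample to sample, so the two SEs differ.
unequal = mean_and_total_se(y, [120.0, 80.0, 200.0, 95.0, 150.0, 60.0, 110.0, 185.0])

# Constant weights: Var(W) and Cov(total,W) vanish, and the SEs coincide.
constant = mean_and_total_se(y, [100.0] * len(y))

for label, res in (("unequal weights", unequal), ("constant weights", constant)):
    print(label)
    for name, value in res.items():
        print("   ", name, "=", value)
```

With the unequal weights the two standard errors need not agree, which is
the situation in the question; with constant weights the extra terms drop
out and SE(mean) = SE(total)/W holds again.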
--Jeff
[email protected]