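Suppose all N clusters in the population contain M individuals. Our design is
a random sample (without replacement) of n clusters, with a random subsample of
m individuals from each selected cluster; the population mean is estimated by
the average of the n cluster sample means. Let ss1 be the sample variance of the
n cluster sample means, and ss2 the average of the n within-cluster sample
variances. Further, let SS1 and SS2 be the respective population values (the
variance among the N true cluster means, and the average of the N within-cluster
variances). The sampling fractions for the two stages are f1 = n/N and
f2 = m/M, so the finite population corrections (FPCs) are 1-f1 and 1-f2.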
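To make the definitions concrete, here is a minimal Python sketch that computes ss1, ss2, f1 and f2 from a two-stage sample. The function name `sample_stats` and the data layout (one list of m observed values per selected cluster) are illustrative assumptions. They are not part of Stata. Both sample variances use the usual n-1 (or m-1) divisor.

```python
from statistics import mean, variance


def sample_stats(sample, N, M):
    """Return (ss1, ss2, f1, f2) for a two-stage sample (illustrative sketch).

    sample: list of n lists, each holding the m subsampled values from one
            selected cluster (hypothetical data layout).
    N, M:   clusters in the population, individuals per cluster.
    """
    n = len(sample)
    m = len(sample[0])
    if any(len(c) != m for c in sample):
        raise ValueError("every selected cluster must contribute m individuals")
    cluster_means = [mean(c) for c in sample]
    ss1 = variance(cluster_means)            # sample variance of the n cluster means
    ss2 = mean(variance(c) for c in sample)  # average within-cluster sample variance
    return ss1, ss2, n / N, m / M


# Example: 2 of 10 clusters sampled, 3 of 6 individuals taken from each.
ss1, ss2, f1, f2 = sample_stats([[1, 2, 3], [4, 5, 6]], N=10, M=6)
```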
We compare three estimators of the variance of the estimated mean:
Stata without FPC (recommended):
v0 = ss1/n
Stata with FPC (not recommended):
v1 = (1-f1)*ss1/n
Unbiased two-stage cluster estimator (not yet implemented in Stata):
v2 = (1-f1)*ss1/n + f1*(1-f2)*ss2/(m*n)
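The three formulas translate directly into code. The following is a minimal Python sketch. The function name and argument order are illustrative assumptions, not a Stata command.

```python
def variance_estimators(ss1, ss2, n, m, f1, f2):
    """Return (v0, v1, v2) for the variance of the estimated mean (sketch)."""
    v0 = ss1 / n                                             # no FPC
    v1 = (1 - f1) * ss1 / n                                  # first-stage FPC only
    v2 = (1 - f1) * ss1 / n + f1 * (1 - f2) * ss2 / (m * n)  # unbiased two-stage
    return v0, v1, v2


# Example: ss1 = 4.5, ss2 = 1.0 from n = 2 clusters with m = 3 individuals each,
# with sampling fractions f1 = 0.2 and f2 = 0.5.
v0, v1, v2 = variance_estimators(4.5, 1.0, n=2, m=3, f1=0.2, f2=0.5)
```

Note that v2 differs from v1 only by the nonnegative term f1*(1-f2)*ss2/(m*n). As a result, v2 >= v1 in every sample, and the two coincide when f1 = 0 or f2 = 1.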
The expected values of the estimators are:
E(v0) = SS1/n + (1-f2)*SS2/(m*n)
E(v1) = (1-f1)*SS1/n + (1-f1)*(1-f2)*SS2/(m*n)
E(v2) = (1-f1)*SS1/n + (1-f2)*SS2/(m*n)
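These expected values can be checked exactly by brute force on a tiny population. The sketch below rests on illustrative assumptions:

- a made-up population with N=4 and M=3;
- a design with n=2 and m=2;
- random sampling without replacement at both stages, so all 54 possible samples are equally likely;
- SS1 and SS2 computed with divisors N-1 and M-1.

It averages each estimator over every possible sample in exact rational arithmetic. It then compares those averages with the formulas above and with the true variance of the estimated mean.

```python
from fractions import Fraction
from itertools import combinations, product


def svar(xs):
    """Sample variance with divisor len(xs) - 1 (exact when xs are Fractions)."""
    k = len(xs)
    avg = sum(xs) / k
    return sum((x - avg) ** 2 for x in xs) / (k - 1)


def expect(xs):
    """Average over equally likely samples."""
    return sum(xs) / len(xs)


def enumerate_design(pop, n, m):
    """Exact E(v0), E(v1), E(v2), the formula values, and the true Var(ybar)."""
    pop = [[Fraction(y) for y in c] for c in pop]
    N, M = len(pop), len(pop[0])
    f1, f2 = Fraction(n, N), Fraction(m, M)
    ybars, v0s, v1s, v2s = [], [], [], []
    for clusters in combinations(pop, n):         # stage 1: every set of n clusters
        stage2 = [list(combinations(c, m)) for c in clusters]
        for sub in product(*stage2):               # stage 2: every m-subset in each
            means = [sum(s) / m for s in sub]
            ss1 = svar(means)
            ss2 = sum(svar(s) for s in sub) / n
            ybars.append(sum(means) / n)
            v0s.append(ss1 / n)
            v1s.append((1 - f1) * ss1 / n)
            v2s.append((1 - f1) * ss1 / n + f1 * (1 - f2) * ss2 / (m * n))
    true_var = expect([y * y for y in ybars]) - expect(ybars) ** 2
    SS1 = svar([sum(c) / M for c in pop])          # variance of true cluster means
    SS2 = sum(svar(c) for c in pop) / N            # average within-cluster variance
    formulas = (
        SS1 / n + (1 - f2) * SS2 / (m * n),
        (1 - f1) * SS1 / n + (1 - f1) * (1 - f2) * SS2 / (m * n),
        (1 - f1) * SS1 / n + (1 - f2) * SS2 / (m * n),
    )
    return (expect(v0s), expect(v1s), expect(v2s)), formulas, true_var


# Made-up population (assumption): N=4 clusters of M=3 individuals each.
pop = [[1, 4, 7], [2, 2, 9], [5, 8, 3], [10, 6, 1]]
expected, formulas, true_var = enumerate_design(pop, n=2, m=2)
print(expected == formulas, expected[2] == true_var)  # True True
```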
The bias of v0 is:
E(v0) - E(v2) = f1*SS1/n > 0
whereas the bias of v1 is:
E(v1) - E(v2) = -f1*(1-f2)*SS2/(m*n) < 0
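The two bias expressions follow by subtracting the expected values term by term. The minimal Python sketch below checks this in exact fractions. It computes each bias both as a difference of expected values and from its closed form. The function names and the design values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F


def expectations(SS1, SS2, n, m, f1, f2):
    """E(v0), E(v1), E(v2) as given in the text."""
    a = SS1 / n
    b = SS2 / (m * n)
    return (a + (1 - f2) * b,
            (1 - f1) * a + (1 - f1) * (1 - f2) * b,
            (1 - f1) * a + (1 - f2) * b)


def biases(SS1, SS2, n, m, f1, f2):
    """Closed-form biases of v0 and v1 relative to the unbiased v2."""
    return f1 * SS1 / n, -f1 * (1 - f2) * SS2 / (m * n)


# Hypothetical design: 10 of 40 clusters, 5 of 20 individuals per cluster.
args = dict(SS1=F(3), SS2=F(12), n=10, m=5, f1=F(10, 40), f2=F(5, 20))
e0, e1, e2 = expectations(**args)
b0, b1 = biases(**args)
print(b0, b1)  # 3/40 -9/200
```

On average, v0 overstates the variance and v1 understates it. Both biases shrink to zero as f1 approaches 0.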
Reference:
Cochran, W. G. 1977. Sampling Techniques. Wiley: New York.
--Jeff