ROBERT BOZICK is interested in computing subpopulation standard deviations
for one or more subpopulations:
> The reason for needing the correct sample size is that I need to compute the
> population standard deviation. I have been using the command suggested in
> http://www.stata.com/support/faqs/stat/supweight.html
>
> The post estimation command after I use the svymean is:
> di sqrt(e(N) * el(e(V_srs),1,1))
>
> Additionally, I will be using some by commands as well. For example:
>
> svymean var1, subpop(samp) by(sex)
>
> and then I want to compute the population standard deviation for var1 for
> both categories of sex. Without the sample size, I cannot get the correct
> standard deviation using the suggested post estimation command.
Stata 8:
In the FAQ, subpopulations are not really mentioned, but we can use the
discussion about estimating the population standard deviation to derive how we
would estimate the subpopulation standard deviation.
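When you specify the -subpop()- and/or -by()- options to -svymean-, the
subpopulation sample sizes are stored in e(_N). Because e(V_srs) estimates the
variance of the mean under simple random sampling (roughly s^2/n), multiplying
it by the subpopulation sample size and taking the square root recovers the
standard deviation. Thus the formula becomes
sqrt(el(e(_N),1,1) * el(e(V_srs),1,1))
Although not mentioned in the FAQ, this formula applies only if you did not
-svyset- using the -fpc()- option. If you did -svyset- using the -fpc()-
option, e(V_srs) includes the finite population correction, so the formula
uses the with-replacement variance instead:
sqrt(el(e(_N),1,1) * el(e(V_srswr),1,1))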
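As a minimal sketch of the single-subpopulation case, the lines below use the
variable names from the question (var1, samp) and assume the design has
already been -svyset- without -fpc()-. The scalar name n_sub is only
illustrative.

```stata
* Assumes the data are already -svyset- WITHOUT the -fpc()- option.
svymean var1, subpop(samp)

* number of sample observations in the subpopulation
scalar n_sub = el(e(_N),1,1)

* e(V_srs) is the SRS variance of the mean, so n_sub * V_srs
* estimates the variance of var1 itself
display sqrt(n_sub * el(e(V_srs),1,1))
```

If the design was -svyset- with -fpc()-, the same sketch would use
e(V_srswr) in place of e(V_srs).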
In the case where you would use the -srssubpop- option for looking at the DEFF
and DEFT design effects, the above formulas apply; just note that you will get
a different e(V_srs) and e(V_srswr) when the -srssubpop- option of -svymean-
is specified than when it isn't.
The above formulas only apply to the first subpop of the first variable. To
get a row vector for all the subpopulations and variables in the call to
-svymean- try
matrix var = hadamard(e(_N), vecdiag(e(V_srs)))
or
matrix var = hadamard(e(_N), vecdiag(e(V_srswr)))
Then take the square root of each variance:
* build a row vector of standard deviations, one column per variance
local cols = colsof(var)
matrix sd = J(1,`cols',0)
forval i = 1/`cols' {
    matrix sd[1,`i'] = sqrt(var[1,`i'])
}
matrix list sd
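For the -by(sex)- example in the question, a sketch along these lines would
pull out one standard deviation per category. It assumes the design was
-svyset- without -fpc()-, that sex has two categories, and that the columns
follow the order in which -svymean- reports the categories.

```stata
* Assumes -svyset- WITHOUT -fpc()-; var1, samp, and sex are the names
* used in the question.
svymean var1, subpop(samp) by(sex)
matrix var = hadamard(e(_N), vecdiag(e(V_srs)))

* one column per category of sex (column order is an assumption)
display "SD of var1, first category of sex:  " sqrt(el(var,1,1))
display "SD of var1, second category of sex: " sqrt(el(var,1,2))
```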
---
Stata 9:
There are two differences for Stata 9 in the above discussion.
1. -svy: mean- has an -over()- option in place of the -by()- option of
-svymean-.
2. Although -svy- does not have an -srssubpop- option, -svy: mean- stores the
'srssubpop' variance estimates in -e(V_srssub)- and -e(V_srssubwr)-; so for
those interested in standard deviation estimates assuming SRS within the
specified subpopulations, the formulas are
Without -fpc()- in the first stage:
matrix var = hadamard(e(_N), vecdiag(e(V_srssub)))
With -fpc()- in the first stage:
matrix var = hadamard(e(_N), vecdiag(e(V_srssubwr)))
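Putting the Stata 9 pieces together for the example in the question, a
minimal sketch might look like the following. It assumes the data were
-svyset- without -fpc()-, reuses the names var1, samp, and sex from the
question, and uses Mata (new in Stata 9) to take the element-wise square root
in place of the -forval- loop.

```stata
* Assumes -svyset- WITHOUT -fpc()- in the first stage.
svy, subpop(samp): mean var1, over(sex)

* subpopulation variances under SRS within the subpopulation
matrix var = hadamard(e(_N), vecdiag(e(V_srssub)))

* element-wise square root, stored back as the Stata matrix sd
mata: st_matrix("sd", sqrt(st_matrix("var")))
matrix list sd
```

With -fpc()- in the first stage, e(V_srssubwr) would replace e(V_srssub).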
--Jeff