First of all, your original weights look wrong anyway: they should add
up to the population size, perhaps up to some sampling variability. If
you have 8664 units and your estimated population size is 8664, that
would mean you have a census! If you are dealing with ratios and
regression models, the scale of the weights is not of great importance,
but you would still want to have everything implemented properly.
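
A quick way to check this is to add up the weights yourself. Below is a minimal sketch on simulated data; only the variable name pps comes from the thread, and the weight values are made up for illustration. It also shows that weights rescaled to average 1 sum to the sample size, which would be consistent with the pattern in the quoted output (8664 observations, population size 8664), although I am only guessing that this is what happened there.

```stata
* Minimal sketch on simulated data. Only the name pps comes from the
* thread; the values below are hypothetical.
clear
set seed 12345
set obs 1000
generate double pps = 50 + 100*runiform()   // made-up weights, averaging about 100

* The sum of the sampling weights estimates the population size.
quietly summarize pps
scalar wsum = r(sum)
scalar nobs = r(N)
display "Sum of weights: " wsum "   Number of obs: " nobs

* Weights rescaled to average 1 sum to the sample size instead.
generate double pps_norm = pps / r(mean)
quietly summarize pps_norm
scalar nsum = r(sum)
display "Sum of rescaled weights: " nsum
```

If the sum of your weights equals the number of observations, the weights carry only relative information, which is harmless for ratios and proportions but not for totals.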
Second, by dropping units you mislead the software about how the data
were collected. In particular, the pairwise probabilities of selection
(which determine the variances of the estimates) will be way off, and so
will your variance estimates. If you had a nice design (with some sort
of proportional allocation, etc.), those properties will be lost, the
cluster sizes will come out wrong, and so on. The bottom line: DON'T DO
THAT.
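
To see what the software is told in each case, here is a rough sketch comparing subpop() with dropping observations. The data are simulated: sex, age and pps mirror the names in the thread, while y and young are hypothetical variables introduced only for this illustration.

```stata
* Sketch on simulated data: y and young are hypothetical names.
clear
set seed 2007
set obs 2000
generate byte sex = runiform() < 0.5
generate int age = 16 + floor(49*runiform())   // ages 16 to 64
generate double pps = 0.5 + runiform()         // made-up weights
generate byte y = runiform() < 0.4             // made-up binary outcome
generate byte young = inrange(age, 16, 24)

svyset [pweight = pps], strata(sex)

* Recommended: keep every observation and declare the subpopulation.
svy, subpop(young): mean y
scalar b_subpop  = _b[y]
scalar se_subpop = _se[y]
scalar n_full    = e(N)

* Not recommended: drop the other units, so the design looks smaller
* than the sample that was actually drawn.
preserve
keep if young
svy: mean y
scalar b_drop  = _b[y]
scalar se_drop = _se[y]
scalar n_drop  = e(N)
restore

display "subpop(): mean = " b_subpop "  se = " se_subpop "  N = " n_full
display "drop:     mean = " b_drop   "  se = " se_drop   "  N = " n_drop
```

With the weights left unchanged, the two point estimates coincide; what changes is the design information behind the standard error. In a design this simple the two standard errors will typically be close, and the gap can be larger with clusters. Point estimates move only when the weights themselves are recomputed after dropping, as in the quoted example.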
I am pretty sure there are other, and better, explanations in the
[SVY] manual. The FAQ at
http://www.stata.com/support/faqs/stat/zerowgt.html might also be
helpful, since a -subpop- analysis essentially zeroes out the weights
for the units outside the subpopulation.
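
The zero-weight idea can be tried directly. The sketch below again uses simulated data; pps0, y and young are hypothetical names, and it assumes that svy estimation accepts zero pweights and keeps those observations in the design, as the FAQ describes.

```stata
* Sketch on simulated data: pps0, y and young are hypothetical names.
clear
set seed 2007
set obs 2000
generate byte sex = runiform() < 0.5
generate int age = 16 + floor(49*runiform())   // ages 16 to 64
generate double pps = 0.5 + runiform()         // made-up weights
generate byte y = runiform() < 0.4             // made-up binary outcome
generate byte young = inrange(age, 16, 24)

* (a) Subpopulation analysis with the original weights.
svyset [pweight = pps], strata(sex)
svy, subpop(young): mean y
scalar b_subpop  = _b[y]
scalar se_subpop = _se[y]

* (b) Same design, but weights set to zero outside the subpopulation.
generate double pps0 = pps * young
svyset [pweight = pps0], strata(sex)
svy: mean y
scalar b_zero  = _b[y]
scalar se_zero = _se[y]

display "subpop():     mean = " b_subpop "  se = " se_subpop
display "zero weights: mean = " b_zero   "  se = " se_zero
```

The point estimates agree by construction, since zero-weight units contribute nothing to either weighted sum. If the FAQ's description holds for your version of Stata, the standard errors should agree as well, because all units are still counted in the design.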
On 3/18/07, Jason Ferris wrote:
I have a large dataset with weights calculated as PPS based on household
size, stratified by sex. The respondents are aged 16-64.
I am interested in looking at data only from those aged 16-24. I can
use the subpop option "subpop(if age>=16 & age<=24)" for all the
commands. But I am wondering if I can drop all other cases (keep if
age>=16 & age<=24) and then 'reset' my weights based only on those aged
16-24.
In the original form (with all data) I have the following summary data
(note the survey design is quite a simple one):
. svyset
      pweight: pps
          VCE: linearized
     Strata 1: sex
         SU 1: <observations>
        FPC 1: <zero>
. svy: tab sex
(running tabulate on estimation sample)

Number of strata =         2            Number of obs      =      8664
Number of PSUs   =      8664            Population size    =      8664
                                        Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .5046
     male |       .4954
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions
If I select the subgroup (age 16-24):
. svy,subpop(if age<=24): tab sex
(running tabulate on estimation sample)

Number of strata =         2            Number of obs      =      8664
Number of PSUs   =      8664            Population size    =      8664
                                        Subpop. no. of obs =       999
                                        Subpop. size       = 1438.7586
                                        Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .4599
     male |       .5401
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions
When I reset my weights with data only representing those 16-24 years of
age (i.e., as if this were the way I originally designed my study) I get
the following results:
. svy: tab sex
(running tabulate on estimation sample)

Number of strata =         2            Number of obs      =       999
Number of PSUs   =       999            Population size    =       999
                                        Design df          =       997

-----------------------
      sex | proportions
----------+------------
   female |       .4655
     male |       .5345
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions
As can be seen, there is now a difference in the proportions between
using subpop and resetting my weights. Is this a problem?
Jason
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Stas Kolenikov
http://stas.kolenikov.name