st: Number of Obs with svy , subpop()
From: Michael Mitchell <[email protected]>
To: [email protected]
Subject: st: Number of Obs with svy , subpop()
Date: Thu, 18 Mar 2010 16:20:59 -0700
Greetings,
I am flummoxed by the output of "svy : tab" with respect to the
population size. I hope someone can help. For example, consider the
"highschool" dataset used in the [SVY] manual, with a couple of tweaks
as shown below...
. webuse highschool, clear
. svyset [pw=sampwgt]
. replace race = . in 1/71
Here are the tabulations of race, and of sex by race.
. tab race, missing
   1=white, |
   2=black, |
    3=other |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |      3,500       85.97       85.97
      Black |        431       10.59       96.56
      Other |         69        1.69       98.26
          . |         71        1.74      100.00
------------+-----------------------------------
      Total |      4,071      100.00
. tab sex race, missing
   1=male, | 1=white, 2=black, 3=other
  2=female |     White      Black      Other          . |     Total
-----------+--------------------------------------------+----------
      male |     1,676        193         35         34 |     1,938
    female |     1,824        238         34         37 |     2,133
-----------+--------------------------------------------+----------
     Total |     3,500        431         69         71 |     4,071
Now I run a "svy : tab" on race, and the "Number of obs" is 4000, as
I expect since that is the number of valid observations on race.
. svy : tab race, count format(%13.2fc)
(running tabulate on estimation sample)
Number of strata   =         1                  Number of obs      =      4000
Number of PSUs     =      4000                  Population size    = 7880496.9
                                                Design df          =      3999
------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 6,930,316.91
    Black |   754,879.69
    Other |   195,300.31
          |
    Total | 7,880,496.91
------------------------
  Key:  count     =  weighted counts
.
But now I want to analyze just the sub-population of males (sex==1)
and it shows that the number of obs is now 4037 (see below). How can
the number of observations increase when adding a -subpop()- option?
There are suddenly 37 extra observations. Note this corresponds to the
number of females with a missing race.
. svy , subpop(if sex==1): tab race, count format(%13.2fc)
(running tabulate on estimation sample)
Number of strata   =         1                  Number of obs      =      4037
Number of PSUs     =      4037                  Population size    = 7932333.9
                                                Subpop. no. of obs =      1904
                                                Subpop. size       = 3780355.3
                                                Design df          =      4036
------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 3,367,920.96
    Black |   324,487.42
    Other |    87,946.89
          |
    Total | 3,780,355.27
------------------------
  Key:  count     =  weighted counts
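The arithmetic behind these counts can be checked by simple counting. The sketch below assumes the same dataset and the same 71 missing values on race as above. The rule in the second -count- (everyone outside the subpopulation, plus subpopulation members with a nonmissing race) is only a guess that happens to reproduce the reported numbers; it is not a documented description of what -svy- does.

```stata
* Sketch: reproduce the reported "Number of obs" values by counting.
webuse highschool, clear
replace race = . in 1/71

* Observations with a nonmissing race
* (matches the 4000 reported by "svy : tab race")
count if !missing(race)
assert r(N) == 4000

* Assumed rule: observations outside the male subpopulation, plus
* males with a nonmissing race (matches the 4037 from the -subpop()- run)
count if sex != 1 | !missing(race)
assert r(N) == 4037

* Males with a nonmissing race (matches "Subpop. no. of obs" = 1904)
count if sex == 1 & !missing(race)
assert r(N) == 1904
```

The difference between the first two counts, 37, is the number of females with a missing race in the cross-tabulation above.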
Just to make sure that this was not a coincidence, I repeated the
process with a different number of missing values on race. The
output below shows that, again, adding the -subpop()- option increases
the number of observations by the number of women who have a missing
value on race (from 4061 to 4065, and 4 women have a missing value on
race).
. webuse highschool, clear
. svyset [pw=sampwgt]
      pweight: sampwgt
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>
.
. replace race = . in 1/10
(10 real changes made, 10 to missing)
. tab race, missing
   1=white, |
   2=black, |
    3=other |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |      3,542       87.01       87.01
      Black |        450       11.05       98.06
      Other |         69        1.69       99.75
          . |         10        0.25      100.00
------------+-----------------------------------
      Total |      4,071      100.00
. tab sex race, missing
   1=male, | 1=white, 2=black, 3=other
  2=female |     White      Black      Other          . |     Total
-----------+--------------------------------------------+----------
      male |     1,696        201         35          6 |     1,938
    female |     1,846        249         34          4 |     2,133
-----------+--------------------------------------------+----------
     Total |     3,542        450         69         10 |     4,071
. svy : tab race, count format(%13.2fc)
(running tabulate on estimation sample)
Number of strata   =         1                  Number of obs      =      4061
Number of PSUs     =      4061                  Population size    = 7972647.7
                                                Design df          =      4060
------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 7,000,891.28
    Black |   776,456.11
    Other |   195,300.31
          |
    Total | 7,972,647.70
------------------------
  Key:  count     =  weighted counts
. svy , subpop(if sex==1): tab race, count format(%13.2fc)
(running tabulate on estimation sample)
Number of strata   =         1                  Number of obs      =      4065
Number of PSUs     =      4065                  Population size    = 7979171.9
                                                Subpop. no. of obs =      1932
                                                Subpop. size       = 3827193.3
                                                Design df          =      4064
------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 3,404,730.57
    Black |   334,515.81
    Other |    87,946.89
          |
    Total | 3,827,193.27
------------------------
  Key:  count     =  weighted counts
Can someone explain why the number of observations increases by the
number of people who are excluded by the -subpop()- option and who
are also missing on the tabulated variable?
Many thanks,
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com