Heather E. Ridolfo <[email protected]> asks about a Statalist exchange we had in
July of last year:
> I posted a message in July 2008 about problems I was encountering when
> using svy mean:
> "Using svy: mean- with option -subpop() I noticed that it is reporting a
> smaller estimation sample than the number of observations in my
> dataset."
>
> The reply I got back said:
> "We have verified that -svy: mean- is incorrectly dropping out-of-subpop
> observations that contain missing values in the variables of the
> varlist. The only other affected commands are -svy: proportion-, -svy:
> ratio-, and -svy: total-. We hope to have this fixed in the next Stata
> update (within the next few weeks)"
>
> However, I continue to experience this problem a year and half later
> when trying to run the following command:
> Svyset PSU [pweight = nweight], strata(STRATUM) singleunit(centered)
> Svy, subpop(allsp): mean RA sevimpft ADLS IADLS help UseAD
>
> The number of observations I get back is smaller than the number of
> actual observation in the dataset. I am using Stata 10 and as far as I
> can tell it's up-to-date.
>
> Does anyone have any suggestions on how I can fix this problem?
In the Stata 10 whatsnew, the update on 18aug2009 contains the following item:
48. svy: mean, svy: proportion, svy: ratio, and svy: total would
mark out observations with missing values in the summary
variables even when the sampling weight was zero, which is a
surrogate for identifying out-of-subpopulation observations.
This has been fixed.
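To see whether an installation already includes this update, -update query-
compares the installed files with the latest official ones. A minimal sketch,
assuming the machine can reach www.stata.com:
***** BEGIN:
* report the dates of the installed executable and ado-files,
* and whether newer official updates are available
update query
* if updates are reported as available, install them
update all
***** END:
If the installed dates reported are earlier than the date of the update above,
the fix is presumably not yet installed.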
Given Heather's example, -svy- will drop observations containing missing
values in any of the following variables:
PSU
nweight
STRATUM
-svy- will then only check the following variables for missing values within
the subpopulation observations:
RA
sevimpft
ADLS
IADLS
help
UseAD
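These two rules can be checked by hand by counting the affected observations.
The following is a rough sketch for a do-file, using the variable names from
Heather's commands and assuming -allsp- is a 0/1 indicator with no missing
values (an assumption; the post does not say how it is coded):
***** BEGIN:
* observations dropped everywhere: missing design variables
count if missing(PSU, nweight, STRATUM)
* observations dropped only inside the subpopulation:
* complete design variables, in the subpop, missing a summary variable
count if !missing(PSU, nweight, STRATUM) & allsp == 1 ///
    & missing(RA, sevimpft, ADLS, IADLS, help, UseAD)
***** END:
With the update installed, the total number of observations minus these two
counts should equal the "Number of obs" reported by -svy: mean-.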
The following simple example illustrates that -svy- is only dropping
observations with missing values within the subpopulation.
. sysuse auto
. tabulate rep78 foreign, missing nolabel
. svyset _n
. svy, subpop(if for==0): mean rep78
. svy, subpop(if for==1): mean rep78
In the following output from Stata 10, -tabulate- shows that -rep78- is
missing in 5 observations: 4 where foreign=0 and 1 where foreign=1. The two
calls to -svy: mean- report sample sizes of 70 (74 - 4) and 73 (74 - 1),
respectively, so only the missing values inside each subpopulation are dropped.
***** BEGIN:
. sysuse auto
(1978 Automobile Data)

. tabulate rep78 foreign, missing nolabel

    Repair |
    Record |       Car type
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
         . |         4          1 |         5
-----------+----------------------+----------
     Total |        52         22 |        74

. svyset _n

      pweight: <none>
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy, subpop(if for==0): mean rep78
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs      =      70
Number of PSUs   =      70          Population size    =      70
                                    Subpop. no. obs    =      48
                                    Subpop. size       =      48
                                    Design df          =      69

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       rep78 |   3.020833   .1205044      2.780434    3.261233
--------------------------------------------------------------

. svy, subpop(if for==1): mean rep78
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs      =      73
Number of PSUs   =      73          Population size    =      73
                                    Subpop. no. obs    =      21
                                    Subpop. size       =      21
                                    Design df          =      72

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       rep78 |   4.285714   .1537776      3.979164    4.592264
--------------------------------------------------------------
***** END:
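The sample sizes above can be reproduced by hand with -count-. This sketch
only restates the rule the output illustrates (out-of-subpopulation
observations are kept whether or not -rep78- is missing); it is not a
description of how -svy- computes them internally:
***** BEGIN:
sysuse auto, clear
* subpop(if for==0): all foreign cars, plus domestic cars with nonmissing rep78
count if foreign != 0 | !missing(rep78)
* domestic cars actually used in the subpopulation
count if foreign == 0 & !missing(rep78)
* subpop(if for==1): all domestic cars, plus foreign cars with nonmissing rep78
count if foreign != 1 | !missing(rep78)
* foreign cars actually used in the subpopulation
count if foreign == 1 & !missing(rep78)
***** END:
Going by the -tabulate- output, these counts should be 70, 48, 73, and 21,
matching the "Number of obs" and "Subpop. no. obs" values reported by
-svy: mean-.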
--Jeff
[email protected]