st: Survey analysis: Why does PSU affect proportions?
I am using a published dataset (from the CDC), and I want to verify that I
can duplicate the published results before I begin my analysis. I
found that including the clustering and stratification variables changed
the proportions, not only the standard errors. In the output that follows, I
first used only the weights. This gives proportions that agree with the
published estimates (produced using SUDAAN). Next I added the PSU and
STRATA variables and got different proportions, but confidence intervals
that agree with the published results (with minor variations in other
runs). I don't think stratification and clustering by STRATA and PSU
should change the proportions, only their variances.
The dataset documentation says that a three-stage cluster sample was
used, but only WEIGHT, PSU, and STRATA are included in the public
dataset. The FAQ (Survey-data analysis, first item) says that this
sort of svyset statement will produce appropriate variance estimates
for multistage designs. Am I doing something wrong?
Craig L. Anderson, PhD (epidemiology)
Department of Emergency Medicine
University of California, Irvine
Stata output:
. clear
. set memory 16M
(16384k)
.
. use yrbs05
. svyset[pw=weight]
pweight: weight
VCE: linearized
Strata 1: <one>
SU 1: <observations>
FPC 1: <zero>
. svy: tab q2 qn9, row se ci
(running tabulate on estimation sample)
Number of strata   =         1      Number of obs      =     13837
Number of PSUs     =     13837      Population size    = 13852.064
                                    Design df          =     13836
-------------------------------------------------------
  what is | never/rarely wore seat belt
 your sex |             1              2     Total
----------+--------------------------------------------
   female |         .0778          .9222         1
          |        (.004)         (.004)
          | [.0703,.0861]  [.9139,.9297]
          |
     male |          .125           .875         1
          |       (.0051)        (.0051)
          | [.1153,.1354]  [.8646,.8847]
          |
    Total |         .1017          .8983         1
          |       (.0033)        (.0033)
          | [.0954,.1083]  [.8917,.9046]
-------------------------------------------------------
Key: row proportions
(linearized standard errors of row proportions)
[95% confidence intervals for row proportions]
Pearson:
Uncorrected chi2(1) = 84.4318
Design-based F(1, 13836) = 52.6547 P = 0.0000
. svyset psu [pw=weight], strata(stratum)
pweight: weight
VCE: linearized
Strata 1: stratum
SU 1: psu
FPC 1: <zero>
. svy: tab q2 qn9, row se ci
(running tabulate on estimation sample)
Number of strata   =         4      Number of obs      =     13704
Number of PSUs     =        51      Population size    = 13730.762
                                    Design df          =        47
-------------------------------------------------------
  what is | never/rarely wore seat belt
 your sex |             1              2     Total
----------+--------------------------------------------
   female |         .0778          .9222         1
          |       (.0076)        (.0076)
          | [.0638,.0944]  [.9056,.9362]
          |
     male |         .1234          .8766         1
          |       (.0109)        (.0109)
          |  [.103,.1471]   [.8529,.897]
          |
    Total |         .1006          .8994         1
          |       (.0087)        (.0087)
          | [.0844,.1195]  [.8805,.9156]
-------------------------------------------------------
Key: row proportions
(linearized standard errors of row proportions)
[95% confidence intervals for row proportions]
Pearson:
Uncorrected chi2(1) = 78.7114
Design-based F(1, 47) = 49.8319 P = 0.0000
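The row proportions in both runs are ratios of weighted totals, and such ratios use only the weights; strata and PSUs enter the linearized standard errors, not the point estimates. One difference the two logs do show is the estimation sample: Number of obs falls from 13837 to 13704 (and the population size from 13852.064 to 13730.762) once psu and stratum are svyset. That suggests some records have a missing PSU or stratum and are left out of the second run, presumably because svy excludes observations with missing design variables. The Python sketch below illustrates the mechanism on invented toy data; the `Record` layout and the helper names are hypothetical and are not part of the YRBS file or of Stata.

```python
"""Weighted row proportions depend only on the weights and on which rows are used.

Toy data and helper names are hypothetical, for illustration only.
"""
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    sex: str
    qn9: int                # 1 = never/rarely wore seat belt, 2 = otherwise
    weight: float
    stratum: Optional[int]  # None marks a missing design variable
    psu: Optional[int]


def weighted_row_proportion(rows: List[Record], sex: str, outcome: int) -> float:
    """Cell weight total divided by row weight total for one sex.

    Stratum and PSU are never read: they would matter only for the variance.
    """
    row_total = sum(r.weight for r in rows if r.sex == sex)
    cell_total = sum(r.weight for r in rows if r.sex == sex and r.qn9 == outcome)
    return cell_total / row_total


def complete_design(rows: List[Record]) -> List[Record]:
    """Keep only records whose stratum and PSU are both nonmissing."""
    return [r for r in rows if r.stratum is not None and r.psu is not None]


records = [
    Record("female", 1, 1.2, 1, 10),
    Record("female", 2, 0.8, 1, 10),
    Record("female", 2, 1.0, 2, 20),
    Record("male", 1, 1.5, 2, 20),
    Record("male", 2, 2.5, 1, 21),
    Record("male", 1, 1.0, None, 30),  # missing stratum: dropped by complete_design
    Record("male", 2, 1.0, 2, 22),
]

subset = complete_design(records)
for sex in ("female", "male"):
    full = weighted_row_proportion(records, sex, 1)
    restricted = weighted_row_proportion(subset, sex, 1)
    print(f"{sex}: all records {full:.4f}, nonmissing design vars {restricted:.4f}")
```

In this toy example the female proportion does not change because no female record is dropped, while the male proportion moves from about 0.417 to 0.3 once the record with a missing stratum is excluded. If the real file behaves the same way, something like `count if missing(psu, stratum)` in Stata would show how many records are affected.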