Re: st: Analyzing a subpopulation in Stata 10.1
This is kind of long, but I hope that some folks, particularly those
with expertise on poststratification and people from StataCorp, will hear
me out.
Fundamentally, Figen's question is about how Stata handles missing
values under poststratification. It's one I don't know the answer to,
but that perhaps somebody from StataCorp could help answer.
To illustrate the problem, I've included a program and edited output.
For those not wanting to do too much scanning, I'll summarize what it
does and shows.
I create a fictional dataset of men and women (femV1) who are randomly
assigned native or immigrant status (native; about 50% are native), and
among the women I've created an ever-given-birth variable (everV1; about
55% have). Although the sample is about 50% native/50% immigrant, the
hypothetical population is 75% native, and I've created a
poststratification weight to deal with that. The everV1 variable is by
definition missing for all males, but I also created a new variable,
everV2, that is additionally missing at random for about 15% of females.
The first table T1 below shows that there are, after weighting, 2374
females (110 obs) and 1626 males (90 obs) in the population of 4000 (200
obs). If I tabulate everV2 only (T2), without specifying a
subpopulation, we learn that there are 2179 (50) people who have given
birth and 1821 (44) who have not. Since there are only 2374 females and
only 55% of them have given birth, 2179 is clearly too big a number. Of
course, Stata doesn't know that some of the cases are missing by
definition while others are missing at random; *it has simply reweighted
the sample to the full population size.*
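A quick arithmetic check of that reweighting, as a sketch under stated assumptions. The cell counts used here (58 native and 36 immigrant females with nonmissing everV2, of whom 33 and 17 answered yes) are back-solved from the totals printed in T1-T4 below, not read from the data. The rule that each poststratum's postweight total is spread evenly over whatever observations remain in the estimation sample is likewise an assumption about what svy is doing, not a documented fact.

```stata
* Assumed cell counts, back-solved from the printed totals (not from the data)
* T2 estimation sample: 94 obs = 58 native + 36 immigrant
display %9.1f 33*3000/58 + 17*1000/36   // "yes" count under this assumption
display %9.1f 25*3000/58 + 19*1000/36   // "no" count under this assumption
```

If the assumption holds, the two results should come out near the 2179 and 1821 shown in T2.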
Now, if I tabulate everV2 for the subpopulation of females (T3), it
shows that 1242 (50) have ever had a child and 1009 (44) have not, for a
total of 2251 (94). Obviously, 2251 != 2374. *Why hasn't Stata adjusted
the weights so that they add up to the full subpopulation size?* I don't
know.
If I repeat T3 with the "missing" option (T4), I get different results.
These are the same results that I would get if I were using a static
poststratification weight: 1168 (50) yeses, 946 (44) nos, and 260 (16)
missing, adding up to the subpopulation size of 2374 (110). (Note that
this is the same as including an "if !missing(everV2)" qualifier in the
subpop() option.) This is probably better than the seemingly arbitrary result I
get in T2, but I'd really like at least the option for my result to be
adjusted up to the subpopulation size.
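One reading that is consistent with the printed subpopulation sizes, offered as an assumption rather than a description of svy's internals: in T3 the 16 females with missing everV2 drop out of the estimation sample (184 obs), the postweight totals of 3000 and 1000 are spread over the natives and immigrants who remain (males included), and the female subpopulation gets only its share of those weights. The counts below (99 native and 101 immigrant observations overall, 63 and 47 of them female, 5 and 11 of those missing everV2) are back-solved from the printed totals, not taken from the data.

```stata
* T3: 184 obs = 94 native + 90 immigrant; 58 + 36 females with everV2 observed
display %10.4f 58*3000/94 + 36*1000/90
* T4: 200 obs = 99 native + 101 immigrant; 63 + 47 females in total
display %10.4f 63*3000/99 + 47*1000/101
```

Under these assumptions the two lines should land on the reported Subpop. size values of 2251.0638 and 2374.4374.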
So, Stata does adjust the subpopulation weights, but it doesn't adjust
them to the subpopulation size. What precisely is it doing? I wish I
knew. It seems to me that adjusting to the full subpopulation size is
the correct thing to do, but maybe I'm missing something.
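For concreteness, adjusting the T3 counts up to the full subpopulation size can be done by hand with a single ratio. This is only a sketch of that adjustment, using the totals printed in T3 and T4; the scalar name adj is made up for illustration.

```stata
* Ratio of the full female subpopulation (T4) to the T3 subpopulation size
scalar adj = 2374.4374 / 2251.0638
display %9.0f 1242*adj   // rescaled "yes" count
display %9.0f 1009*adj   // rescaled "no" count
```

This gives roughly 1310 and 1064, which together match the 2374 females in T1 up to rounding.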
Of course, Figen isn't calculating counts; he's calculating proportions.
Nevertheless, the estimated proportions and their standard errors depend
on how Stata is counting things internally.
Does this make sense? Is Stata doing the right thing? (And what *is* it
doing in T3?)
Michael
-----------------------------------------------------------------------------------------
clear
set obs 200
set seed 06272009
gen byte femV1 = _n <= 110 // pop, 55% female
gen byte everV1 = (uniform() < .55) if (femV1==1) // females only
clonevar everV2 = everV1
replace everV2 = . if (uniform() < .15) // add missing values
gen byte native = (uniform() <= .5) // about 50% native in sample
label define Lyes01 0 "0-No" 1 "1-Yes"
label val femV1 native everV1 everV2 Lyes01
gen postwt = cond(native, 3000, 1000) // 75% native in population
svyset, poststrata(native) postweight(postwt)
svy: tab femV1, count format(%10.0f) obs // [T1] femV1 only
svy: tab everV2, count format(%10.0f) obs // [T2] everV2 only
svy, subpop(femV1): tab everV2, count format(%10.0f) obs // [T3] with subpop
svy, subpop(femV1): tab everV2, count format(%10.0f) obs miss // [T4] with missing
-----------------------------------------------------------------------------------------
. svy: tab femV1, count format(%10.0f) obs // [T1] femV1 only
(running tabulate on estimation sample)
Number of strata   =         1        Number of obs      =       200
Number of PSUs     =       200        Population size    =      4000
N. of poststrata   =         2        Design df          =       199
----------------------------------
    femV1 |      count         obs
----------+-----------------------
     0-No |       1626          90
    1-Yes |       2374         110
          |
    Total |       4000         200
----------------------------------
  Key:  count  =  counts
        obs    =  number of observations
. svy: tab everV2, count format(%10.0f) obs // [T2] everV2 only
(running tabulate on estimation sample)
Number of strata   =         1        Number of obs      =        94
Number of PSUs     =        94        Population size    =      4000
N. of poststrata   =         2        Design df          =        93
----------------------------------
   everV2 |      count         obs
----------+-----------------------
     0-No |       1821          44
    1-Yes |       2179          50
          |
    Total |       4000          94
----------------------------------
  Key:  count  =  counts
        obs    =  number of observations
. svy, subpop(femV1): tab everV2, count format(%10.0f) obs // [T3] with subpop
(running tabulate on estimation sample)
Number of strata   =         1        Number of obs      =       184
Number of PSUs     =       184        Population size    =      4000
N. of poststrata   =         2        Subpop. no. of obs =        94
                                      Subpop. size       = 2251.0638
                                      Design df          =       183
----------------------------------
   everV2 |      count         obs
----------+-----------------------
     0-No |       1009          44
    1-Yes |       1242          50
          |
    Total |       2251          94
----------------------------------
  Key:  count  =  counts
        obs    =  number of observations
. svy, subpop(femV1): tab everV2, count format(%10.0f) obs miss // [T4] with miss too
(running tabulate on estimation sample)
Number of strata   =         1        Number of obs      =       200
Number of PSUs     =       200        Population size    =      4000
N. of poststrata   =         2        Subpop. no. of obs =       110
                                      Subpop. size       = 2374.4374
                                      Design df          =       199
----------------------------------
   everV2 |      count         obs
----------+-----------------------
     0-No |        946          44
    1-Yes |       1168          50
        . |        260          16
          |
    Total |       2374         110
----------------------------------
  Key:  count  =  counts
        obs    =  number of observations
-----------------------------------------------------------------------------------------
--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536