(1) Regarding the difference in p-values, all this seems to suggest is
that the -test- being implemented using your (first) procedure is the same
as that being implemented after -svy:reg- and -suest- and is not the same
as the -test- being implemented after -svy:logit-. If you reran the
first -suest- and -test- commands while using svy:reg to compare the two
subgroups, rather than one subgroup to the total, you'd get exactly the
same results as your first procedure. That the current
-suest & test- results for -svy:reg- are almost identical to your first
test therefore seems quite reasonable. So the question seems to be: For the
comparison at hand using a dichotomous outcome, is the -test- following
the first two procedures more appropriate than the -test- following the
last procedure (svy:logit)? Had the outcome been continuous, the issue
of gross differences in p-values would likely not have come up.
(2) Regarding the following questions:
"I reiterate my original concern and ask if there is no "statistical
difference between MI and non-MI," but there is a "statistical difference
between MI and the nation" (the USA, being the one that contains
Statacorp) or vice versa, what should we conclude?"
I doubt that your scenario would arise in practice, but wouldn't it depend
on the proportion of the total sample represented by MI?
Nonetheless, there are cases where it makes sense to compare a subgroup of
cases to all cases rather than the remaining cases. The pattern of
results would likely be opposite to what you propose in your question.
Let's say we want to look at whether nonresponse in a survey leads to
nonresponse bias in one's estimate of a proportion in the population, say
the proportion with affective disorders. Nonresponse is a necessary but not
sufficient condition for nonresponse bias, so we need to test this. So
does nonresponse lead to nonresponse bias in our estimate based on those
who participated?
Let's say that 10% of the selected sample doesn't participate, but we do
some intensive follow-up with a sample of nonrespondents to get an
estimate of the proportion. To see if nonresponse bias exists, we *do
not* want to compare the following:
p(affective disorder | respondents) vs. p(affective disorder |
nonrespondents)
It may very well be that the proportion with affective disorders is
significantly higher among nonrespondents than among respondents. But
this does not address the issue of whether nonresponse bias exists. What
we want to compare is:
p(affective disorder | respondents & nonrespondents) vs. p(affective
disorder | respondents)
This may not be significantly (statistically or practically) different,
even though the first comparison is, thereby suggesting no significant
nonresponse bias.
Now let's say that 35% of the selected sample doesn't participate. We may
find that both comparisons are significantly different:
p(affective disorder | respondents) vs. p(affective disorder |
nonrespondents)
p(affective disorder | respondents & nonrespondents) vs. p(affective
disorder | respondents)
Nonetheless, it is only the latter comparison that directly addresses the
issue of nonresponse bias.
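
The arithmetic behind the two comparisons can be sketched as follows. This is only a point-estimate identity under simplifying assumptions that are mine and not part of the thread: $w$ denotes the nonresponding share of the selected sample, $p_R$ and $p_{NR}$ denote the proportions with affective disorders among respondents and nonrespondents, and sampling weights, design effects, and the follow-up subsampling of nonrespondents are all ignored.

```latex
\begin{align*}
p_{\text{all}} &= (1 - w)\,p_R + w\,p_{NR} \\
p_R - p_{\text{all}} &= p_R - (1 - w)\,p_R - w\,p_{NR} \\
                     &= w\,(p_R - p_{NR})
\end{align*}
```

So the gap that matters for nonresponse bias is the respondent/nonrespondent gap scaled down by $w$: one tenth of it when 10% do not participate, and roughly a third of it when 35% do not. The identity says nothing about p-values by itself, because the two estimates being compared are not independent (the respondents are part of the combined group).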
"Austin Nichols" <[email protected]>
Sent by: [email protected]
12/05/2006 01:21 PM
Please respond to
[email protected]
To
[email protected]
cc
Subject
Re: st: statistical test to compare two survey means from two estimating
equations
In summary:
Brent Fulton <[email protected]> asked How can one "compare the
survey-based means" for a subpop to the whole pop?
I <[email protected]> advised him to compare the subpop to the
balance of the pop.
Michael Frone <[email protected]> wrote "How about -suest-
followed by -test-"
But note that the various options outlined can lead to different
answers, as demonstrated below. I reiterate my original concern and
ask if there is no "statistical difference between MI and non-MI," but
there is a "statistical difference between MI and the nation" (the
USA, being the one that contains Statacorp) or vice versa, what should
we conclude?
webuse nhanes2
local m=35
svy, subpop(if age<=`m'): tab diab bl, col se
gen p21=.
gen p22=.
mat li e(b)
mat b1=e(b)
local a1=b1[1,3]
local b1=b1[1,4]
test p21=p22
local p1=r(p)
svy, subpop(if age<=`m'): reg diab
estimates store a1
svy, subpop(if age<=`m' & bl==1): reg diab
estimates store a2
suest a1 a2
mat b2=e(b)
local a2=b2[1,1]
local b2=b2[1,2]
test [a1]_cons=[a2]_cons
local p2=r(p)
svy, subpop(if age<=`m'): logit diab
estimates store a3
svy, subpop(if age<=`m' & bl==1): logit diab
estimates store a4
suest a3 a4
mat b3=e(b)
local a3=invlogit(b3[1,1])
local b3=invlogit(b3[1,2])
test [a3]_cons=[a4]_cons
local p3=r(p)
foreach v in a b p {
    if "`v'"=="a" di "Diabetes" in gr " svy: tab svy: reg svy:logit" _c
    if "`v'"=="a" di in gr _n " NB/All " _c
    if "`v'"=="b" di in gr _n " Black " _c
    if "`v'"=="p" di in gr _n " p-value" _c
    forv i=1/3 {
        di in ye _col(`=`i'*10') ``v'`i'' _c
    }
}
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/