
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Resampling and compare full sample with subsamples


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Resampling and compare full sample with subsamples
Date   Mon, 17 Mar 2014 19:35:46 -0400


Johannes:

Do I really think that there is *exactly* zero difference in the
prevalence rates from two parts of your population? No, I don't. To my
mind, the important question is "how different?", and that is the question
confidence intervals address. It is also the question you appeared to ask
when you stated that the purpose of your analysis is to
"give me an idea of what losing certain kinds of schools means for the
reliability of prevalence figures in other survey waves."


Steve
[email protected]


On Mar 17, 2014, at 3:47 PM, Johannes Thrul <[email protected]> wrote:

Thank you Steve and sorry for the delayed response. 

Could you do me a favor and explain briefly, why you would prefer confidence intervals over hypothesis testing in this case?

Thanks, Johannes





---------------------------------------------------------------------------------------------

Ah, you left out most of the detail; your explanation makes sense. To
answer your original question: you want to compare a part A to a whole C.
But C = A U B, where B is the set of observations in C that are not in A.
Let pA and pB be the prevalence rates in A and B, and pC be the prevalence
in the whole. Then, if nA, nB, and n = nA + nB are the sample sizes of A,
B, and C,

(*) pC = W pA + (1 - W) pB where W = nA/n.

(**) pC - pA = (1-W)(pB - pA).
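Identities (*) and (**) can be checked numerically. This is only an illustrative sketch: the function names and the counts (30 cases out of 200 in A, 60 out of 300 in B) are hypothetical and do not come from the thread.

```python
def whole_prevalence(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """Equation (*): pC = W*pA + (1 - W)*pB, with W = nA/n and n = nA + nB."""
    w = n_a / (n_a + n_b)
    return w * p_a + (1 - w) * p_b


def gap_whole_minus_part(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """Equation (**): pC - pA = (1 - W)*(pB - pA)."""
    w = n_a / (n_a + n_b)
    return (1 - w) * (p_b - p_a)


if __name__ == "__main__":
    # Hypothetical data: 30/200 cases in part A, 60/300 in the remainder B.
    p_a, p_b, n_a, n_b = 30 / 200, 60 / 300, 200, 300
    p_c = whole_prevalence(p_a, p_b, n_a, n_b)
    print(round(p_c, 4))  # 0.18, the same as the pooled 90/500
    # Direct difference and the (**) form should agree: 0.03 0.03
    print(round(p_c - p_a, 4), round(gap_whole_minus_part(p_a, p_b, n_a, n_b), 4))
```

Because W = 0.4 here, the whole-vs-part gap (0.03) is only 60% of the part-vs-rest gap (0.05), which is why comparing A with C understates the difference between A and B.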

A one-sample test comparing A to C is not correct: C is itself a random
sample, and pC and pA are correlated. As A is a simple random sample (SRS)
of C drawn without replacement, B is also an SRS, and pA and pB are
slightly negatively correlated because of (*).

If pA and pB are different, then pA and pC are different (and
vice versa). Looking at (**), you can see that the proper test is a
*two-sample* test that compares pA and pB. The standard error is
computed under the null hypothesis, and without a finite population
correction. (Cochran, 1977, problem 2.16, p. 48). I myself think that
confidence intervals are preferable to hypothesis tests here.
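A minimal sketch of the two-sample comparison, assuming independent binomial counts in A and B and a large-sample normal approximation. The test statistic uses the pooled proportion (which equals pC in (*)) for the null-hypothesis standard error, with no finite population correction. For the confidence interval, it substitutes a common unpooled Wald standard error; that choice is an assumption, not something stated in the thread. Since pC - pA = (1-W)(pB - pA) and W is fixed by the sample sizes, an interval for pB - pA rescales by (1-W) to an interval for pC - pA. Function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple:
    """Two-sample z test of H0: pA = pB.

    SE is computed under the null from the pooled proportion (equal to pC),
    with no finite population correction. Returns (z, two-sided p-value).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)  # identical to pC in equation (*)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def wald_ci_gap(x_a: int, n_a: int, x_b: int, n_b: int, level: float = 0.95) -> tuple:
    """Assumed unpooled Wald CI for pB - pA, plus its (1 - W) rescaling,
    which by (**) is a CI for pC - pA."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    lo, hi = diff - z_crit * se, diff + z_crit * se
    one_minus_w = n_b / (n_a + n_b)  # 1 - W, with W = nA / n
    return (lo, hi), (one_minus_w * lo, one_minus_w * hi)


if __name__ == "__main__":
    # Hypothetical counts: 30/200 cases in A, 60/300 in B.
    z, p = two_sample_z(30, 200, 60, 300)
    print(f"z = {z:.3f}, p = {p:.3f}")
    ci_b_minus_a, ci_c_minus_a = wald_ci_gap(30, 200, 60, 300)
    print("95% CI for pB - pA:", [round(v, 4) for v in ci_b_minus_a])
    print("95% CI for pC - pA:", [round(v, 4) for v in ci_c_minus_a])
```

The interval answers "how different?" directly: it shows the plausible size of the gap, not just whether a zero gap is rejected.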

Reference: Cochran, W. G. (1977). Sampling techniques (3rd ed.). New
York: Wiley.


Steve
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2018 StataCorp LLC