Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steve Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Resampling and compare full sample with subsamples |
Date | Mon, 17 Mar 2014 19:35:46 -0400 |
Johannes:

Do I really think that there is *exactly* zero difference in the prevalence rates from two parts of your population? No, I don't. To my mind, the important question is "how different?". This question is the one addressed by confidence intervals. It is also the question you appeared to ask when you stated that the purpose of your analysis is to "give me an idea of what losing certain kinds of schools means for the reliability of prevalence figures in other survey waves."

Steve
sjsamuels@gmail.com

On Mar 17, 2014, at 3:47 PM, Johannes Thrul <Thrul@ift.de> wrote:

Thank you Steve and sorry for the delayed response. Could you do me a favor and explain briefly why you would prefer confidence intervals over hypothesis testing in this case?

Thanks,
Johannes

---------------------------------------------------------------------------------------------

Ah, you left out most of the detail; your explanation makes sense.

To answer your original question: you want to compare a part A to a whole C. But C = A U B, where B is the set of observations in C that are not in A. Let pA and pB be the prevalence rates in A and B, and let pC be the prevalence in the whole. Then if nA, nB, and n are the sample sizes of A, B, and C,

(*) pC = W pA + (1 - W) pB, where W = nA/n.

(**) pC - pA = (1 - W)(pB - pA).

A one-sample test comparing A to C is not correct: C is itself a random sample, and pC and pA are correlated. As A is a simple random sample (SRS) of C without replacement, B is also an SRS, and pA and pB are slightly negatively correlated because of (*).

If pA and pB are different, then pA and pC are different (and vice-versa). Looking at (**), you can see that the proper test is a *two-sample* test that compares pA and pB. The standard error is computed under the null hypothesis, and without a finite population correction (Cochran, 1977, problem 2.16, p. 48).

I myself think that confidence intervals are preferable to hypothesis tests here.

Reference:
Cochran, W. G. (1977). Sampling techniques (3rd ed.).
New York: Wiley.

Steve
sjsamuels@gmail.com

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
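
The comparison described in the thread above (part A against the remainder B, rather than A against the whole C) can be sketched in a few lines. This is only an illustration, not code from the thread: the function name compare_part_to_rest and the counts in the example call are hypothetical, and the Wald-type 95% interval is an assumption about which confidence interval one might use. The test statistic uses the standard error computed under the null hypothesis (pooled prevalence, no finite population correction), as in the discussion. The interval treats pA and pB as independent, which ignores the slight negative correlation mentioned above.

```python
import math

def compare_part_to_rest(xA, nA, xB, nB):
    """Compare prevalence in part A with the remainder B, where C = A U B.

    xA, xB: number of positive cases in A and B (hypothetical inputs)
    nA, nB: sample sizes of A and B
    """
    n = nA + nB
    W = nA / n
    pA, pB = xA / nA, xB / nB
    pC = W * pA + (1 - W) * pB               # identity (*)
    diff = pB - pA

    # Two-sample z test: SE under the null uses the pooled prevalence pC,
    # with no finite population correction.
    se_null = math.sqrt(pC * (1 - pC) * (1 / nA + 1 / nB))
    z = diff / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value

    # Assumed Wald 95% interval for pB - pA (unpooled SE).
    se_diff = math.sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)
    half = 1.959964 * se_diff
    ci_diff = (diff - half, diff + half)

    # By (**), pC - pA = (1 - W)(pB - pA), so the interval rescales by (1 - W).
    ci_whole = ((1 - W) * ci_diff[0], (1 - W) * ci_diff[1])

    return {
        "pA": pA, "pB": pB, "pC": pC, "W": W,
        "pC_minus_pA": (1 - W) * diff,
        "z": z, "p_value": p_value,
        "ci_pB_minus_pA": ci_diff,
        "ci_pC_minus_pA": ci_whole,
    }

# Hypothetical counts: 30 of 200 positive in A, 90 of 400 positive in B.
result = compare_part_to_rest(30, 200, 90, 400)
for key, value in result.items():
    print(key, value)
```

The interval for pC - pA speaks to the "how different?" question directly: it shows how far the subsample prevalence could plausibly sit from the full-sample prevalence, instead of only whether the difference is exactly zero.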