Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Johannes Thrul <Thrul@ift.de> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | AW: st: Resampling and compare full sample with subsamples |
Date | Mon, 17 Mar 2014 19:47:39 +0000 |
Thank you, Steve, and sorry for the delayed response. Could you do me a favor and briefly explain why you would prefer confidence intervals over hypothesis testing in this case?

Thanks,
Johannes

---------------------------------------------------------------------------------------------

Ah, you left out most of the detail; your explanation makes sense.

To answer your original question: you want to compare a part A to a whole C. But C = A U B, where B is the set of observations in C that are not in A. Let pA and pB be the prevalence rates in A and B, and let pC be the prevalence in the whole. Then, if nA, nB, and n are the sample sizes of A, B, and C,

(*)  pC = W pA + (1 - W) pB, where W = nA/n,

and therefore

(**) pC - pA = (1 - W)(pB - pA).

A one-sample test comparing A to C is not correct: C is itself a random sample, and pC and pA are correlated. As A is a simple random sample (SRS) of C drawn without replacement, B is also an SRS; pA and pB are slightly negatively correlated because of (*). If pA and pB are different, then pA and pC are different (and vice versa).

Looking at (**), you can see that the proper test is a *two-sample* test that compares pA and pB. The standard error is computed under the null hypothesis, and without a finite population correction (Cochran, 1977, problem 2.16, p. 48).

I myself think that confidence intervals are preferable to hypothesis tests here.

Reference:
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.

Steve
sjsamuels@gmail.com
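As a rough illustration of the two-sample comparison described above, here is a minimal Python sketch. The function name `compare_part_to_rest` and the counts in the example call are hypothetical and not taken from the thread. The test statistic uses the pooled standard error under the null hypothesis with no finite population correction, as the reply describes. The 95% interval uses an unpooled Wald standard error, which is an assumption on my part, since the thread does not say which interval was intended.

```python
import math


def compare_part_to_rest(xA, nA, xB, nB):
    """Compare prevalence in subsample A with prevalence in the rest, B = C \\ A.

    xA, xB: number of positive cases in A and B (hypothetical inputs).
    nA, nB: sample sizes of A and B, so the whole sample C has n = nA + nB.
    """
    n = nA + nB
    pA = xA / nA
    pB = xB / nB
    pC = (xA + xB) / n          # prevalence in the whole sample C
    W = nA / n

    # Identity (*): pC is the weighted average of pA and pB.
    assert math.isclose(pC, W * pA + (1 - W) * pB)
    # Identity (**): pC differs from pA exactly when pB differs from pA.
    assert math.isclose(pC - pA, (1 - W) * (pB - pA))

    # Two-sample z statistic; standard error pooled under H0: pA = pB,
    # with no finite population correction.
    se_null = math.sqrt(pC * (1 - pC) * (1 / nA + 1 / nB))
    z = (pA - pB) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

    # 95% Wald interval for pA - pB (unpooled standard error; an assumption).
    se_diff = math.sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)
    diff = pA - pB
    ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)

    return {"pA": pA, "pB": pB, "pC": pC, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical counts: 30 of 100 positive in A, 80 of 400 positive in B.
result = compare_part_to_rest(30, 100, 80, 400)
print(result)
```

The interval for pA - pB reports the size of the difference along with its uncertainty, whereas the z statistic only addresses whether the difference is zero.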