Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Hitesh Chandwani <hchandwani.stata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: svy subpop option and e(sample)
Date: Fri, 27 May 2011 00:20:55 -0400

Steve,

300,000 is not the number of PSUs. One PSU has multiple observations...approximately 1800 PSUs account for the 300,000 observations. These are nationwide hospital billing records data for 3 years. It is a 20% stratified sample of state hospital data. Also, the subpopulation is defined by characteristics of observations within PSUs (more specifically, the observations are hospital events related to a specific diagnosis).

So in the scenario I have presented, is 300,000 large enough?

Regards,
Hitesh

On Thu, May 26, 2011 at 10:40 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>
> Hitesh,
>
> The relevant number would be the number of PSUs. If that is 300,000, I would think that it's much more than enough. If you don't mind my asking, what kind of sample had 75 million observations? I usually encounter numbers like that only in census data.
>
> Steve
> sjsamuels@gmail.com
>
>
> Steve,
>
> You said in an earlier message: For a large enough subpopulation, the correct standard error for the ratio is indistinguishable from the standard error that assumes that the sample size was fixed (Lohr, 2009, p. 135, shows the formula for a SRS).
>
> How large is large enough? I am facing a similar problem. I extracted my subpopulation of interest and have 300,000 observations. My original data had 75 million observations with 61 variables. I cannot use the entire data due to insufficient RAM on my computer (I will need about 30-odd GB of RAM to analyze the data as a whole). I had to ask someone with access to such a powerful machine to extract the data for me.
>
> If the standard errors for data this large are not going to be very biased, I can report the variance estimation issue as a limitation of the analysis. If the data are not large enough, then I will need to compute dummy variables for all PSUs not represented in the extracted data.
>
> I would appreciate any help on the matter.
>
> Regards,
> --
> Hitesh S. Chandwani
> University of Texas at Austin
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Hitesh S. Chandwani
University of Texas at Austin