Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: svy subpop option and e(sample)
From
Hitesh Chandwani <[email protected]>
To
[email protected]
Subject
Re: st: svy subpop option and e(sample)
Date
Fri, 27 May 2011 00:20:55 -0400
Steve,
300,000 is not the number of PSUs. Each PSU contains multiple
observations; approximately 1,800 PSUs account for the 300,000
observations.
The data are nationwide hospital billing records covering 3 years,
drawn as a 20% stratified sample of state hospital data.
Also, the subpopulation is defined by characteristics of observations
within PSUs (more specifically, the observations are hospital events
related to a specific diagnosis).
So in the scenario I have presented, are 300,000 observations large enough?
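To make the question concrete, here is a minimal sketch of the comparison for a simple random sample only. It uses the standard ratio-estimator (linearization) standard error for a domain mean, which treats the subpopulation size as random, and compares it with the standard error that treats the subpopulation as a sample of fixed size. The function name, the simulated data, and the population size N are all hypothetical, and the sketch ignores strata and clustering, so it says nothing about how many PSUs are needed in a design like mine.

```python
import math
import random


def domain_mean_se(y, in_domain, N):
    """Domain mean under SRS with two standard errors.

    Returns (mean, se_ratio, se_fixed):
      se_ratio treats the domain size n_d as random (ratio estimator),
      se_fixed treats the domain as a sample of fixed size n_d.
    Assumes simple random sampling without replacement from N units.
    """
    n = len(y)
    yd = [v for v, d in zip(y, in_domain) if d]
    n_d = len(yd)
    mean_d = sum(yd) / n_d
    fpc = 1.0 - n / N  # finite population correction

    # Residuals are zero outside the domain, so only domain units contribute.
    ss_resid = sum((v - mean_d) ** 2 for v in yd)

    se_ratio = math.sqrt(fpc * n / (n - 1) * ss_resid) / n_d
    se_fixed = math.sqrt(fpc * ss_resid / (n_d - 1) / n_d)
    return mean_d, se_ratio, se_fixed


if __name__ == "__main__":
    random.seed(1)
    N = 1_000_000  # hypothetical population size
    for n, share in [(200, 0.05), (20_000, 0.05)]:
        y = [random.gauss(10, 3) for _ in range(n)]
        dom = [random.random() < share for _ in range(n)]
        _, se_r, se_f = domain_mean_se(y, dom, N)
        print(n, sum(dom), round(se_r / se_f, 5))
```

In this SRS setting the ratio of the two standard errors is sqrt(n (n_d - 1) / ((n - 1) n_d)), which approaches 1 as the subpopulation count n_d grows. With a clustered sample the analogous quantity depends on the number of PSUs rather than the number of observations.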
Regards,
Hitesh
On Thu, May 26, 2011 at 10:40 PM, Steven Samuels <[email protected]> wrote:
>
> Hitesh,
>
> The relevant number would be the number of PSUs. If that is 300,000, I would think that it's much more than enough. If you don't mind my asking, what kind of sample had 75 million observations? I usually encounter numbers like that only in census data.
>
> Steve
> [email protected]
>
>
>
> Steve,
>
> You said in an earlier message: For a large enough subpopulation, the
> correct standard error for the ratio is indistinguishable from the
> standard error that assumes that the sample size was fixed (Lohr,
> 2009, p. 135, shows the formula for a SRS).
>
> How large is large enough? I am facing a similar problem. I extracted
> my subpopulation of interest and have 300,000 observations. My
> original data had 75 million observations with 61 variables. I cannot
> use the entire data due to insufficient RAM on my computer (I will
> need about 30-odd GB of RAM to analyze the data as a whole). I had to
> ask someone with access to such a powerful machine to extract the data
> for me.
>
> If the standard errors for data this large are not going to be very
> biased, I can report the variance estimation issue as a limitation of
> the analysis. If the data are not large enough, then I will need to
> compute dummy variables for all PSUs not represented in the extracted
> data.
>
> I would appreciate any help on the matter.
>
> Regards,
> --
> Hitesh S. Chandwani
> University of Texas at Austin
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Hitesh S. Chandwani
University of Texas at Austin