Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: svy subpop option and e(sample)

From	Hitesh Chandwani <[email protected]>
To	[email protected]
Subject	Re: st: svy subpop option and e(sample)
Date	Fri, 27 May 2011 00:20:55 -0400

Steve,

300,000 is not the number of PSUs. Each PSU contains multiple
observations; approximately 1,800 PSUs account for the 300,000
observations.

These are nationwide hospital billing records covering 3 years. They
are a 20% stratified sample of state hospital data.

Also, the subpopulation is defined by characteristics of observations
within PSUs (more specifically, the observations are hospital events
related to a specific diagnosis).

So in the scenario I have presented, is 300,000 large enough?

Regards,
Hitesh

On Thu, May 26, 2011 at 10:40 PM, Steven Samuels <[email protected]> wrote:
>
> Hitesh,
>
> The relevant number would be the number of PSUs. If that is 300,000, I would think that it's much more than enough. If you don't mind my asking, what kind of sample had 75 million observations? I usually encounter numbers like that only in census data.
>
> Steve
> [email protected]
>
>
>
> Steve,
>
> You said in an earlier message: For a large enough subpopulation, the
> correct standard error for the ratio is indistinguishable from the
> standard error that assumes that the sample size was fixed (Lohr,
> 2009, p. 135, shows the formula for a SRS).
>
> How large is large enough? I am facing a similar problem. I extracted
> my subpopulation of interest and have 300,000 observations. My
> original data had 75 million observations with 61 variables. I cannot
> use the entire data due to insufficient RAM on my computer (I will
> need about 30-odd GB of RAM to analyze the data as a whole). I had to
> ask someone with access to such a powerful machine to extract the data
> for me.
>
> If the standard errors for data this large are not going to be very
> biased, I can report the variance estimation issue as a limitation of
> the analysis. If the data are not large enough, then I will need to
> compute dummy variables for all PSUs not represented in the extracted
> data.
>
> I would appreciate any help on the matter.
>
> Regards,
> --
> Hitesh S. Chandwani
> University of Texas at Austin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 
Hitesh S. Chandwani
University of Texas at Austin
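
As a rough numerical illustration of the "large enough" question, the sketch below compares two standard errors for a domain (subpopulation) mean under simple random sampling only: the linearized ratio-estimator SE, with variance (1 - n/N) * n(n_d - 1) / (n_d^2 (n - 1)) * s^2, and the SE that treats the domain sample size n_d as fixed, with variance (1 - n/N) * s^2 / n_d. Their ratio depends only on n and n_d. The function name and the example sizes are assumptions made for illustration. The data discussed in this thread come from a stratified cluster sample, where the relevant count is the number of PSUs, so treating the 75 million and 300,000 observations as an SRS is purely hypothetical and does not settle the question for these data.

```python
import math


def srs_domain_se_ratio(n: int, n_d: int) -> float:
    """Ratio of the linearized SE to the fixed-n_d SE of a domain mean (SRS only).

    n   : total sample size
    n_d : number of sampled units that fall in the domain
    The finite population correction and the domain variance s^2 appear in
    both standard errors, so they cancel in the ratio.
    """
    if not 1 < n_d <= n:
        raise ValueError("need 1 < n_d <= n")
    return math.sqrt(n * (n_d - 1) / (n_d * (n - 1)))


# Hypothetical sizes; the last pair borrows the observation counts from the
# thread and treats them as an SRS, which the actual design is not.
for n, n_d in [(100, 5), (1_000, 50), (75_000_000, 300_000)]:
    ratio = srs_domain_se_ratio(n, n_d)
    print(f"n = {n:>10,}  n_d = {n_d:>7,}  SE ratio = {ratio:.7f}")
```

A ratio close to 1 means the two standard errors are practically indistinguishable in the SRS case. In a clustered design the concern is different: PSUs that contain no subpopulation members drop out of an extracted file, so the comparison should be made on PSU counts rather than observation counts.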

