Richard--
I think there is a legit way to make your dataset smaller, but it
*really can be* that horrible if you just extract cases rather than
using the subpop option:
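
    webuse nhanes2f, clear
    * correct: keep all obs and let subpop() restrict estimation
    svy, subpop(highlead): logit heartatk female weight diabetes
    * naive: drop obs first, then estimate on what is left
    keep if highlead
    svy: logit heartatk female weight diabetes
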
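though this is a somewhat perverse case because of the missing values:
-keep if highlead- also keeps every obs with missing highlead, since
Stata treats missing as nonzero. If you're careful, the subset should
give you identical coefs and SEs: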
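
    webuse nhanes2f, clear
    svy, subpop(highlead): logit heartatk female weight diabetes
    est sto correct
    keep if highlead==1
    * flag strata left with a single PSU after subsetting
    bys strat (psu): g coll=psu[1]==psu[_N]
    * PSU id that is unique across strata
    egen c=group(strat psu)
    * pool the single-PSU strata into one new stratum (33)
    g mstr=cond(coll==1,33,strat)
    svyset c [pw=finalw], strat(mstr)
    svy: logit heartatk female weight diabetes
    est sto approx
    * esttab is part of the estout package (ssc install estout)
    esttab correct approx, mti nogaps sca(N_pop N_subpop F)
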
Note that now only the pop size and F are off--this too is fixable:

    webuse nhanes2f, clear
    svy, subpop(highlead): logit heartatk female weight diabetes
    est sto correct
    * build summary obs holding the summed weights of the excluded cases
    preserve
    keep if highlead!=1
    keep if !mi(heartatk,female,weight,diabetes)
    collapse (sum) finalw, by(strat psu highlead)
    * placeholder values so the summary obs have no missing model vars
    foreach v in heartatk female weight diabetes {
        g `v'=0
    }
    tempfile tmp
    save `tmp'
    restore
    * subset estimate, as before
    keep if highlead==1
    bys strat (psu): g coll=psu[1]==psu[_N]
    egen c=group(strat psu)
    g mstr=cond(coll==1,33,strat)
    svyset c [pw=finalw], strat(mstr)
    svy: logit heartatk female weight diabetes
    est sto approx
    * add the summary obs back, restore the original design, use subpop()
    append using `tmp'
    svyset psu [pw=finalw], strat(strat)
    svy, subpop(highlead): logit heartatk female weight diabetes
    est sto better
    esttab correct approx better, mti nogaps sca(N_pop N_subpop F)

Note that the "better" data just add one obs for each stratum/psu and
excluded value of highlead (0 or missing), containing the sum of
weights for the excluded obs, thus reducing the total size of the
data. It is tempting to write a -svysubset- package to automate this
subsetting procedure, but for any given model the pattern of missing
values might be different, so an automatic-subsetting package might
offer no savings in general over keeping all the data in memory. Your
student might be able to do something like the above once, though,
and then safely use the subset for multiple analyses, as long as they
share the same pattern of missing values.
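
Before trusting any subset, it may be worth checking the two things that bite in the examples above. This is only a sketch on the same nhanes2f data, not something I have run on your student's file. The three counts show how -keep if highlead- differs from -keep if highlead==1- when highlead has missing values, and -svydescribe, single- should list any strata that are down to a single PSU once the other cases are gone.

```stata
webuse nhanes2f, clear
* -if highlead- is true for missing values too, so the first two
* counts differ by the number of obs with missing highlead
count if highlead
count if highlead==1
count if mi(highlead)
* after subsetting, look for strata left with a single PSU
keep if highlead==1
svydescribe, single
```

If the last command lists any strata, those are the ones that need to be pooled (as with mstr above) before the subset can be analyzed.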

On 11/21/07, Richard Williams wrote:
> I know that when using the svy: prefix, you should use the subpop
> option when analyzing subsamples, rather than using if or dropping
> cases. However, I have a student who has this monstrous data set and
> she only wants to analyze a small subset of it. I'm afraid that if
> she has to keep all these unused cases in her file, her
> not-so-powerful computer is going to have problems.
>
> Is there a legit way to extract only the cases you want? Is it all
> that horrible if you extract cases rather than use subpop?
>
> Thanks for any info. And Happy Thanksgiving to all those who
> celebrate it.