Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: svy subpop option and e(sample)
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: svy subpop option and e(sample)
Date: Wed, 25 May 2011 15:03:32 -0400
--
I have to agree with Stas. The standard errors from subsetting and from using -subpop()- might be indistinguishable, but that is not a given. My example was simplistic: there were no strata, and clusters were independent of the subpopulation, weight, and analysis variables.
Another, potentially more serious, problem for subpopulation analysis is not, I believe, mentioned in the SJ article: bias can arise if the supplied weights, appropriate for the whole sample, are wrong for the subpopulation. Levy & Lemeshow (2008, p. 148) give a simple example.
Ref: Levy, Paul S., and Stanley Lemeshow. 2008. Sampling of Populations: Methods and Applications. 4th ed. Wiley Series in Survey Methodology. Hoboken, NJ: Wiley.
Steve
[email protected]
On May 25, 2011, at 12:35 PM, Stas Kolenikov wrote:
On Wed, May 25, 2011 at 10:10 AM, Richard Williams
<[email protected]> wrote:
> As a sidelight, one of the things that has always bothered me about subpop
> is that you are apparently never supposed to create an extract from your
> data, e.g. you could have 100 million cases and only be interested in a
> subpopulation of 10,000, but you are nonetheless supposed to keep all 100
> million cases in your data set so the standard errors are right. I always
> wonder how horrible it would be if you just made the extract or used -if-
> instead of subpop. If, say, the standard errors might be off by .01%, I
> suspect I could live with that.
If you have 100M cases, it is called a census ;).
See http://stata-journal.com/article.html?article=st0153. My
understanding of this (quite neat) article is that you are OK in a few
select situations: when your subpop is a stratum or a union of
several strata, or when the subpop cuts through all PSUs (i.e., every
PSU has a member from the subpopulation, so subsetting with -if- does
not kill any sampling units). In those cases, subsetting the data by
-if- still produces design-consistent standard errors. Read the
article, though.
If you have a design that's more complicated than the standardized one
(stratified, two-stage clustered with replacement, as -webuse nhanes2-
is), things will get more complicated. The bottom line is, YOU MUST
HAVE OVERWHELMINGLY STRONG REASONS TO SUBSET YOUR DATA WITH IF instead
of using -subpop()-, which is always appropriate. Going down from 100M
observations to 10K observations is not a very convincing reason to
me, frankly.
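The "subpop cuts through all PSUs" condition can be checked directly before deciding anything. The lines below are only a rough sketch: they assume that -stratid- and -psuid- are the stratum and PSU identifiers in -nhanes2- (run -svyset- to confirm what your copy uses), and -any_sub- and -tag_psu- are made-up variable names.

```stata
webuse nhanes2, clear
* show the design variables currently set for this dataset
svyset
* assumption: stratid and psuid identify strata and PSUs
* any_sub = 1 if the PSU contains at least one subpopulation member
egen byte any_sub = max(diabetes == 1), by(stratid psuid)
* tag one observation per PSU so that each PSU is counted once
egen byte tag_psu = tag(stratid psuid)
* total number of PSUs
count if tag_psu
* number of PSUs with no subpopulation member
count if tag_psu & any_sub == 0
```

If the last count is zero, subsetting with -if- would not drop any sampling units for this particular subpopulation. That alone says nothing about the weights issue raised above.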
The subset used for subpop is passed through (-passthru-ed?) in
e(subpop), so your predicted probabilities can be restricted to the
subpopulation with
predict whatever `e(subpop)' , options
or
predict whatever if `e(subpop)' , options
depending on how the -subpop()- option was specified. If you had no
-subpop()-, then of course it will be empty, so things should work
out fine for you.
webuse nhanes2, clear
svy : logit highbp age
* specify the subpop as the -if- condition
svy, subpop(if diabetes==1) : logit highbp age
est store logit1
predict prob1 `e(subpop)', pr
* specify the subpop as the 0/1 variable
svy, subpop(diabetes) : logit highbp age
est store logit2
predict prob2 if `e(subpop)', pr
* why the heck are they different??? Because -diabetes- has missing values!
compare prob1 prob2
* is subsetting wrong here? It might be OK.
svy : logit highbp age if diabetes == 1
est store logit3
est tab logit1 logit2 logit3, se
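One way to sidestep the missing-value surprise in the example above is to build an explicit 0/1 indicator first and use it both in -subpop()- and in the -predict- restriction. This is a sketch only, not something from the SJ article; -diab_sub- and -prob3- are hypothetical names.

```stata
webuse nhanes2, clear
* (diabetes == 1) evaluates to 0 when diabetes is missing, so the
* indicator has no missing values; observations with missing diabetes
* are treated as outside the subpopulation, as with subpop(if diabetes==1)
generate byte diab_sub = (diabetes == 1)
svy, subpop(diab_sub) : logit highbp age
* look at what was actually stored
display "`e(subpop)'"
* restrict predictions with an explicit comparison rather than a bare
* -if diab_sub-, so the restriction does not rely on nonzero-is-true
predict prob3 if diab_sub == 1, pr
```

The bare -if diabetes- in the second -predict- above is true for missing values as well, because Stata treats any nonzero value, missing included, as true. The explicit == 1 comparison avoids that.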
P.S. I agree with Steve that this is the expected behavior of -svy-
and -e(sample)-, and I wouldn't want them to work otherwise.
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/