Re: st: Number of Obs with svy , suppop()
From: Phil Schumm <[email protected]>
To: [email protected]
Subject: Re: st: Number of Obs with svy , suppop()
Date: Fri, 19 Mar 2010 14:42:34 -0500

On Mar 19, 2010, at 3:17 AM, Michael Norman Mitchell wrote:
> Thank you for your reply... I am still struggling to solidly
> understand this. Perhaps I have a more fundamental question. What is
> the formula for the "Number of obs" in the context of the -svy-
> commands? It sounds like, in the absence of the -subpop()- option,
> it is the number of observations with non-missing values on the
> tabulated variable. And, in the presence of the -subpop()- option, it
> is the total number of observations minus the number of observations
> that meet the -subpop()- condition and are missing on the tabulated
> variable. Am I on the right track here?

Yes, I believe this is correct (note, however, that I haven't looked
into this carefully, so if you need confirmation of Stata's behavior
with respect to this issue, you'll need to get it from the manual or
from someone like Jeff).

One more thing I should mention: how you proceed in cases like this
may depend on the reason(s) the data are missing. For example,
suppose the missing values for race are due to respondents refusing
to answer the question or saying "I don't know." In that case, Durbin
argued that this should be taken into account when defining the
subpopulation (also referred to in the survey literature as a
domain).[1] In other words, in your example, the subpopulation of
interest would be "all males who, when asked, will provide an answer
to this question." To reflect this, you would augment your -subpop()-
specification like this:

    svy, subpop(if sex==1 & !mi(race)):

in which case the "number of observations" reported by Stata should
now equal the total number of observations in your dataset, since no
observation inside the subpopulation is missing on race. More
importantly, this specifies a slightly different variance
calculation, though the actual result may differ only very slightly
(if at all), depending on the circumstances. Note that I almost never
see anyone do this -- at least not in the applied social science
literature.
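
To see the two specifications side by side, here is a minimal sketch
in do-file form. It rests on several assumptions that are not part of
the original question: it uses the -nhanes2- example dataset as a
stand-in (relying on the survey design settings stored with that
dataset), and it sets race to missing at random purely for
illustration. Compare the "Number of obs" line in the two -svy-
outputs against the counts printed beforehand rather than taking any
particular value for granted.

```stata
* Sketch only: -nhanes2- is a stand-in dataset, and the missing values
* on race are created artificially for illustration.
webuse nhanes2, clear

* Display the survey design settings stored with the dataset
* (assumed to be present; otherwise declare them with -svyset- first).
svyset

* Hypothetical nonresponse: set race to missing for about 5% of cases.
set seed 12345
replace race = . if runiform() < 0.05

* Counts to compare against the reported "Number of obs".
count
count if sex == 1 & mi(race)

* Original specification: all males, whether or not race is observed.
svy, subpop(if sex == 1): tabulate race

* Augmented specification: males who provided an answer for race.
svy, subpop(if sex == 1 & !mi(race)): tabulate race
```

If the rule described above holds, the first -svy- command should
report the total count minus the number of males missing on race, and
the second should report the total count. The point estimates should
be the same in both runs; any difference would show up in the
standard errors.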
Of course, what I just described does nothing to address the possible
bias that might arise if those who don't respond differ (in terms of
race) from those who do...
-- Phil
[1] J. Durbin. Sampling theory for estimates based on fewer
individuals than the number selected. Bulletin of the International
Statistical Institute, 36(3):113–119, 1958.