Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Treatment of missing values in surveys in Stata (subpop)
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: Treatment of missing values in surveys in Stata (subpop)
Date
Fri, 8 Mar 2013 19:15:57 -0500
Ángel:
Rereading, I see that you asked about using the subpop() option when there are missing values. Leaving a particular question unanswered could happen for many reasons, including fatigue, haste, interviewer error, and data entry mistakes. So again, the theory of the subpopulation correction does not apply.
You don't need to recode a missing numerical value to something like 999 in order to use subpop(). Such 999 coding is used only on data forms these days.
. svy, subpop(if var < .): command var
would do the job. This also takes care of extended missing values, like .a, since Stata orders all missing values above every nonmissing number: . < .a < .b < ... < .z
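To make the ordering concrete, here is a minimal sketch on made-up toy data. The variable y and the design variables psu, stratum, and pw are hypothetical names chosen for illustration, so the survey lines are left commented out.

```stata
* Toy data (made up) to show how Stata orders missing values
clear
input y
3
7
.
.a
.z
end

* Nonmissing numbers sort below ".", which sorts below .a, ..., .z,
* so "y < ." keeps only the two nonmissing observations
count if y < .

* missing() is true for "." and for every extended missing value
count if missing(y)

* With survey data the same condition goes inside subpop().
* psu, stratum, and pw are hypothetical design variables:
* svyset psu [pweight = pw], strata(stratum)
* svy, subpop(if y < .): mean y
```

Writing the condition as y < . rather than y != . matters, because y != . would still let .a through .z into the subpopulation.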
Multiple imputation is the approach for handling missing values.
Steve
Ángel:
The theory of subpopulation corrections does not apply to non-response.
A subpopulation is a subset of the population that can be defined in
advance (e.g. males, ages 30-40, living in rural areas). The number of
its members selected by a sample will be random. For example, suppose a
population of N members contains a subpopulation of M members. An SRS of
size n is taken. You should be able to work out the exact probability
that the sample will contain exactly k members of the subpopulation. The theory of the
subpopulation correction is an extension of this, and can be found in
any good text.
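For the SRS example above, the count of subpopulation members in the sample follows a hypergeometric distribution. A minimal sketch of the calculation, with values of N, M, n, and k that are purely illustrative and not from any real survey (run it as a do-file so the local macros stay defined):

```stata
* Illustrative values only (assumptions, not from the original post)
local N 1000   // population size
local M 200    // subpopulation size
local n 50     // SRS sample size
local k 10     // subpopulation members in the sample

* Direct formula: C(M,k) * C(N-M, n-k) / C(N,n)
display comb(`M', `k') * comb(`N' - `M', `n' - `k') / comb(`N', `n')

* Same probability from Stata's built-in hypergeometric function
display hypergeometricp(`N', `M', `n', `k')
```

The two display lines should agree up to rounding. Nothing comparable can be written for the number of responders, since that count depends on the protocol and not only on the sampling design.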
In contrast, "responder" is not a characteristic, like gender, that is
known in advance. It is defined only in relation to the particular sample
design and protocol. For identical designs, better protocols can
increase response rates. Thus, sampling theory alone cannot
describe the numbers of responders and, consequently, the
subpopulation correction is not applicable.
Steve
[email protected]
On Mar 8, 2013, at 2:37 PM, Ángel Rodríguez Laso wrote:
Dear Statalisters,
I have found two recommended procedures for dealing with individuals
with missing items ('normal' missing answers like 'DK/DA' or equipment
failure) when analysing surveys with Stata:
1) One is based on the recommendation that, unless there is a very
strong reason to do otherwise, whenever you analyse a group of
individuals in a survey with Stata, you have to use subpop. (See for
example: http://www.stata.com/meeting/mexico10/mex10sug_canette.pdf).
Under this perspective, those with valid values would be a
subpopulation. From my point of view, this means that in order to
prevent Stata from dropping them from the calculation of standard
errors, missing codes (".") should be recoded to a numerical value
(like 999) and then a command issued this way:
svy, subpop(if var<999): command var
2) Nevertheless, most of the information I've read does not make any
statement about this, which implicitly means that missing codes don't
need to be recoded. I've even found this piece of advice
(http://www.stata.com/statalist/archive/2012-09/msg01028.html): 'I've
never seen a recommendation to consider observations with non-missing
values as a subpopulation'
I wonder if anyone could throw some light on this topic.
Thank you very much.
Angel Rodriguez-Laso
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/