We are in the middle of investigating this issue of "missings" (meaning Not
In Universe) with regard to sample design, but my view right now is that
Stata does not address this properly.
Anyway, one fix is to assign a value to the missing cases (use something that
will cause problems if those cases get included, like -999) and then use a
subpop statement to restrict the analysis to the proper cases.
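To make the concern concrete, here is a minimal sketch in Python (not Stata syntax) of why dropping the Not In Universe cases differs from flagging them. The toy records and the helper name psus_per_stratum are hypothetical, invented purely for illustration. Dropping cases can leave a stratum with a single PSU, while keeping every record and marking the cases of interest leaves the design intact.

```python
from collections import defaultdict

# Hypothetical toy records: (stratum, psu, in_universe)
records = [
    (1, "a", True), (1, "a", False), (1, "b", False),
    (2, "c", True), (2, "d", True),
]

def psus_per_stratum(rows):
    """Return {stratum: number of distinct PSUs} for the given rows."""
    psus = defaultdict(set)
    for stratum, psu, _ in rows:
        psus[stratum].add(psu)
    return {s: len(p) for s, p in psus.items()}

# Design as sampled: every record kept, in-universe cases merely flagged.
full_design = psus_per_stratum(records)

# Design after physically dropping the out-of-universe records.
after_dropping = psus_per_stratum([r for r in records if r[2]])

# Strata that no longer meet the "at least two PSUs" criterion.
singletons = [s for s, n in after_dropping.items() if n < 2]

print("full design:   ", full_design)
print("after dropping:", after_dropping)
print("singleton strata after dropping:", singletons)
```

In this toy example stratum 1 loses PSU "b" once the out-of-universe cases are dropped, which is the situation that forces PSUs to be merged.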
For those of you interested in this issue, the philosophical question is
this. Consider a survey designed to sample from a specific target
population, and assume that there are correct sample design variables for
that population and that these design variables meet the criteria of at
least two PSUs per stratum. Now assume that certain questions are asked
only of a subgroup of the population. For example, we have a question "Has
a doctor ever told you to quit smoking?" Clearly the question is asked only
of smokers, and in our case, smokers who have seen a physician in the past
12 months. Now, is the group of people who are asked this question a
sub-population of the target population, or are they a population unto
themselves? If they are a sub-population, then the sample design variables
appropriate for the target population are sufficient to describe the
sub-population, and Stata ought to estimate properly without consideration
of missings for those not in the sub-group. But if this is a separate
population, then yes, a new set of sample design variables is necessary.
I'm collecting references. If enough people are interested I'll post them.
Bryan Sayer
Statistician, SSS Inc.
[email protected]
-----Original Message-----
From: Michael R. Smith [mailto:[email protected]]
Sent: Monday, August 26, 2002 11:47 AM
To: [email protected]
Subject: st: Bootstrap and percentages
I'm processing some data that was generated with a complex sample design but
to which, for reasons of confidentiality, I don't have direct access. This
means that the svy commands are not a practical option. Use of them requires
correction for PSUs with only one case - but the PSUs with missing values
that reduce them to one case will vary depending on variables in the
analysis. Submitting my code to the person who runs it in order to find out
when and where to merge PSUs would, then, become an extremely cumbersome
process.
So bootstrapping looks like the most practical method for inferential
purposes. It's clear how to do that with regression and related procedures.
But it's not obvious to me how one should go about using the bs command to
generate standard errors for a percentage table. The bs command requires
specifying each coefficient to be bootstrapped. How does one specify cells
in a percentage table? Part of the analysis requires generating percentage
tables with quite large numbers of cells, so I need to generate a large
number of standard errors.
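One way to picture the task is to treat every cell percentage as its own statistic and collect all of them from each bootstrap replicate. The following is a minimal sketch in Python, not the bs command's syntax. It assumes whole clusters are resampled with replacement and it ignores strata and weights, which a real complex design would need. The data, the function names and the replicate count are hypothetical.

```python
import random
import statistics
from collections import Counter, defaultdict

def cell_percentages(rows, cells):
    """Percent of rows falling in each (row_value, col_value) cell."""
    counts = Counter((r, c) for _, r, c in rows)
    total = len(rows)
    return {cell: 100.0 * counts[cell] / total for cell in cells}

def bootstrap_cell_se(rows, reps=200, seed=12345):
    """Bootstrap SEs for every cell percentage, resampling whole clusters."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[0]].append(row)
    clusters = list(by_cluster)
    cells = sorted({(r, c) for _, r, c in rows})
    draws = {cell: [] for cell in cells}
    for _ in range(reps):
        # Draw as many clusters as the sample has, with replacement.
        sample = []
        for cluster in rng.choices(clusters, k=len(clusters)):
            sample.extend(by_cluster[cluster])
        for cell, pct in cell_percentages(sample, cells).items():
            draws[cell].append(pct)
    point = cell_percentages(rows, cells)
    # The SE of each cell is the standard deviation across replicates.
    return {cell: (point[cell], statistics.stdev(draws[cell])) for cell in cells}

# Hypothetical toy data: (cluster id, row category, column category)
data = [
    (1, "smoker", "advised"), (1, "smoker", "not advised"),
    (2, "smoker", "advised"), (2, "nonsmoker", "not advised"),
    (3, "nonsmoker", "not advised"), (3, "smoker", "advised"),
    (4, "nonsmoker", "not advised"), (4, "smoker", "not advised"),
]
results = bootstrap_cell_se(data)
for cell, (pct, se) in results.items():
    print(cell, round(pct, 1), round(se, 1))
```

Because the list of cells is built from the data, the same loop covers a table of any size without naming each cell by hand.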
I've read what seem to be the relevant sections of the manual and rooted
around in the FAQs and other documentation for an answer, so far with no
success.
Michael Smith
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/