This is very interesting. It also illustrates one of my major complaints
about sampling statisticians: There is often a major break between the
design of the sample and the data that actually ends up in the hands of
users. Frankly I think we (sampling statisticians) should shut up about
telling users to estimate variances correctly until we make sure they have
the information they need to do so!
So this is what I see. While I have a hard time believing that the whole
population is "interviewed", I will still accept that parts are a census and
not a sample. So assuming that this is "the truth, the whole truth and
nothing but the truth":
Strata           Sampling Fraction (fpc)
Rural            1
Small Cities     1
Medium Cities    0.2
Large Cities     0.1
So what should the sampling units be? And should the strata be city
specific (probably)? In the medium and large cities, it would appear that
maybe they should be the "segments". But if they are, we would need to know
the population count of "segments" for the fpc. And of course we need the
segment identifiers. It might be possible to get a weighted count of
segments within strata and use that as the population total. Plus we don't
truly know that each segment is a PSU or that all houses in a segment are
similar and thus should be considered a PSU. If the interviews are in
person, then it probably is true that a segment is a good representation of
a PSU, but maybe not.
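
A rough sketch of the weighted-count idea, in Python rather than Stata and with made-up segment counts (the sampling fractions are the ones quoted in the question; the counts and the function name are hypothetical). If segments are selected with fraction f, each sampled segment carries weight 1/f, so the sum of those weights over the sampled segments estimates the number of segments in the population:

```python
# Sampling fractions as described in the original question.
sampling_fraction = {
    "Rural": 1.0,
    "Small Cities": 1.0,
    "Medium Cities": 0.2,
    "Large Cities": 0.1,
}

# Hypothetical numbers of sampled segments per stratum (NOT from the data).
sampled_segments = {
    "Rural": 50,
    "Small Cities": 40,
    "Medium Cities": 30,
    "Large Cities": 20,
}


def estimated_population_segments(n_sampled, fraction):
    """Weighted count: each sampled segment stands for 1/fraction segments."""
    return n_sampled / fraction


estimates = {
    stratum: estimated_population_segments(sampled_segments[stratum], f)
    for stratum, f in sampling_fraction.items()
}

for stratum, n_hat in estimates.items():
    print(f"{stratum}: about {n_hat:.0f} segments in the population")
```

This only gives an estimate of the population count of segments, so an fpc built from it is itself approximate.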
Where does all that leave us? Probably with insufficient information, but
this is one approach. If the analysis is done separately for each "stratum",
the strata can be ignored, and that is probably the thing to do here. If there
is a segment and a city indicator, then combine those two pieces to create
PSUs, i.e. each city-specific segment is a PSU. Be conservative and ignore the
fpc (including it could only shrink the variance estimates). Then you can use
any estimation command that has a cluster option with pweights, and know that
you are being as conservative as possible.
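
To make the approach concrete, here is a minimal sketch in Python (not Stata) of what a cluster option with pweights does for a weighted mean: PSUs are formed from the (city, segment) pair, the fpc is ignored, and the variance uses the usual with-replacement linearization over PSU totals. The function name, the row layout, and the toy data are all hypothetical, and a single stratum is assumed.

```python
from collections import defaultdict
from math import sqrt


def clustered_mean_se(rows):
    """Weighted mean and cluster-robust SE, ignoring strata and fpc.

    rows: iterable of (city, segment, weight, y) tuples.
    Returns (mean, standard_error, number_of_psus).
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Segment numbers may repeat across cities, so the PSU is the pair.
    psu_totals = defaultdict(float)
    for city, segment, w, y in rows:
        # Linearized contribution of this observation to the weighted mean.
        psu_totals[(city, segment)] += w * (y - mean) / total_w

    n = len(psu_totals)
    if n < 2:
        raise ValueError("need at least two PSUs to estimate a variance")

    zbar = sum(psu_totals.values()) / n
    variance = n / (n - 1) * sum((z - zbar) ** 2 for z in psu_totals.values())
    return mean, sqrt(variance), n


# Toy data: two cities, segment ids 1 and 2 reused in each city.
rows = [
    ("A", 1, 5, 1.0), ("A", 1, 5, 3.0),
    ("A", 2, 5, 2.0), ("A", 2, 5, 4.0),
    ("B", 1, 5, 5.0), ("B", 1, 5, 7.0),
    ("B", 2, 5, 6.0), ("B", 2, 5, 8.0),
]
mean, se, n_psu = clustered_mean_se(rows)
print(mean, se, n_psu)
```

Keying on the pair rather than the segment number alone is the point of the "combine those two pieces" step: otherwise segment 1 in city A and segment 1 in city B would be treated as one cluster.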
Good luck!
Bryan Sayer
Statistician, SSS Inc.
[email protected]
-----Original Message-----
From: Cruces,GA (pgr) [mailto:[email protected]]
Sent: Thursday, May 15, 2003 2:04 PM
To: [email protected]
Subject: st: Systematic sampling in Stata
Dear All,
I have a question about systematic sampling. I have read the Stata
manuals but I still cannot identify the PSUs and strata in my data, and
I am not very sure about how to handle the systematic sampling. In broad
terms, I cannot match the textbook with my case...
Basically, I am using a census, in which the whole population for rural
areas and cities of fewer than 100,000 inhabitants were interviewed. In
cities with 100,000-500,000, I am told that 1 out of 5 "segments"
were systematically selected, and in cities of more than 500,000, 1 in
10 "segments" were selected. The "segments" are units of around forty
houses assigned to each interviewer.
Finally, the weights are the inverse of the probability of being
included (so for rural areas and small cities they are just one). The
resulting sample (more than 16 million observations) is half the total
population.
I am unsure about how to deal with this within Stata, since for half of
my data I have the whole population whereas for the other I only have a
(complex) sample.
Any help will be appreciated.
Thank you all!
best,
g.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/