On 3/28/08, Angel Rodriguez Laso <[email protected]> wrote:
> Thank you for your answer, Stas.
>
> I've tried both specifications and the first surprise was that Stata 9
> ignores further stages when stage 1 is sampled with replacement.
That's right, if you sample with replacement, then your PSUs are
independent (provided that you sample independently from those PSUs
that are selected more than once).
> The problem with using age groups as second stage strata is that being 3 the
> number of people over 65 selected per census tract, whenever there are
> missing values in the variables some strata become single-PSU (person)
> strata, what prevents Stata from calculating standard errors.
See below -- I have questions about it.
> This is something I want to check with
> you: From the reading of Korn and Graubard "Analysis of health surveys" I've
> understood that in complex surveys degrees of freedom are calculated as
> #PSUs - #strata (624 for the first specification and 1244 for the second,
> because Stata duplicates the number of census tracts because each of them
> belongs to two different strata).
Well, I understood from your initial posting that you had 7 strata, and
that from each you had taken 7 "young" people and 3 elderly. But upon
re-reading it, I see that you never mentioned the number of census
tracts you are sampling per stratum -- which would be your PSUs, and
individuals will be your SSUs. If you indeed have 600+ PSUs/tracts,
then you don't need to worry that much about degrees of freedom -- but
there might still be asymptotic issues, as the conventional
asymptotics are the number of strata going to infinity, with #PSUs per
stratum being bounded from above. That's a rather esoteric issue
though; I think Krewski and Rao (1981) was a well-known paper that made
the distinction (http://www.citeulike.org/user/ctacmo/article/774883).
Also, if you have 600+ PSUs, I don't see how you could get
singleton strata -- every one of your tracts would really have to
be missing people 65+.
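
To make the singleton question concrete, here is a minimal sketch of one
way to check for it, assuming age group is used as a second-stage stratum
within each tract. The records, the tract ids and the "under65"/"65plus"
labels are made up for illustration; they are not from the survey under
discussion. The check counts complete cases per (tract, age group) cell
and flags the cells left with exactly one.

```python
from collections import Counter

# Hypothetical records: (tract_id, age_group, outcome).
# None marks a missing value. None of this is real survey data.
records = [
    (1, "under65", 2.0), (1, "under65", 3.5),
    (1, "65plus", 1.0), (1, "65plus", None), (1, "65plus", None),
    (2, "under65", 4.0), (2, "under65", 1.5),
    (2, "65plus", 2.5), (2, "65plus", 3.0), (2, "65plus", None),
]

def singleton_cells(rows):
    """Return the (tract, age_group) cells with exactly one complete case.

    Cells with no complete cases at all never enter the counter, so they
    are not reported here.
    """
    counts = Counter((tract, age) for tract, age, y in rows if y is not None)
    return sorted(cell for cell, n in counts.items() if n == 1)

singletons = singleton_cells(records)
print(singletons)
```

In this toy data only tract 1 loses two of its three elderly respondents,
so only that cell is flagged.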
> It's usual practice
> to work with such low numbers of individuals per PSU (10 in my case) and
> I've never heard that there was a problem of a small sample size then.
Yes. What matters most is the number of PSUs. I think what Korn and
Graubard don't like about d.f. = #PSU - #strata is that this is a very
low number for some important surveys or domains in those surveys,
like Hispanics in NHANES, where that number is something like 6, even
though there might be a few hundred cases. I think they had a
discussion in the book on how to increase that number, although all
their strategies are ad hoc, and few are justifiable from a
rigorous JNK Rao-style design perspective. They had another paper in
JRSS A (http://www.citeulike.org/user/ctacmo/article/933864) where they
also raise similar issues.
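
For reference, the d.f. = #PSU - #strata rule discussed above can be
sketched as follows. This is only an illustration of the arithmetic,
assuming each PSU belongs to exactly one first-stage stratum; the
function name and the toy tract-to-stratum mapping are hypothetical and
are not taken from any survey mentioned in this thread.

```python
def design_df(psu_to_stratum):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    n_psus = len(psu_to_stratum)
    n_strata = len(set(psu_to_stratum.values()))
    return n_psus - n_strata

# Hypothetical design: 7 tracts (PSUs) spread over 3 strata.
toy_design = {
    "t1": "A", "t2": "A",
    "t3": "B", "t4": "B", "t5": "B",
    "t6": "C", "t7": "C",
}

df = design_df(toy_design)  # 7 PSUs - 3 strata
print(df)
```

The count depends only on PSUs and strata, not on how many individuals
are interviewed within each PSU, which is why a domain with a few
hundred cases can still end up with very few degrees of freedom.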
Steven Samuels asked some relevant questions, too.
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check
it regularly.