An interchange on the MEPS (Medical Expenditure Panel Survey) list---
Begin forwarded message:
From: "Rhoades, Jeffrey" <[email protected]>
Date: October 11, 2005 11:15:07 AM EDT
To: [email protected]
Subject: [MEPS-L] Using STATA to adjust for Complex Survey Design -
Single PSU issues--Additional Comments
Reply-To: AHRQ's question & discussion group regarding MEPS <MEPS-
[email protected]>
1. Page 47 in the STATA v9 "Survey Data" manual gives an explanation,
including numerical examples from NHANES, of why ignoring strata
and/or PSUs is a serious error. Even though you can get correct
point estimates, the tests, confidence intervals, and standard
errors will usually be wrong. In the NHANES II numerical example in
the documentation, ignoring the strata but using weights and PSUs
gives standard errors that are 50% too small.
2. Lonely PSUs are a problem in any software, but STATA has fewer
options than SUDAAN or R. Page 241 in the STATA v9 "Survey Data"
manual gives an explanation, including a numerical example from
NHANES, of dealing with single PSUs within strata in the STATA
survey functions. The recommendation from the STATA documentation
is to collapse strata. This is extremely dangerous when you don't
know how the strata are formed.
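As a rough illustration of points 1 and 2, here is a minimal Python
sketch (not STATA, SUDAAN, or R code) of the usual with-replacement,
between-PSU variance estimator for a weighted total. The records, the
function names psu_totals and se_of_total, and the use_strata switch
are all made up for illustration; they are not MEPS data or any
package's API.

```python
import math
from collections import defaultdict

# Hypothetical records: (stratum, psu, weight, y). Not MEPS data.
records = [
    (1, 1, 2.0, 2.0), (1, 1, 2.0, 3.0),
    (1, 2, 2.0, 6.0),
    (2, 1, 4.0, 5.0),
    (2, 2, 4.0, 6.0),
    (3, 1, 5.0, 6.0),
    (3, 2, 2.0, 17.0),
]

def psu_totals(rows):
    """Weighted total of y within each (stratum, psu)."""
    totals = defaultdict(float)
    for stratum, psu, w, y in rows:
        totals[(stratum, psu)] += w * y
    return totals

def se_of_total(rows, use_strata=True):
    """SE of the weighted total from between-PSU variation.

    With use_strata=False all PSUs are treated as one stratum.
    """
    groups = defaultdict(list)
    for (stratum, _psu), z in psu_totals(rows).items():
        groups[stratum if use_strata else "all"].append(z)
    variance = 0.0
    for z in groups.values():
        n = len(z)
        if n < 2:
            # A lonely PSU: no within-stratum variation to measure.
            raise ValueError("stratum with a single PSU: variance undefined")
        mean = sum(z) / n
        variance += n / (n - 1) * sum((v - mean) ** 2 for v in z)
    return math.sqrt(variance)

point_estimate = sum(psu_totals(records).values())  # same either way
se_strat = se_of_total(records, use_strata=True)
se_nostrat = se_of_total(records, use_strata=False)
```

The point estimate is identical in both cases, but the standard error
is not. In this toy data the unstratified standard error happens to be
much larger than the stratified one; the size and direction of the
difference depend on the data and the design. Dropping the last record
leaves stratum 3 with one PSU, and the function then refuses to
compute a variance at all.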
The best advice is to avoid single PSUs if possible. The most
common cause of single PSUs in MEPS is subsetting the data instead
of using the subpop function (see page 38). Single PSUs do not
naturally occur in the (full) Full Year Files but if you subset you
can create a single PSU.
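To see how subsetting can create a single PSU where the full file has
none, here is a small Python sketch. The records, the in_subpop flag,
and the helper single_psu_strata are hypothetical and only meant to
show the mechanism.

```python
from collections import defaultdict

# Hypothetical records: (stratum, psu, in_subpop). Not MEPS data.
records = [
    (1, 1, True), (1, 1, False), (1, 2, True),
    (2, 1, True), (2, 2, False), (2, 2, False),
]

def single_psu_strata(rows):
    """Return the strata that contain exactly one distinct PSU."""
    psus = defaultdict(set)
    for stratum, psu, _flag in rows:
        psus[stratum].add(psu)
    return sorted(s for s, p in psus.items() if len(p) == 1)

# Full file: every stratum has two PSUs.
full = single_psu_strata(records)

# Subset to the subpopulation: PSU 2 of stratum 2 disappears.
subset = single_psu_strata([r for r in records if r[2]])
```

Here full is empty while subset contains stratum 2, because every
record in one of its PSUs was outside the subpopulation. Keeping all
records and flagging the subpopulation instead preserves the PSU
structure.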
Note that if you are using a MEPS file such as an event-level file
that only has records for individuals with the specific event, you
should consider linking back to the full year file to get the
correct variance structure. You will need to define individuals
not in the specific event file as having zero events as opposed to
dropping them. This will produce correct totals, but means will have
a denominator that is the overall population. If you want means
conditional on having the event, you will need to use the subpop
function, but the analysis in the non-event group is nonsensical.
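The zero-filling step can be sketched in Python as follows. The
person IDs, weights, and events are invented for illustration, and the
sketch covers only the point estimates; the variance still has to come
from survey software using the full file's strata and PSUs.

```python
# Hypothetical full year file: person id -> person weight.
full_year = {"p1": 100.0, "p2": 150.0, "p3": 250.0, "p4": 500.0}

# Hypothetical event-level file: one entry per event.
events = ["p1", "p1", "p3"]

# Start everyone at zero events, then count, so nobody is dropped.
counts = {pid: 0 for pid in full_year}
for pid in events:
    counts[pid] += 1

# Weighted total number of events.
total = sum(full_year[p] * counts[p] for p in full_year)

# Mean over the overall population (denominator is everyone).
mean_all = total / sum(full_year.values())

# Mean conditional on having at least one event.
has_event = [p for p in full_year if counts[p] > 0]
mean_cond = total / sum(full_year[p] for p in has_event)
```

The total is the same whether or not the zero-event people are kept,
but the two means differ because their denominators differ.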
If there are very many single PSUs it is a very serious problem.
You might be able to form BRR replicates but you should seek the
advice of a statistician in doing so.
If you have a single PSU due to linking to an external file such as
NHIS or some other reason, you should seriously consider switching
to SUDAAN (missunit) or R survey ( 'options("survey.lonely.psu")' ).
If you use any of STATA, SUDAAN, or R and decide to collapse
strata, consider collapsing strata
a) within the same Census region - this will take a little data
dredging to determine
b) of like sampling type, i.e., certainties vs. non-certainties -
this will definitely take some work
We would appreciate if any of your staff could advise on the issue
below. Could you please also post it onto the listserv? Many thanks,
Su-Ying
We are conducting analyses using NHIS and MEPS. We use the survey
commands in STATA to adjust for complex survey design and have to
decide how to deal with the problem of single PSUs. We understand
that it is important to adjust for all three levels of survey
design (weights, PSUs, and strata) in order to obtain correct
variance and standard errors. However, we are wondering what the
statistical implications are (magnitude and direction of standard
errors, in particular) if we only adjust for weights and PSUs but
not strata. Per the Stata manual, the variance estimates are based
only on computations at the primary sampling-unit level and do not
require information about the secondary sampling units. We had
therefore thought that the loss of efficiency from not adjusting for
strata might have little impact on the variance estimates or standard
errors.
We ran bivariate analyses with and without adjusting for strata.
For some analyses, results of the Pearson chi-square test were
similar. However, for other analyses, results were very different,
e.g., analysis without adjusting for strata is not significant
(p=0.16) while analysis adjusting for strata is highly significant
(p<0.001).
We are wondering whether it has been examined or determined how the
results may or may not vary with or without adjusting for
strata. Does it depend on the type of analyses (bivariate or
regression), sample size, or sub-group analyses?
We would greatly appreciate your thoughts on this issue.
Su-Ying