Dear Statalist member:
I have data from a complex survey. The appropriate svyset code is:
svyset [pweight = xweight], strata(xstratum) psu(xpsu)
I have done some analyses of the data, most often with SVYREG. These
analyses used data from only the survey's first wave. The survey has
about 100 different PSUs, with about 50 cases within each PSU. I
am using version 8 of Stata.
The analysis that I need to do now is more complex in two ways: 1) the
survey has 3 waves and 2) I want to analyze 6 different dependent
variables simultaneously at each wave. The six variables are
different measures of children's behavior. I will apply appropriate
transformations to put the measures on a similar scale (most likely by
generating z scores for each variable). My primary interest is in
fixed effects, so I don't need to fit a random-effects (linear
mixed) model, though I would be open to doing so.
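A minimal sketch of what that standardization might look like in the stacked
(one line per outcome) data. The variable names score, measure, and wave are
placeholders, not names from the actual data set, and the means and SDs here
are unweighted, which may or may not be what is ultimately wanted:

```stata
* Assumed (hypothetical) variables:
*   score   = raw value of the behavioral measure on this line
*   measure = which of the 6 measures this line holds (1..6)
*   wave    = survey wave (1..3)

* Unweighted mean and SD within each measure-by-wave cell
egen cellmean = mean(score), by(measure wave)
egen cellsd   = sd(score), by(measure wave)

* z score; stays missing wherever score is missing
generate zscore = (score - cellmean) / cellsd
```

Standardizing within wave removes any change in level across waves, so if
change over time is of interest it may be preferable to standardize within
measure only (by(measure) instead of by(measure wave)).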
Again, in the more complex analysis I have:
6 different behavioral measurements within 3 different waves within (an
average of about) 50 cases within 100 PSUs. There are numerous
cases with missing waves and numerous waves with missing measurements.
Each line of my data set represents a different outcome measure
(so I have up to 6 lines of data per case per wave, fewer when an outcome
measure is missing).
Let me say a couple of things (and I may be wrong). GEE won't work, as
it builds a specialized correlation matrix (very nice but not essential
for my work) at the level of the wave, whereas my standard errors need to be
developed at the level of the PSU. To my knowledge, GEE won't do
clustered standard errors at a higher level than the level of the
correlation matrix. Linear mixed modeling as implemented in Stata 9 won't
work, as it doesn't handle sampling weights and doesn't have robust
standard errors. GLLAMM would likely work, but I probably don't want to
use it because it is so slow computationally.
Basically, my best options appear to be REGRESS and SVYREG.
Could I simply specify REGRESS with robust standard errors clustered at the
level of the PSU and with the strata included as dummy variables? It seems
to me that the only drawback would be that I couldn't
specify a specialized correlation matrix as in GEE.
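As a rough sketch of the REGRESS option just described (zscore, measure, wave,
and x1 are hypothetical placeholder names; xweight, xstratum, and xpsu are the
design variables from the svyset line above):

```stata
* Assumed (hypothetical) variables: zscore, measure, wave, x1
* xi: expands the i. terms into dummy variables, including one set for strata.
* With pweights, regress reports robust standard errors; cluster(xpsu)
* makes them robust to arbitrary correlation among lines within a PSU.
xi: regress zscore i.measure i.wave x1 i.xstratum [pweight = xweight], cluster(xpsu)
```

Note that entering the strata as dummies changes the model being estimated
(stratum effects are absorbed into the coefficients), which is not the same
thing as declaring the strata as part of the design in SVYSET.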
On the other hand, could I use SVYREG? If so, would I need to rewrite
the SVYSET statement to include the waves? The multiple observations
within waves? What other modifications would be needed?
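A sketch of the SVYREG route, under the assumption that the design declaration
is left exactly as it was for the first-wave analyses. The reasoning behind
that assumption is that the linearized variance estimator works from
PSU-level totals, so repeated lines for the same case (across waves and
measures) all fall inside one PSU. The names zscore, measure, wave, and x1
are again hypothetical:

```stata
* Design declaration unchanged from the single-wave analyses
svyset [pweight = xweight], strata(xstratum) psu(xpsu)

* Stacked-data model: measure and wave enter as fixed effects (dummies)
xi: svyreg zscore i.measure i.wave x1
```

Whether the wave 1 weight is still the right weight for lines from waves 2
and 3 is a separate question that the survey documentation would have to
settle; the sketch simply reuses xweight on every line.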
Related to the procedures just mentioned is my own lack of practical and
statistical knowledge about sampling weights. Here is my concern
(probably ungrounded): the weights for my survey were (so far as I know)
designed to be used under the presumption that there would be one
outcome variable for each case in any given analysis. Now I could have
as many as 18 outcomes for each case (as many as 6 per wave). So my
sample size (number of lines in the data set) will be many times larger
than the 5000 or so cases on which the sampling weights were presumably built.
So, if I don't alter the weights, won't the standard errors be inaccurate?
Or will SVYREG somehow adjust for this issue and generate
accurate standard errors?
I have the same concern if I use REGRESS. I worry that my standard
errors will be many times too small.
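One way to make the lines-versus-cases distinction concrete is to flag a single
line per case and compare counts. This is only a sketch; caseid is a
hypothetical case identifier that the stacked data set is assumed to contain:

```stata
* Assumed (hypothetical) variable: caseid = unique identifier for each case
sort caseid
by caseid: generate byte firstline = (_n == 1)

* Number of lines (up to 18 per case) versus number of distinct cases
count
count if firstline

* Weight distribution over cases rather than over lines
summarize xweight if firstline
```

With PSU-clustered or survey standard errors, the degrees of freedom come from
the number of PSUs and strata rather than from the number of lines, so
stacking more lines per case does not by itself multiply the apparent sample
size the way it would with ordinary (non-clustered) standard errors.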
Any comments on whether and how these data can be analyzed in Stata will
be greatly appreciated.
Thank you.
Jim Rosenthal
Professor
University of Oklahoma
School of Social Work
1005 Jenkins Avenue
Norman OK 73069
405-325-1401
fax: 405-325-7072
[email protected]