Re: st: Factor analysis(?) question - missing data
On Apr 22, 2008, at 1:06 PM, Glenn Hoetker wrote:
> This is perhaps more of a statistical question than a Stata
> question. My situation is this. I have a large dataset in which
> there are 5-6 indicators each for a bunch of latent variables. Let
> me take as an example having 5 measures for innovative output,
> x1-x5. The problem is that very few observations have all 5 measures;
> some are missing x1, some x2, etc. Almost every observation has at
> least 3 measures and most have 4.
> Is there any way to optimally combine these indicators to measure
> the underlying construct of innovative output that would use all
> available measures for a given observation, i.e., x1-x4 for one
> observation, [x1-x3,x5] for another, etc.? If I thought these were
> equally weighted, I could just average over the available variables
> in each, setting aside issues of measurement error. However, I'm
> not convinced they are equally weighted and would like to do this
> in a more rigorous fashion.
How you approach this will depend critically on whether the missing
data are missing at random (MAR), or, more precisely, on whether you
are willing to assume that this is so. It is often difficult, if not
impossible, to investigate this rigorously.
If you are willing to assume MAR, then you have at least 3 options.
You can fit a factor analytic (or other similar) model directly using
an algorithm that can accommodate missing data (e.g., the EM
algorithm, or, better yet, the ECME algorithm; see, for example, Liu
and Rubin, Statistica Sinica 8 (1998), 729-747). I once programmed
this (EM) in Stata to handle multiple regression with missing data --
perhaps others have done more. Second, you can fit the model using
-gllamm-, which will accommodate missing data under the MAR
assumption. And finally, you could use multiple imputation, as
implemented for example in Royston's excellent -ice- package (try
-ssc describe ice-). In all cases, you could then use empirical Bayes
estimates of the latent factors in subsequent analyses, or go on to
fit a full structural model.
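To illustrate the first option together with the empirical Bayes step,
the following is a minimal sketch in plain Python rather than Stata. It
is not the Stata program mentioned above, and it uses plain EM, not ECME
and not the -gllamm- or -ice- implementations. It assumes a single
factor with unit variance, normal errors, MAR missingness, and missing
values coded as None. The names fit_one_factor and posterior_factor are
made up for this illustration.

```python
import math


def posterior_factor(row, mu, lam, psi):
    """Posterior (empirical Bayes) mean and variance of the factor for one
    observation, using only the indicators that are observed (not None)."""
    precision = 1.0  # prior: f ~ N(0, 1)
    weighted = 0.0
    for x, m, l, p in zip(row, mu, lam, psi):
        if x is None:  # a missing indicator adds no information
            continue
        precision += l * l / p
        weighted += l * (x - m) / p
    var = 1.0 / precision
    return var * weighted, var


def fit_one_factor(data, n_iter=200):
    """EM for x_j = mu_j + lam_j * f + e_j with f ~ N(0, 1), e_j ~ N(0, psi_j).
    Assumes every indicator is observed at least twice and is not constant."""
    k = len(data[0])
    cols = [[r[j] for r in data if r[j] is not None] for j in range(k)]
    mu = [sum(c) / len(c) for c in cols]
    var = [sum((x - m) ** 2 for x in c) / len(c) for c, m in zip(cols, mu)]
    lam = [math.sqrt(v / 2.0) for v in var]  # start: half the variance is common
    psi = [v / 2.0 for v in var]
    for _ in range(n_iter):
        # E-step: posterior mean and variance of the factor for each row,
        # computed from whichever indicators that row actually has.
        post = [posterior_factor(r, mu, lam, psi) for r in data]
        # M-step: for each indicator, regress it on the factor using the
        # expected sufficient statistics of the rows where it is observed.
        for j in range(k):
            n = s_x = s_xx = s_m = s_xm = s_ff = 0.0
            for r, (m, v) in zip(data, post):
                x = r[j]
                if x is None:
                    continue
                n += 1.0
                s_x += x
                s_xx += x * x
                s_m += m
                s_xm += x * m
                s_ff += m * m + v  # E[f^2 | observed data]
            lam[j] = (s_xm - s_x * s_m / n) / (s_ff - s_m * s_m / n)
            mu[j] = (s_x - lam[j] * s_m) / n
            rss = (s_xx - 2.0 * mu[j] * s_x + n * mu[j] ** 2
                   - 2.0 * lam[j] * (s_xm - mu[j] * s_m)
                   + lam[j] ** 2 * s_ff)
            psi[j] = max(rss / n, 1e-6)  # guard against a zero uniqueness
    return mu, lam, psi
```

After mu, lam, psi = fit_one_factor(data), the score for a row r is
posterior_factor(r, mu, lam, psi)[0]. Each available indicator is
weighted by its loading divided by its uniqueness, so an observation
with x1-x4 and one with [x1-x3,x5] are both scored from whatever they
have, without assuming equal weights. The sign of the factor is not
identified, so the loadings and scores may come out flipped.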
I'm sure others will have more to say...
-- Phil