st: Imputation of missing data in an unbalanced panel using ICE
From: James Bernard <[email protected]>
To: [email protected]
Subject: st: Imputation of missing data in an unbalanced panel using ICE
Date: Fri, 25 Oct 2013 19:46:54 +0800
Hi all,
I have been using imputation techniques. Stata offers a wide range of
commands to conduct imputation.
I have an unbalanced panel data set in which several variables have
missing values. Because the observed values of a variable at some
times can help estimate its missing values at other times, I reshaped
my data from long to wide and ran ICE, following the instructions on
this site:
http://www.ats.ucla.edu/stat/stata/faq/mi_longitudinal.htm
These instructions work for a balanced panel data set where all firms
are supposed to have values in all years.
But imagine that one firm is supposed to have values for 2000-2003
and another for 1998-2003, and suppose we have a variable (X) for
which some observations are missing in both firms:
Firm   Year   X
----   ----   ---
A      2000   .
A      2001   10
A      2002   6
A      2003   .
B      1998   3
B      1999   .
B      2000   .
B      2001   4
B      2002   6
B      2003   2
Reshaping the data from long to wide creates 6 new variables named
"X1998", "X1999", ..., "X2003", and the values of X1998 and X1999
will be missing for firm A.
Running ICE on the wide data would then impute values of X1998 and
X1999 for both firm A and firm B.
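
To make the reshape step concrete, here is a minimal Stata sketch that uses only the toy data from the table above. It shows the wide layout that the imputation step would receive; it does not run ICE itself.

```stata
* Toy data copied from the table in this post
clear
input str1 Firm Year X
"A" 2000 .
"A" 2001 10
"A" 2002 6
"A" 2003 .
"B" 1998 3
"B" 1999 .
"B" 2000 .
"B" 2001 4
"B" 2002 6
"B" 2003 2
end

* One row per firm, one X column per year (X1998 ... X2003)
reshape wide X, i(Firm) j(Year)

* X1998 and X1999 are missing for firm A because those firm-years
* never existed in the long data, not because a value was lost
list, noobs
```

In the wide layout, a structurally absent firm-year and a genuinely missing value look the same, which is why both would be imputed.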
The next step is to reshape the data back into long form and run the
-mi- commands for estimation, which use Rubin's rules to combine the
results across the m imputed data sets.
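
For reference, Rubin's rules for a single scalar estimate can be written as below. The symbols are notation introduced here for illustration, not names from the -mi- output: m is the number of imputations, \hat{Q}_j is the estimate from imputed data set j, and U_j is its estimated variance.

```latex
\bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}_j
\qquad
\bar{U} = \frac{1}{m} \sum_{j=1}^{m} U_j
\qquad
B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \hat{Q}_j - \bar{Q} \right)^2
\qquad
T = \bar{U} + \left( 1 + \frac{1}{m} \right) B
```

Here \bar{Q} is the pooled estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and T the total variance. Note that B requires at least two imputations to be computed.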
One may argue that I can let ICE predict the values of X1998 and
X1999 for firm A, reshape the data into long format, and then remove
the values of X for firm A in 1998 and 1999, because firm A is not
supposed to have values in those years.
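
The clean-up described above could be sketched in Stata as follows, again using only the toy data from this post. The variables first and last are hypothetical helper names introduced here, and the imputation step itself is left out. With stacked imputations from ICE, the i() option of the second reshape would presumably also need the imputation identifier; that is an assumption and is not shown.

```stata
* Toy data copied from the table in this post
clear
input str1 Firm Year X
"A" 2000 .
"A" 2001 10
"A" 2002 6
"A" 2003 .
"B" 1998 3
"B" 1999 .
"B" 2000 .
"B" 2001 4
"B" 2002 6
"B" 2003 2
end

* Record each firm's own observation window before reshaping
* (first and last are hypothetical helper variables)
bysort Firm (Year): generate first = Year[1]
bysort Firm (Year): generate last  = Year[_N]

reshape wide X, i(Firm) j(Year)

* (The imputation step would go here; it is omitted in this sketch.)

* Back to long form: every firm now has a row for every year 1998-2003
reshape long X, i(Firm) j(Year)

* Remove firm-years outside the firm's window (firm A in 1998 and 1999)
drop if Year < first | Year > last
count
```

Storing the window before the reshape means the out-of-window rows can be identified afterwards without relying on which values happen to be missing.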
My question is: Does asking ICE to predict values of X1998 and X1999
for firm A affect the way it predicts the value of X2000 (which is the
main observation we have to impute)?
Does the technique I used make sense?
Also, how wrong is it to use only the first imputation (M=1) to run
the model, instead of using all the imputations?
Thanks,
James