Aside from statistical issues, listwise deletion could lead to selection
bias because units or subjects with missing data may be systematically
different from those without missing data. Naturally, what's best also
depends on specific circumstances. On the other hand, the nature of the
statistical problems with single imputation is very well understood
(artificially increased precision), but the direction and magnitude of the
potential bias resulting from casewise deletion may be difficult to judge.
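
To make the selection-bias point concrete, here is a minimal simulation sketch in Python (standard library only). It is not tied to any Stata command. The standard normal variable, the 30% random loss rate, and the 60%/10% value-dependent loss rates are assumptions chosen purely for illustration, as are all function and variable names.

```python
import random
import statistics


def simulate_listwise_bias(n=10000, seed=1):
    """Return the full-sample mean and two complete-case means of y."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Assumed mechanism 1: every value has the same 30% chance of being lost,
    # so the retained cases are a random subsample.
    random_loss = [v for v in y if rng.random() > 0.3]

    # Assumed mechanism 2: positive values are lost 60% of the time and
    # non-positive values 10% of the time, so retained cases differ
    # systematically from dropped ones.
    selective_loss = [v for v in y if rng.random() > (0.6 if v > 0 else 0.1)]

    return (
        statistics.mean(y),
        statistics.mean(random_loss),
        statistics.mean(selective_loss),
    )


full_mean, random_loss_mean, selective_loss_mean = simulate_listwise_bias()
print(f"full sample mean:               {full_mean: .3f}")
print(f"complete cases, random loss:    {random_loss_mean: .3f}")
print(f"complete cases, selective loss: {selective_loss_mean: .3f}")
```

Under the random-loss mechanism the complete-case mean stays close to the full-sample mean. Under the selective mechanism it is pulled well below it (about -0.3 in expectation with these assumed rates). In real data the mechanism is unknown, which is why the direction and size of the bias are hard to judge.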
Leonelo
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Richard Williams
Sent: Wednesday, January 05, 2005 8:49 AM
To: [email protected]
Subject: st: impute command for missing data
In its documentation for the -impute- command, the Stata 8 reference manual
states that "[imputation] is not the only method for coping with missing
data, but it is often much better than deleting cases with any missing
data, which is the default."
I'm curious how much agreement there is with that statement. If your
choices were limited to (a) listwise (aka casewise) deletion of missing
data, or (b) filling in imputed values for the missing data (e.g. the
overall mean, a subgroup mean, or a regression estimate of the missing
value), are there indeed situations in which (b) is "often much better"? Listwise
deletion, of course, causes you to lose cases; but imputation can lead to
misleading standard errors and test statistics because these techniques don't
take into account the uncertainty about the values of the missing data. In
his monograph on Missing Data, Allison seems to prefer listwise deletion
over conventional imputation procedures but I'm not sure what the consensus
is on this.
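
The standard-error concern can be sketched with a small Python example (standard library only). It uses overall-mean imputation, the simplest of the options listed under (b). It is not a reproduction of what -impute- does. The sample size, the 40% loss rate, the normal distribution, and all names are assumptions for illustration.

```python
import math
import random
import statistics


def standard_error(values):
    """Naive standard error of the mean, treating every value as observed."""
    return statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(2005)
complete = [rng.gauss(50.0, 10.0) for _ in range(200)]

# Assumed mechanism: each value is lost with probability 0.4, independently.
observed = [v for v in complete if rng.random() > 0.4]
n_missing = len(complete) - len(observed)

# (a) Listwise deletion: analyze the observed cases only.
se_listwise = standard_error(observed)

# (b) Overall-mean imputation: fill every gap with the observed mean.
observed_mean = statistics.mean(observed)
imputed = observed + [observed_mean] * n_missing
se_imputed = standard_error(imputed)

print(f"observed cases: {len(observed)}, imputed cases: {n_missing}")
print(f"SE of mean, listwise deletion: {se_listwise:.3f}")
print(f"SE of mean, mean imputation:   {se_imputed:.3f}")
```

The imputed values add nothing to the sum of squared deviations but do increase the apparent sample size, so the naive standard error after mean imputation is always smaller than the listwise one whenever at least one value is imputed. The point estimate of the mean is unchanged. Only the apparent precision improves.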
I realize that there are advanced methods that may be better than (a) or
(b); but if your choice is only between (a) and (b), is it really the case
that (b) is often much better (or did the manual writers just make that up)?
Also, just curious if people would agree with me that, rightly or wrongly,
listwise deletion is the most common strategy for dealing with missing
data? It seems like many of the more advanced techniques are not well
understood and/or are not well implemented in statistical software. For
example, Stata has some user-written routines (e.g. -hotdeck-) but the
built-in support for handling missing data seems pretty limited.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/