Dear Statalisters,
I have a normally distributed variable Y with 10% item-missing data and fully observed covariates. I can impute Y by multiple imputation with the -ice- algorithm under the assumption that the data are missing at random (MAR). However, I want to explore nonignorable imputation methods, which assume that the data are not missing at random (NMAR).
In Little & Rubin (2002, Chapter 15), the authors discuss variants of Heckman's selection model that can be used to impute nonignorable missing data. The logic is that the postulated correlation between the error terms of the selection equation and the regression equation implies a violation of the parameter distinctness assumption central to ignorable (MAR) nonresponse imputation methods. Therefore, -heckman- is an example of a nonignorable (NMAR) imputation method.
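For concreteness, the selection model referred to above can be written as follows. The notation (\beta, \gamma, u_1, u_2, \sigma, \rho) is the usual textbook one and is an assumption here; it is not copied from Little & Rubin's chapter.

```latex
\begin{align*}
  Y_i &= X_i\beta + u_{1i}
      && \text{(regression equation)} \\
  Y_i \text{ is observed} &\iff Z_i\gamma + u_{2i} > 0
      && \text{(selection equation)} \\
  u_{1i} &\sim N(0,\sigma^2), \quad u_{2i} \sim N(0,1), \quad
      \operatorname{corr}(u_{1i}, u_{2i}) = \rho
      && \text{(error structure)}
\end{align*}
```

With \rho = 0 the two equations are independent and selection depends only on Z; with \rho \neq 0 selection also depends on the unobserved part of Y.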
In Stata 8.2, I run the following command: -heckman Y X, select(Z)-, where X are the covariates in the regression equation and Z those in the selection equation. I use the maximum likelihood version of -heckman- because I believe that its underlying normality assumption is appropriate in this instance. After -heckman- I type -predict hkwage if e(sample)-.
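Spelled out as a do-file fragment, the steps above would look roughly like this. This is only a sketch: x1, x2 and z1 are hypothetical placeholders for the variables in X and Z, and Y_mis is assumed to be a 0/1 indicator for missing Y.

```stata
* Flag the cases where Y is missing (assumed definition of Y_mis)
generate byte Y_mis = missing(Y)

* Maximum likelihood is the default estimator for -heckman-;
* with no dependent variable in select(), an observation counts as
* selected when Y is nonmissing.
* x1, x2, z1 are hypothetical stand-ins for X and Z.
heckman Y x1 x2, select(x1 x2 z1)

* With no option given, -predict- returns the linear prediction (xb)
* from the regression equation
predict hkwage if e(sample)
```

Because -predict- is called without an option, hkwage holds the linear prediction from the regression equation for every observation in the estimation sample, including those with missing Y.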
Now, when I evaluate the imputed (predicted) wage data for Y using -kdensity hkwage if Y_mis==1-, the imputed data look very much normally distributed. In other words, heckman seems to be imputing based on the assumption that my missing data are missing at random, rather than not missing at random.
My question is: is this evidence of an NMAR imputation method doing its job, or of an MAR imputation method doing its job? If the latter, my understanding is incorrect.
Thanks,
Reza
-----
Reference: Little, RJA & Rubin, DB (2002): "Statistical analysis with missing data", second edition, New York: Wiley.