created using -ice-, I get a parameter estimate of 0 and a standard
error of 0 for a continuous independent variable. However, fitting
models for each imputed dataset individually (without -mim-) produces
non-zero estimates of similar magnitude. This suggests that -mim- is
telling me something I don't know how to interpret, that there is
something wrong with -mim-, or that there is something nearly invisible
wrong with the data.

Longer version: I am looking at the relationship between adoption of a
particular kind of software system by physicians and a set of
independent variables. One of my independent variables is years of
experience, EXPER. EXPER is continuous and approximately normally
distributed. Unfortunately, 25% of my cases are missing EXPER, so I
decided to try multiple imputation using the Galati, Carlin, and Royston
-ice- and -mim- commands available from SSC. After -ice-, the
distribution of EXPER in the imputed datasets looks fine (that is,
similar in shape, mean, and variance to the original), and its
relationship to the dependent variable HASEMR looks the same. If I use
-logit- (without -mim-) to look at the relationship between the two
variables in the original dataset (_mj==0) and in the first imputed
dataset (_mj==1), I get nearly identical results. (I've added
PCTPOVERTY, a continuous variable with no missing data, to show below
that my problem is just with EXPER.)

. logit hasemr exper pctPoverty if _mj==0

Iteration 0:   log likelihood =  -421.1634
Iteration 1:   log likelihood = -398.42368
Iteration 2:   log likelihood = -397.85135
Iteration 3:   log likelihood = -397.84996

Logistic regression                               Number of obs   =        699
                                                  LR chi2(2)      =      46.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -397.84996                       Pseudo R2       =     0.0554

------------------------------------------------------------------------------
      hasemr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0505082   .0091315    -5.53   0.000    -.0684057   -.0326108
  pctPoverty |  -.0441709   .0135119    -3.27   0.001    -.0706537   -.0176881
       _cons |   1.021042   .3025942     3.37   0.001     .4279685    1.614116
------------------------------------------------------------------------------

. logit hasemr exper pctPoverty if _mj==1

Iteration 0:   log likelihood = -617.85901
Iteration 1:   log likelihood =  -588.0887
Iteration 2:   log likelihood = -587.54423
Iteration 3:   log likelihood =  -587.5435

Logistic regression                               Number of obs   =       1001
                                                  LR chi2(2)      =      60.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -587.5435                       Pseudo R2       =     0.0491

------------------------------------------------------------------------------
      hasemr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0492512   .0074785    -6.59   0.000    -.0639089   -.0345936
  pctPoverty |  -.0324184   .0106465    -3.04   0.002    -.0532852   -.0115517
       _cons |   .9076523   .2400196     3.78   0.000     .4372224    1.378082
------------------------------------------------------------------------------

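For reference, the per-imputation check described above can be run for all of the imputed datasets at once. This is only a minimal sketch: it assumes the imputations are indexed by _mj = 1 through 10 (the -mim- output reports 10 imputations), and it prints the EXPER coefficient and standard error in a general numeric format so that very small values are not rounded away.

```stata
* Sketch: refit the model in each imputed dataset and print the EXPER
* coefficient and its standard error at full precision.
* Assumption: _mj runs from 1 to 10, one value per imputed dataset.
forvalues m = 1/10 {
    quietly logit hasemr exper pctPoverty if _mj == `m'
    display "m = `m'" _col(10) %12.0g _b[exper] _col(26) %12.0g _se[exper]
}
```

If the ten coefficients all resemble the _mj==1 estimate, the individual fits themselves are not the source of the zeros.
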
However, this is what happens when I try to estimate the same model with
-mim-:

. mim: logit hasemr exper pctPoverty

Multiple-imputation estimates (logit)             Imputations     =         10
Logistic regression                               Minimum obs     =       1001
                                                  Minimum dof     =      103.2

------------------------------------------------------------------------------
      hasemr |     Coef.  Std. Err.     t    P>|t|    [95% Conf. Int.]     FMI
-------------+----------------------------------------------------------------
       exper |        -0          0   -5.52  0.000         -0        -0   0.285
  pctPoverty |  -.036964    .011038   -3.35  0.001   -.058637  -.015291   0.059
       _cons |   .937575    .270402    3.47  0.001    .403621   1.47153   0.217
------------------------------------------------------------------------------

Obviously something is wrong. It can't be just that the uncertainty of
the estimate is high due to the high proportion of missing data, since
that should result in a large standard error. Note that the imputation
procedure does produce a small number of negative and therefore
nonsensical values for years of experience (14 total across 10 imputed
datasets), but this problem doesn't go away when I set those to 0. Also
note that the dependent variable HASEMR has about 8% missing in the
original dataset.
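
One diagnostic that might help separate a display problem from an estimation problem is to pool the EXPER coefficient by hand with Rubin's rules and compare the result to the -mim- table. A t statistic of -5.52 is hard to reconcile with a coefficient and standard error that are both exactly zero, so the underlying numbers may well be non-zero. The sketch below is not taken from -mim- itself; it assumes _mj = 1 through 10 indexes the imputations, and the scalar names are made up for illustration.

```stata
* Sketch: pool the EXPER coefficient by hand using Rubin's rules.
* Assumptions: imputations are indexed _mj = 1..10; scalar names are
* illustrative only and are not part of -ice- or -mim-.
local M = 10
scalar sum_b  = 0
scalar sum_b2 = 0
scalar sum_v  = 0
forvalues m = 1/`M' {
    quietly logit hasemr exper pctPoverty if _mj == `m'
    scalar sum_b  = sum_b  + _b[exper]
    scalar sum_b2 = sum_b2 + _b[exper]^2
    scalar sum_v  = sum_v  + _se[exper]^2
}
* Pooled point estimate: mean of the per-imputation coefficients
scalar b_bar = sum_b / `M'
* Within-imputation variance: mean of the squared standard errors
scalar W_var = sum_v / `M'
* Between-imputation variance: sample variance of the coefficients
scalar B_var = (sum_b2 - `M' * b_bar^2) / (`M' - 1)
* Total variance of the pooled estimate
scalar T_var = W_var + (1 + 1/`M') * B_var
display "pooled b = " %12.0g b_bar "   pooled se = " %12.0g sqrt(T_var)
```

If the hand-pooled values are non-zero and the ratio is close to -5.52, the zeros in the -mim- table would look like a formatting issue rather than a problem with the data.
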
Any idea what's wrong, or suggestions for diagnostics? Thanks.
--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail: [email protected]