I am working with the dataset now and it seems that you were completely right that
it was a distribution problem. After a logarithmic transformation and back-transformation,
the results of imputation with -ice- seem fine.
Nevertheless I am left with one last issue on imputation. I also want to
impute a discrete variable, namely the age of companies in years
(integers) with a maximum of 37 years (age has only been measured as of
1967). The distribution of this variable is definitely not normal, but
it is not extremely skewed either.
Can I manipulate my data such that I can apply standard (OLS) regression
to impute with -ice- or should I apply ordinal logistic regression in this
case???
Again many thanks and greetings,
René
--- René Wevers <[email protected]> wrote:
> The basis of the statement is twofold, indeed one reason is coming
> from the hard to explain results I got from -ice- for the (continuous)
> variables I mentioned yesterday.
They are quite easily explainable (though you mentioned that you still
needed to check that): The distribution of sizes of companies is never
going to be anywhere near a Gaussian (normal) distribution, however
strange your sampling scheme may be.
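
As a rough illustration of why the log transformation helps, here is a minimal sketch using only the Python standard library. It assumes, purely for illustration, that company sizes follow a lognormal distribution; the parameters 3.0 and 1.2, the seed, and the function names are hypothetical and not taken from the actual data.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def compare_skew(n=5000, seed=2008):
    """Return (skewness of raw sizes, skewness of log sizes)."""
    rng = random.Random(seed)
    # ASSUMPTION: lognormal "company sizes"; parameters are made up.
    sizes = [rng.lognormvariate(3.0, 1.2) for _ in range(n)]
    logs = [math.log(s) for s in sizes]
    return skewness(sizes), skewness(logs)


if __name__ == "__main__":
    raw, logged = compare_skew()
    print(f"skewness of raw sizes: {raw:.2f}")
    print(f"skewness of log sizes: {logged:.2f}")
```

Under this assumption the raw sizes are strongly right-skewed while their logarithms are close to symmetric, which is why a normal-based imputation model tends to behave better on the log scale.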
> However, another reason comes from a simple test I performed with
> -ice-. I randomly created missing values (25%) for a dichotomous
> variable where there were none missing and imputed these 'missing'
> values with -ice-. Afterwards approx. 700 out of 3000 imputed
> values proved to be different from the original values. When I used
> -impute- and rounded the results only 350 out of 3000 imputed values
> were different from the original values. Naturally this is a very
> weak test, but 700 out of 3000 'faulty' imputed values does not give
> me a lot of confidence in -ice- for my case.
I like simulations as a means of gaining understanding of statistical
techniques, and you and I are in good company: There is a working
paper by Stef van Buuren, Jaap Brand, Karin Groothuis-Oudshoorn, and
Don Rubin (who invented multiple imputation) that does a simulation
study of MICE (R and Splus), -ice- (Stata), and IVEWARE (SAS). (Full
reference below.)
In the past I have posted a number of simulations of -ice- on
Statalist:
http://www.stata.com/statalist/archive/2007-04/msg00900.html
http://www.stata.com/statalist/archive/2007-05/msg00778.html
http://www.stata.com/statalist/archive/2007-12/msg00504.html
Neither Van Buuren et al. nor I could find anything systematically
wrong with -ice-. The reason for the difference between our findings and
yours is that you used the wrong criterion for success: multiple
imputation never claims to be able to recover the unobserved values; it
claims to be able (under the MAR assumption) to recover means,
proportions, and variances of variables, and patterns of association
between variables. Counterintuitive as it may sound, the "better"
performance of -impute- is actually the result of the fact that it is
worse than -ice- (it ignores the uncertainty around the prediction).
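
The point can be illustrated with a small simulation. What follows is a minimal sketch in plain Python, not a reproduction of what -ice- or -impute- do internally. It assumes a hypothetical binary predictor x and made-up probabilities P(y = 1 | x) of 0.9 and 0.6, and it treats those probabilities as known, whereas proper multiple imputation would also draw the model parameters. "Deterministic" stands in for rounding a predicted value; "stochastic" stands in for drawing from the predictive distribution.

```python
import random


def simulate(n=3000, seed=12345):
    """Compare rounding a prediction with drawing from the predictive
    distribution when imputing a dichotomous variable."""
    rng = random.Random(seed)
    truth, deterministic, stochastic = [], [], []
    for _ in range(n):
        x = rng.random() < 0.5            # hypothetical binary predictor
        p = 0.9 if x else 0.6             # ASSUMED P(y = 1 | x), treated as known
        truth.append(1 if rng.random() < p else 0)
        # Rounded prediction: always the most likely category.
        deterministic.append(1 if p >= 0.5 else 0)
        # Random draw: reflects the uncertainty around the prediction.
        stochastic.append(1 if rng.random() < p else 0)

    def mismatches(imputed):
        return sum(1 for t, i in zip(truth, imputed) if t != i)

    return {
        "mismatch_deterministic": mismatches(deterministic),
        "mismatch_stochastic": mismatches(stochastic),
        "prop_true": sum(truth) / n,
        "prop_deterministic": sum(deterministic) / n,
        "prop_stochastic": sum(stochastic) / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value}")
```

In this setup the rounded predictions disagree with the original values less often than the random draws do, yet they impute a 1 for every case, so the proportion (and hence the variance) of the imputed variable is badly distorted. The random draws "miss" more individual values but reproduce the proportion, which is the quantity multiple imputation is meant to recover.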
Hope this helps,
Maarten
VAN BUUREN S, BRAND JPL, GROOTHUIS-OUDSHOORN CGM, RUBIN DB. Fully
Conditional Specification in Multivariate Imputation. Journal of
Statistical Computation and Simulation, in press. Simulation study on
the MICE algorithm.
http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/