--- Mark Lunt <[email protected]> wrote:
> ICE assumes that continuous variables are normally distributed: if
> that is not the case, impossible values can appear. In particular, if
> you have lots of companies with a few employees and a few companies
> with lots of employees, ICE will impute negative numbers of
> employees. One possible solution is to use the "match" option of ICE.
Good point. An alternative would be to take the logarithm of the number
of employees.
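
A minimal sketch of that idea, using simulated data: the variable names
employees, lnemp and employees_bt are hypothetical, and the imputation
step itself is left as a comment because the exact -ice- call depends
on your model and on the version you have installed.

*--------- begin example ---------
clear
set seed 12345
set obs 200
* hypothetical firm sizes: many small firms, a few large ones
gen employees = ceil(exp(rnormal(2, 1.5)))
* work on the log scale; ln() needs employees > 0
gen lnemp = ln(employees)
summarize employees lnemp, detail
* ... impute lnemp in place of employees here ...
* back-transform: exp() is positive for any imputed value
gen employees_bt = exp(lnemp)
*---------- end example ----------

The back-transformed imputations will be positive but not whole
numbers, and firms with zero employees would need separate handling
because ln(0) is missing.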
> Alternatively, I have written some ado-files which convert variables
> to normal-scores and back: you can convert to normal scores (which
> are normally distributed), perform the imputation on these
> variables, then convert back to your original distribution.
I have had a quick look at this command and it would seem that you use
the rank of each observation and transform that as if it came from a
normal distribution. I think that that is too strong a transformation,
as you throw away all information about the distances between values
and only use the rank. This is most clearly visible when two or more
observations have the same value. The way you programmed this
procedure, they are given different ranks, and thus different values on
your new variable:
*--------- begin example ---------
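sysuse auto, clear
* nscore is the user-written command discussed above, not official Stata
nscore rep78, gen(gauss)
* tied values of rep78 show up as a vertical spread of normal scores
twoway scatter gauss rep78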
*---------- end example ----------
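
The same effect can be sketched without -nscore-, using only official
Stata commands. This is not the code inside -nscore-: the plotting
position (rank - 0.5)/n and the variable names are my own assumptions.
It compares ranks that break ties arbitrarily with ranks that give tied
observations their mean rank.

*--------- begin example ---------
sysuse auto, clear
* ties broken arbitrarily versus ties sharing their mean rank
egen r_unique = rank(rep78), unique
egen r_mean   = rank(rep78)
quietly count if !missing(rep78)
local n = r(N)
* (rank - 0.5)/n is an assumed plotting position, not taken from nscore
gen z_unique = invnormal((r_unique - 0.5)/`n')
gen z_mean   = invnormal((r_mean   - 0.5)/`n')
* min and max differ within rep78 for z_unique, coincide for z_mean
tabstat z_unique z_mean, by(rep78) statistics(min max)
*---------- end example ----------

With mean ranks, tied observations get a single score, but the
distances between distinct values are still discarded.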
-- Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------