Hi,
I have a data set with missing values in both continuous and categorical
variables, for example smoking status (non-smoker, ex-smoker and smoker).
I would like to impute values for the missing data. My problem relates to
the categorical variables: what techniques are available for imputing
missing categorical data, and what programs would I need for the
imputation? I have access at the moment to SPSS and Stata. If possible, any
references on missing data imputation would also be useful.

I have found that Adrian Mander's hotdeck program, used appropriately, can
work very well. However, it requires a -by(varlist)- to define strata for
the imputation. For smoking, my bylist would include age, sex, and PERHAPS
some other characteristic that identifies propensity (social class or
function). So I would categorize age by decades and similarly categorize
the last variable. The problem that one runs into is that you soon have a
great many groups, some with few or no complete cases to draw from. If this
turns out to be too many groups for your sample, then you have to reduce the
number of categories.
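
To make the idea concrete, here is a minimal sketch of hot-deck imputation within strata, written in plain Python rather than Stata. It is not the hotdeck program itself and does not reproduce its options. It only illustrates the general technique of filling a missing value with a randomly chosen observed value from the same stratum. The function names (hotdeck_impute, donor_counts, age_decade), the variable names and the toy data are all assumptions made up for this illustration.

```python
import random
from collections import defaultdict


def age_decade(age):
    """Collapse age in years to the start of its decade (e.g. 37 -> 30)."""
    return (age // 10) * 10


def donor_counts(rows, target, by):
    """Count observed (non-missing) target values in each stratum."""
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in by)
        counts[key] += row[target] is not None  # bool adds as 0 or 1
    return dict(counts)


def hotdeck_impute(rows, target, by, rng=None):
    """Return copies of rows with missing target values filled by a
    randomly drawn donor value from the same stratum (hypothetical helper)."""
    rng = rng or random.Random()
    # Build the pool of observed values for each stratum.
    donors = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            donors[tuple(row[k] for k in by)].append(row[target])
    imputed = []
    for row in rows:
        new_row = dict(row)  # leave the original data untouched
        if new_row[target] is None:
            pool = donors.get(tuple(row[k] for k in by))
            if pool:  # a stratum with no donors stays missing
                new_row[target] = rng.choice(pool)
        imputed.append(new_row)
    return imputed


# Toy data: None marks a missing smoking status.
people = [
    {"age": 34, "sex": "F", "smoking": "non smoker"},
    {"age": 37, "sex": "F", "smoking": None},
    {"age": 52, "sex": "M", "smoking": "smoker"},
    {"age": 58, "sex": "M", "smoking": "ex smoker"},
    {"age": 55, "sex": "M", "smoking": None},
    {"age": 71, "sex": "F", "smoking": None},  # nobody to borrow from
]
for person in people:
    person["age_group"] = age_decade(person["age"])

strata = ["age_group", "sex"]
print(donor_counts(people, "smoking", strata))  # spot strata with no donors
result = hotdeck_impute(people, "smoking", by=strata, rng=random.Random(1))
for person in result:
    print(person["age"], person["sex"], person["smoking"])
```

Checking the donor counts before imputing shows which strata are too sparse. In this toy data the 70s/F stratum has no complete cases, so its missing value is left missing. That is the signal to merge categories, for example by using wider age bands.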