I need to do a relatively simple imputation, but am having trouble following
the examples given.
Here is the situation:
Dataset ~ 10,000 obs (non-weighted, 1 obs/subject)
Variable to be imputed:
ekg_abnormal -- binary (yes/no), missing at random in < 5% of observations.
Potential predictors with which to impute:
At least five: some binary (e.g. chestpain yes/no), some categorical (e.g.
first_cat, coded 1-5), and some continuous that can be made categorical
(e.g. age ==> age_cat).
Primary outcome being studied: Death yes/no
The questions:
(1) Should I use the outcome variable (death) as one of the imputation
variables? Should I use many imputation variables, since I can (large
dataset)?
(2) Most important: Can somebody give an example for the correct way to
issue the commands?
If I do the following:
. hotdeck ekg_abnormal using imp, by(agecat first_cat) store keep(merge_variable) impute(5)
then I end up with 5 files: imp1, imp2, imp3, imp4 and imp5.
Eventually I want imputed values for ekg_abnormal that I can use in the
main logistic regression of interest (death on ekg_abnormal and the other
covariates). I am not sure where the options infile() and command(logit)
fit into things.
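To make the goal concrete, here is a rough sketch of the end-to-end workflow I am after. It is written with Stata's official mi commands (Stata 11 or later) rather than hotdeck, purely as an illustration, so it does not answer the hotdeck question itself. It assumes the predictors have no missing values of their own and that the variables are named agecat, first_cat, chestpain and death; the seed value is arbitrary.

```stata
* Sketch only, under the assumptions stated above; not the hotdeck syntax.
* Declare the multiple-imputation layout and the variable to be imputed.
mi set wide
mi register imputed ekg_abnormal

* Impute the binary variable with a logistic model, creating 5 imputations.
* The outcome (death) is included as a predictor in the imputation model.
mi impute logit ekg_abnormal i.agecat i.first_cat chestpain death, add(5) rseed(12345)

* Fit the analysis model on each imputed dataset and combine the results.
mi estimate, or: logit death ekg_abnormal i.agecat i.first_cat chestpain
```

What I am looking for is the equivalent of these last two steps with hotdeck: one step that produces the 5 imputed datasets, and one that fits the logit model to each and combines the estimates.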
Any help would be greatly appreciated.