Dear Statalist,
This may seem a stupid question to the statisticians among you, but
I'd appreciate some help.
I want to run a regression on cross-section data with lots of
variables, some of which have missing values. When I do that, Stata
estimates the model using only the observations which have values for
all variables. I downloaded tabmiss and rmiss2 as in the relevant FAQ
and the commands would certainly help in enabling me to decide which
variables to drop. But is there any way that I could retain all the
variables with their missing values and make allowance for the missing
values by including a dummy for each variable that has missing values?

The way to retain observations with missing values is to recode the
missing values to a non-missing value, e.g. the variable's mean. This
has all sorts of problems, though. The missing-data dummy variable
(indicator) method that you propose used to be popular but has since
been discredited. See Paul Allison's Sage book "Missing Data."
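
For concreteness, here is a minimal sketch in Python (not Stata) of what
mean recoding with a missing-value dummy does mechanically, next to the
complete-case default. The toy data, the variable names (y, x1, x2), the
"_miss" suffix and both helper functions are hypothetical, made up for
illustration. It shows only the mechanics and is not a recommendation of
the method, given the problems noted above.

```python
from statistics import mean

# Hypothetical toy data; None marks a missing value.
rows = [
    {"y": 10.0, "x1": 1.0, "x2": 5.0},
    {"y": 12.0, "x1": None, "x2": 6.0},
    {"y": 9.0, "x1": 3.0, "x2": None},
    {"y": 15.0, "x1": 4.0, "x2": 8.0},
]
predictors = ["x1", "x2"]


def complete_cases(data, cols):
    """Keep only rows with no missing value in any of cols
    (what a regression command does by default)."""
    return [r for r in data if all(r[c] is not None for c in cols)]


def recode_with_dummies(data, cols):
    """Replace each missing value with the observed mean of its column
    and add a 0/1 indicator column flagging where that happened."""
    means = {c: mean(r[c] for r in data if r[c] is not None) for c in cols}
    out = []
    for r in data:
        new = dict(r)
        for c in cols:
            was_missing = r[c] is None
            new[c + "_miss"] = 1 if was_missing else 0
            if was_missing:
                new[c] = means[c]
        out.append(new)
    return out


kept = complete_cases(rows, predictors)
recoded = recode_with_dummies(rows, predictors)

print(len(kept), "of", len(rows), "rows survive listwise deletion")
for r in recoded:
    print(r)
```

Every row is retained after recoding, but the filled-in cells all sit
exactly at the column mean, so the recoded variable has less spread than
the observed values did. That is one of the problems with the approach.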