Dear Statalist,
This may seem a stupid question to the statisticians among you, but
I'd appreciate some help.
I want to run a regression on cross-section data with lots of
variables, some of which have missing values. When I do that, Stata
estimates the model using only the observations which have values for
all variables. I downloaded tabmiss and rmiss2 as described in the
relevant FAQ, and these commands certainly help me decide which
variables to drop. But is there any way I could retain all the
variables, with their missing values, and allow for the missing
values by including a dummy for each variable that has missing values?
For example, suppose that income and age have missing values. I
construct "dincome" and "dage", each taking the value 1 if the
corresponding value is missing and 0 otherwise, and estimate the following:
regress income dincome male sex age dage education
I tried this, and Stata insisted on dropping both the observations with
missing values and my missing-value dummies.
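For reference, here is a minimal sketch of how the dummies described above might be built in Stata with the built-in missing() function. The small dataset is hypothetical and only illustrates the construction:

```stata
* Hypothetical toy data; "." is Stata's system missing value
clear
input income age
100 30
.   45
250 .
180 52
end

* Missing-value dummies: 1 if the value is missing, 0 otherwise
generate byte dincome = missing(income)
generate byte dage    = missing(age)

list
```

Note that regress uses only observations with no missing values in any variable of its varlist. Within that estimation sample every missing-value dummy equals 0, so Stata omits it as collinear with the constant, which matches what happened above.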
Can you please tell me:
(a) Is there a way to force Stata to estimate my model without dropping
the missing-value dummies and the observations with missing values?
(b) Would doing so bias my estimates or make the whole procedure inefficient?
Thanks so much,
Ramani
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/