Agreed. The coolest way to approach these
problems is to apply -ice-, and also to
compare results with those on the subset
of observations with no missing values. Or go out into
the field and fill in the missing values!
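
A minimal sketch of that comparison, under stated assumptions: y, x1, x2 and x3 are hypothetical variable names, with x1 and x2 incomplete and y and x3 fully observed. The syntax of -ice- has changed between releases, so check -help ice- for your installed version. The sketch swaps in the official -mi- suite (Stata 12 and later), which also imputes by chained equations; it is not the -ice- command itself.

```stata
* Hypothetical variables: y (outcome, complete), x1 x2 (incomplete),
* x3 (complete). Replace with your own names.

* Complete-case benchmark: -regress- silently drops every observation
* that has a missing value on any variable in the model.
regress y x1 x2 x3

* Declare the data as multiple-imputation data and register the variables.
mi set mlong
mi register imputed x1 x2
mi register regular y x3

* Impute x1 and x2 by chained equations; 20 imputations and the seed
* are arbitrary choices made for this illustration.
mi impute chained (regress) x1 x2 = y x3, add(20) rseed(12345)

* Fit the same model on the imputed data, pooled with Rubin's rules.
mi estimate: regress y x1 x2 x3
```

If the pooled coefficients differ a lot from the complete-case ones, that is a hint that the missing values are not ignorable for the model at hand.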
Nick
Richard Williams wrote:
> At 10:52 AM 11/2/2005, Ramani Gunatilaka wrote:
> >Dear Statalist,
> >This may seem a stupid question for the statisticians among you but
> >I'd appreciate some help.
> >I want to run a regression on cross-section data with lots of
> >variables, some of which have missing values. When I do that, Stata
> >estimates the model using only the observations which have values for
> >all variables. I downloaded tabmiss and rmiss2 as in the relevant FAQ
> >and the commands would certainly help in enabling me to decide which
> >variables to drop. But is there any way that I could retain all the
> >variables with their missing values and make allowance for the missing
> >values by including a dummy for missing variables?
>
> The way you retain observations with missing values is by recoding
> the missing values to a non-missing value, e.g. the variable's mean.
> This has all sorts of problems though. The missing-data dummy
> variable indicator that you propose used to be popular but has since
> been discredited. See Paul Allison's Sage book "Missing Data."
>
> For a synopsis of basic strategies and their pros and cons, see
>
> http://www.nd.edu/~rwilliam/stats2/l12.pdf
>
> That handout is weak in discussing more advanced methods, although it
> does allude to them. You might check out Royston's -ice- package,
> which was recently updated and discussed in the Stata Journal. Use
>
> -findit ice-