Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: heckprob using multiple imputation
From: Marcus L Britton <[email protected]>
To: [email protected]
Subject: Re: st: heckprob using multiple imputation
Date: Tue, 18 Mar 2014 09:56:34 -0500 (CDT)
I share Klaus' interest in this issue. Any guidance from those more knowledgeable than I am would be greatly appreciated.
Klaus, you may already be aware of this post on CrossValidated: http://stats.stackexchange.com/questions/65678/using-heckman-in-combination-with-mi-estimate-stata
But if not, perhaps it will be helpful.
Marcus Britton
------------------------------
Date: Sat, 15 Mar 2014 15:14:20 +0100
From: Klaus Nowotny <[email protected]>
Subject: st: heckprob using multiple imputation
Dear statalist users,
I want to estimate a probit model where y is a function of income and
other explanatory variables X. However, y is only observed for a subset
of observations where z==1, so I want to estimate a probit model with
sample selection:
heckprob y income X, select(z = income X W)
where W is a set of variables not related to y.
My problem is that income is unobserved for about 25% of all
observations (and about 24% of the observations where z==1), a problem I
want to solve using multiple imputation. Now the MI literature
recommends that all variables used in the subsequent analysis should
also be included in the imputation model, including the dependent
variable. But what if the dependent variable is not observed for the
full sample? Is it okay to impute (log-)income as:
mi impute regress income X W z?
Or would I have to impute both income and y using, for example,
multivariate normal regression:
mi impute mvn y income = X W z?
Or, would it be better to jointly model the probability that z==1 &
income!=. as the selection step in the probit with sample selection:
gen v=(z==1 & income!=.)
heckprob y income X, select(v = income X W)?
Even if there is no correlation between the error terms in the selection
and outcome models (and my preliminary evidence suggests that this is
the case), imputing income only for those observations where the
dependent variable is observed would still be inefficient, since it
does not use all of the available data.
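For concreteness, the first option (imputing income from X, W and z, then fitting the selection model on the imputed data) could be set up roughly as follows. This is only a sketch of the mechanics and does not settle whether that imputation model is appropriate. The names x1, x2 and w1 are placeholders standing in for X and W, the number of imputations and the seed are arbitrary, and cmdok is included on the assumption that heckprob is not among the commands mi estimate officially supports.

```stata
* Sketch only: x1 x2 stand in for X and w1 for W (hypothetical names)
mi set wide
mi register imputed income

* Impute income from the covariates and the selection indicator z,
* which is observed for all observations; add(20) and the seed are arbitrary
mi impute regress income x1 x2 w1 z, add(20) rseed(12345)

* Fit the probit with sample selection on each imputed dataset and combine
* the results; cmdok tells mi estimate to run a command it may not support
mi estimate, cmdok: heckprob y income x1 x2, select(z = income x1 x2 w1)
```

The multivariate alternative would replace the imputation step with mi impute mvn and register y as an imputed variable as well.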
Any help is greatly appreciated!
Klaus
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/