st: Re: Re: 2sls with multiply-imputed data sets
The idea sounds good, but I didn't check the code.
R
----- Original Message -----
From: "Viola Angelini" <[email protected]>
To: <[email protected]>
Sent: Tuesday, May 29, 2007 3:57 AM
Subject: st: Re: 2sls with multiply-imputed data sets
Thanks Rodrigo!
As regards the Hausman test, the only solution I can think of is to use
the regression-based form of the test, combining the results across the
imputed datasets only at the second stage.
Does this make sense, or would it be better to simply run 5 separate
Hausman tests?
Suppose that the multiply-imputed dataset is stored in 5 separate files:
mydata1.dta, mydata2.dta, mydata3.dta, mydata4.dta, mydata5.dta
The Stata code would be the following:
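* First stage on each imputed file: save the residual from regressing y2 on instruments and exogenous regressors
forvalues i = 1/5 {
    use mydata`i'.dta, clear
    regress y2 z1 z2 x1 x2
    capture drop res          // allow the loop to be re-run
    predict res if e(sample), residuals
    save, replace
}
* Second stage: stack the 5 files and pool the estimates across imputations with -mim-
clear
set memory 500m
mimstack, m(5) so("id") nomj0 istub(mydata)
mim: regress y1 y2 x1 x2 res, cluster(sampid2)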
[y1 is the outcome, y2 is the endogenous variable, and z1, z2 are the excluded
instruments; exogeneity of y2 is tested by the t-test on the coefficient of res]
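The `mim:` prefix pools the second-stage estimates across the five imputations using Rubin's rules. As a rough illustration of that arithmetic (not a reproduction of `mim`'s exact output), here is a minimal Python sketch that pools the coefficient on `res` and forms the regression-based Hausman t-statistic. The function name `rubin_pool` and the coefficient and standard-error values are hypothetical placeholders, and the p-value uses a normal approximation rather than the t distribution with Rubin's degrees of freedom.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, std_errors):
    """Pool one coefficient across m imputations with Rubin's rules.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    w = mean(se ** 2 for se in std_errors)       # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (n - 1 divisor)
    total = w + (1 + 1 / m) * b                  # total variance
    if b == 0:
        df = math.inf                            # no between-imputation variation
    else:
        r = (1 + 1 / m) * b / w                  # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2          # Rubin's degrees of freedom
    return q_bar, math.sqrt(total), df


# Hypothetical coefficients on `res` and their standard errors, one per imputation
coefs = [0.10, 0.12, 0.11, 0.09, 0.13]
ses = [0.05, 0.05, 0.05, 0.05, 0.05]

est, se, df = rubin_pool(coefs, ses)
t_stat = est / se
# Two-sided p-value from a normal approximation (reasonable when df is large)
p_value = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"coef={est:.4f}  se={se:.4f}  df={df:.1f}  t={t_stat:.3f}  p={p_value:.4f}")
```

Pooling once at the second stage gives a single test statistic, whereas 5 separate Hausman tests leave the reader to reconcile 5 possibly different verdicts.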
Best
Viola
-------------------------------------------------------------------
From "Rodrigo A. Alfaro" <[email protected]>
To <[email protected]>
Subject st: Re: 2sls with multiply-imputed data sets
Date Fri, 25 May 2007 18:23:15 -0400
We had a similar discussion on the list this week. In that case, the topic
was the R2 for Multiple Imputation (MI). Maarten proposed (for the R2 case)
computing the geometric average instead of the arithmetic one, based on a
reply by Donald Rubin elsewhere. Since the Hansen J statistic is
asymptotically distributed as chi-square, a similar suggestion may apply in
your case.
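To make the two summaries concrete, here is a small Python sketch that computes both the arithmetic and the geometric average of a per-imputation statistic such as R2. The five R2 values are hypothetical, and the code does not settle which average is preferable; it only shows how they differ.

```python
from statistics import fmean, geometric_mean

# Hypothetical R2 values, one from each of 5 imputed datasets
r2 = [0.31, 0.29, 0.34, 0.30, 0.33]

arith = fmean(r2)            # simple (arithmetic) average
geo = geometric_mean(r2)     # geometric average: exp of the mean of the logs

print(f"arithmetic mean = {arith:.4f}, geometric mean = {geo:.4f}")
```

By the AM-GM inequality the geometric average never exceeds the arithmetic one, and the two are close whenever the per-imputation values are similar.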
My own suggestion for the R2 was to report the simple average and add a
short note with the minimum and maximum R2 across your regressions. In your
case, I suggest looking more closely at the Hansen J statistics and their
associated p-values. I think it is perfectly OK to have p-values of 0.01,
0.008, etc. (similar magnitudes), and I would not expect to see very
different values of the Hansen J statistic either. If you do, then there is
a problem with the model and/or the MI method.
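One way to follow this advice is to compute the Hansen J p-value separately for each imputed dataset and report the average together with the minimum and maximum. The Python sketch below does that with a chi-square upper-tail function for integer degrees of freedom, written from its closed-form series so that only the standard library is needed. The function name `chi2_sf` and the J statistics are hypothetical, and `df = 1` assumes the setup in Viola's message (two excluded instruments for one endogenous regressor, i.e. one overidentifying restriction).

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with integer k >= 1 df."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_{i=1}^{(k-1)/2} x^(i-1/2) / (2i-1)!!
    s = math.sqrt(x)
    tail = math.erfc(s / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    total, term = 0.0, s
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return tail + 2 * density * total


# Hypothetical Hansen J statistics from 5 imputed datasets
j_stats = [6.4, 7.1, 6.8, 6.6, 7.0]
df = 1  # assumed number of overidentifying restrictions
p_values = [chi2_sf(j, df) for j in j_stats]

print("J: mean %.3f  min %.3f  max %.3f" % (sum(j_stats) / len(j_stats), min(j_stats), max(j_stats)))
print("p: mean %.4f  min %.4f  max %.4f" % (sum(p_values) / len(p_values), min(p_values), max(p_values)))
```

If the minimum and maximum p-values point to opposite conclusions at the chosen significance level, that is the kind of disagreement the message warns about.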
All this works if the proportion of missing observations is small and if
you imputed all the variables (including the variables used in the first
stage) at once. Finally, because MI methods are based on simulation, in
practice I generate more than 5 datasets and try different combinations of
5 datasets (the 2nd to 6th, etc.) as well as larger numbers of datasets (8,
10, or 12) to see whether the results change.
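The stability check described above can be sketched as follows: given estimates from 12 imputations, pool every run of 5 consecutive imputations (1st to 5th, 2nd to 6th, ...) and also the first 8, 10, and 12, then compare the pooled coefficients. This Python sketch applies Rubin's rules for the pooling; the function name `pooled` and the per-imputation estimates are hypothetical, and in practice the inputs would come from rerunning the analysis on each imputed dataset.

```python
import math
from statistics import mean, variance


def pooled(estimates, std_errors):
    """Rubin's rules: pooled point estimate and standard error."""
    m = len(estimates)
    w = mean(se ** 2 for se in std_errors)   # within-imputation variance
    b = variance(estimates)                  # between-imputation variance
    return mean(estimates), math.sqrt(w + (1 + 1 / m) * b)


# Hypothetical coefficient estimates and standard errors from 12 imputed datasets
coefs = [0.52, 0.49, 0.55, 0.50, 0.53, 0.48, 0.51, 0.54, 0.50, 0.52, 0.49, 0.53]
ses = [0.10] * 12

# Sliding windows of 5 consecutive imputations: 1st-5th, 2nd-6th, ...
for start in range(len(coefs) - 4):
    est, se = pooled(coefs[start:start + 5], ses[start:start + 5])
    print(f"imputations {start + 1}-{start + 5}: coef={est:.3f} se={se:.3f}")

# Larger numbers of imputations
for m in (8, 10, 12):
    est, se = pooled(coefs[:m], ses[:m])
    print(f"first {m} imputations: coef={est:.3f} se={se:.3f}")
```

Large swings in the pooled coefficient between combinations, relative to its standard error, would be the kind of change worth investigating before reporting results based on only 5 imputations.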
R
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*