Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: Too high R2, when interacting endogenous regressor in -ivreg2-? |
Date | Tue, 10 Dec 2013 19:18:21 +0000 |
JZ,

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jen Zhen
> Sent: 10 December 2013 16:36
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Too high R2, when interacting endogenous regressor in -ivreg2-?
>
> Dear Statalist members,
>
> after running an -ivreg2- estimation, I wanted to test formally whether results
> differ between 2 subsamples defined by the exogenous dummy "ex". I have
> followed the procedure explained by Kit Baum in the earlier post by Jana von
> Stein, Kit Baum and Vassilis Monastiriotis
> http://www.stata.com/statalist/archive/2012-05/msg01165.html
> and estimated the following equation (where I've added a long list of further
> exogenous controls, excontrols):
>
> ivreg2 y ex excontrols (en en_ex = z z_ex)
>
> This largely seems to work. I obtain two first stage equation outputs for the
> outcomes en and en_ex respectively, each with both z and z_ex amongst the set
> of regressors, plus ex and excontrols.
>
> What troubles me though is that for the second first-stage regression, that for
> en_ex, I get an R2 of 0.998. Despite having a long list of controls and good data
> quality, that makes me wonder whether something is wrong here or how I
> could explain this high R2?

You need to tell us more about your data and setup. Are you using time-series data? You can sometimes get very high R2s in a time-series setting and nothing is actually wrong. But if you are using cross-section data, then you are probably right to be worried.

FWIW, my first guess would be that you have a huge outlier. In 2-D space, a scatterplot would have all but one datapoint bunched closely together, and the outlier is way off in the distance somewhere. The regression line is basically connecting the outlier to the rest of the datapoints (and connecting them almost perfectly in terms of squared residuals, though of course this is an illusion).
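
To make the outlier scenario concrete, here is a minimal simulated sketch. It is only an illustration under assumed values: the variables x and y, the seed, the sample size of 200 and the outlier value of 1000 are all made up and have nothing to do with the actual data in the question.

```stata
* Simulated illustration only: x, y and all numbers below are hypothetical.
clear
set seed 12345
set obs 200

* Two unrelated standard-normal variables: R2 should be close to zero.
generate x = rnormal()
generate y = rnormal()
regress y x
scalar r2_clean = e(r2)
display "R2 without outlier: " %6.4f r2_clean

* Move a single observation far away from the rest of the data.
replace x = 1000 in 200
replace y = 1000 in 200
regress y x
scalar r2_outlier = e(r2)
display "R2 with one outlier: " %6.4f r2_outlier

* Leverage from the second regression points to the offending observation.
predict lev, leverage
gsort -lev
list x y lev in 1/3
```

The second R2 should come out very close to 1 even though x and y are unrelated for 199 of the 200 observations. On real data, the analogous check would be to run the first-stage regression by hand with -regress-, inspect leverage in the same way, and look at a plot such as -scatter y x-.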
HTH,
Mark

>
> Thank you so much and kind regards,
> JZ
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/