
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: Too high R2, when interacting endogenous regressor in -ivreg2-?


From   "Schaffer, Mark E" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: Too high R2, when interacting endogenous regressor in -ivreg2-?
Date   Tue, 10 Dec 2013 19:18:21 +0000

JZ,

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Jen Zhen
> Sent: 10 December 2013 16:36
> To: [email protected]
> Subject: st: Too high R2, when interacting endogenous regressor in -ivreg2-?
> 
> Dear Statalist members,
> 
> after running an -ivreg2- estimation, I wanted to test formally whether results
> differ between 2 subsamples defined by the exogenous dummy "ex". I have
> followed the procedure explained by Kit Baum in the earlier post by Jana von
> Stein, Kit Baum and Vassilis Monastiriotis
> http://www.stata.com/statalist/archive/2012-05/msg01165.html
> and estimated the following equation (where I've added a long list of further
> exogenous controls, excontrols):
> 
>  ivreg2 y ex excontrols (en en_ex = z z_ex)
> 
> This largely seems to work. I obtain two first stage equation outputs for the
> outcomes en and en_ex respectively, each with both z and z_ex amongst the set
> of regressors, plus ex and excontrols.
> 
> What troubles me though is that for the second first-stage regression, that for
> en_ex, I get an R2 of 0.998. Despite having a long list of controls and good data
> quality, that makes me wonder whether something is wrong here or how I
> could explain this high R2?

You need to tell us more about your data and setup.  Are you using time-series data?  In a time-series setting you can sometimes get a very high R2 without anything actually being wrong.

But if you are using cross-section data, then you are probably right to be worried.  FWIW, my first guess would be that you have a huge outlier.  In 2-D space, a scatterplot would show all but one datapoint bunched closely together, with the outlier way off in the distance somewhere.  The regression line is then basically connecting the outlier to the rest of the datapoints (and fitting them almost perfectly in terms of squared residuals, though of course this is an illusion).
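
To see how a single outlier can produce an R2 close to 1, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names x and y, the seed, the sample size of 100, and the outlier value of 1000 are hypothetical and have nothing to do with the actual data in this thread.

```stata
* Hypothetical simulated data: 100 points, y unrelated to x by construction
clear
set seed 12345
set obs 100
generate x = rnormal()
generate y = rnormal()

* Before any outlier is introduced, R2 should be close to zero
regress y x
display "R2 without outlier: " e(r2)

* Move one observation far away from the rest (the assumed outlier)
replace x = 1000 in 100
replace y = 1000 in 100

* The fitted line now joins the outlier to the cluster, so R2 is close to 1
regress y x
display "R2 with outlier: " e(r2)

* The scatterplot shows 99 bunched points and one point far off
scatter y x
```

In the real data, something along the lines of -summarize en_ex, detail- or a scatterplot of en_ex against z_ex might be a reasonable first check for such an observation, though which variables to plot depends on the setup.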

HTH,
Mark

> 
> Thank you so much and kind regards,
> JZ
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



