st: Polynomial Fitting and RD Design
From: "Patrick Button" <[email protected]>
To: [email protected]
Subject: st: Polynomial Fitting and RD Design
Date: Wed, 31 Aug 2011 18:54:52 -0700

Hello Stata users,
I've been getting some unexpected Stata output when fitting polynomials
using a pretty simple OLS regression.
I am replicating a regression discontinuity design paper (Lee, Moretti and
Butler 2004).
The paper is here: http://emlab.berkeley.edu/~moretti/final.pdf
Code and data are here: http://emlab.berkeley.edu/~moretti/data3.html
(I am using enricoall2.dta).
I need to run a regression that fits a separate 4th-degree polynomial in
the running variable, x, on each side of 0.5. The regression also includes
a dummy variable equal to 1 if x >= 0.5 and 0 otherwise. If there is a
discontinuity at 0.5, it should be picked up by the coefficient on that
dummy variable.
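For reference, the pooled specification described above can be written out
as follows. The symbols (Y_i for the outcome, \alpha, \tau, a_k, b_k, and c
for an evaluation point) are notation introduced here for illustration and
are not taken from the paper.

```latex
Y_i = \alpha + \tau D_i
      + \sum_{k=1}^{4} a_k (1 - D_i)\, x_i^{k}
      + \sum_{k=1}^{4} b_k D_i\, x_i^{k}
      + \varepsilon_i,
\qquad D_i = \mathbf{1}[x_i \ge 0.5]

% Gap between the right-side and left-side fitted curves at x = c:
\Delta(c) = \tau + \sum_{k=1}^{4} (b_k - a_k)\, c^{k}
```

Under this parameterization, \tau alone is the gap at c = 0. It coincides
with the jump at 0.5 only when the polynomial terms are written in
(x - 0.5) rather than in x, so that is one thing worth checking.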
In this case the running variable is the Democratic candidate's vote share
in U.S. House of Representatives elections, counting only the Democratic
and Republican votes. So x < 0.5 means a Republican won, and x >= 0.5 means
a Democrat won.
I would like to pool the data instead of running a separate regression for
each side. This is one of the recommended methods in the RD literature.
For some reason this method does not appear in the authors' code so I need
to do it myself.
I set up and run the regression as follows:
***
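* Running variable: Democratic share of the two-party vote
gen x = demvoteshare
* Indicator for a Democratic win (left missing where x is missing)
gen D = (x >= 0.5) if !missing(x)
* Left-side polynomial (x < 0.5)
gen xa = (1-D)*x
gen x2a = (1-D)*x^2
gen x3a = (1-D)*x^3
gen x4a = (1-D)*x^4
* Right-side polynomial (x >= 0.5)
gen xb = D*x
gen x2b = D*x^2
gen x3b = D*x^3
gen x4b = D*x^4
* Pooled regression: dummy plus a separate quartic on each side
regress realincome D xa x2a x3a x4a xb x2b x3b x4b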
***
Based on the authors' results, graphical analysis, and logic, there should
be no jump in realincome at 0.5. There is no reason why income should be
suddenly much different between districts that Democrats just barely won
and districts that they just barely lost. If it were, this would invalidate
the regression discontinuity design. So the coefficient on D should be
statistically insignificant. However, I get the following results:
------------------------------------------------------------------------------
  realincome |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
           D |   497414.5   94802.12     5.25   0.000       311589    683240.1
          xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
         x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
         x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
         x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
          xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
         x2b |    5828381    1112850     5.24   0.000      3647038     8009724
         x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
         x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
       _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
------------------------------------------------------------------------------
I have no idea why D is statistically significant, or why only the
right-side polynomial is statistically significant. The problem is not
limited to this regression: I get similarly implausible results for every
regression I run that has a separate 4th-degree polynomial on each side of
0.5. However, I do not get weird results like this when I use a single
4th-degree polynomial over the whole range of x together with the D dummy.
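One possible diagnostic, sketched here on simulated data rather than on
enricoall2.dta, is to compare the coefficient on D with the gap between the
two fitted curves evaluated at x = 0.5. Everything in the sketch is an
assumption made for illustration: the simulated outcome y, its built-in
jump of 2, and the variable names xa1..xb4, xc, and ca1..cb4. It is not the
authors' code.

```stata
* Hypothetical simulated data with a known jump of 2 at x = 0.5
clear
set seed 12345
set obs 2000
gen double x = runiform()
gen byte D = (x >= 0.5) if !missing(x)
gen double y = 1 + x + 2*D + rnormal(0, 0.5)

* Uncentered side-specific quartics, mirroring the setup in this post
forvalues k = 1/4 {
    gen double xa`k' = (1 - D) * x^`k'
    gen double xb`k' = D * x^`k'
}
regress y D xa1 xa2 xa3 xa4 xb1 xb2 xb3 xb4
* Here _b[D] is the gap between the two fitted curves at x = 0
scalar b_uncentered = _b[D]

* Gap between the two fitted curves at x = 0.5, as a linear combination
lincom D + 0.5*xb1 - 0.5*xa1 + 0.25*xb2 - 0.25*xa2 ///
    + 0.125*xb3 - 0.125*xa3 + 0.0625*xb4 - 0.0625*xa4
scalar jump_lincom = r(estimate)

* Same model with the running variable centered at the cutoff
gen double xc = x - 0.5
forvalues k = 1/4 {
    gen double ca`k' = (1 - D) * xc^`k'
    gen double cb`k' = D * xc^`k'
}
regress y D ca1 ca2 ca3 ca4 cb1 cb2 cb3 cb4

display "coef on D, uncentered fit:  " b_uncentered
display "gap at x = 0.5 via lincom:  " jump_lincom
display "coef on D, centered fit:    " _b[D]
```

The last two numbers should agree up to rounding, because the centered fit
is a reparameterization of the same model. If the first number differs from
them, the coefficient on D in the uncentered fit is measuring the gap
between the two extrapolated curves at x = 0, not a discontinuity at 0.5.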
Does anyone know what I am doing wrong? I have a feeling that I'm missing
something obvious.
Thank you very much for your time and consideration.
--
Patrick Button
Ph.D. Student
Department of Economics
University of California, Irvine