Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Polynomial Fitting and RD Design
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Polynomial Fitting and RD Design
Date: Thu, 1 Sep 2011 07:37:43 +0100

Even if you can get this to work as intended, look at the sizes of
those coefficients! The resultant curve may look about right, but this
is a dubious thing to do numerically and statistically. I can't
comment on the underlying scientific rationale for quartics here,
although I will guess wildly that there isn't one.
Nick
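
The size of the coefficient on D in the output quoted below has an algebraic explanation: with x left uncentered, that coefficient is not the jump at 0.5. A sketch of the algebra follows, taking the quoted regression as given. The symbols are labels introduced here for illustration, not taken from the thread: \alpha for _cons, \delta for the coefficient on D, and a_k, b_k for the left-side and right-side polynomial coefficients.

```latex
\begin{align*}
E[y \mid x] &= \alpha + \delta D
  + \sum_{k=1}^{4} a_k (1-D)\,x^k
  + \sum_{k=1}^{4} b_k D\,x^k,
  \qquad D = \mathbf{1}[x \ge 0.5] \\
\text{jump at } 0.5
  &= \Bigl(\alpha + \delta + \sum_{k=1}^{4} b_k\,0.5^k\Bigr)
   - \Bigl(\alpha + \sum_{k=1}^{4} a_k\,0.5^k\Bigr)
   = \delta + \sum_{k=1}^{4} (b_k - a_k)\,0.5^k
\end{align*}
```

So \delta on its own is the gap between the two fitted quartics extrapolated to x = 0, well away from the cutoff. If x is replaced by x - 0.5, every power term vanishes at the cutoff and the jump reduces to \delta.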
On Thu, Sep 1, 2011 at 3:59 AM, Austin Nichols <[email protected]> wrote:
> Patrick Button <[email protected]>:
> Try redefining your x so that the discontinuity is at zero.
>
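
The suggestion above can be sketched in Stata as follows. This is a minimal, untested sketch that assumes the variables demvoteshare and realincome from the original post; the names xc, xc1a ... xc4a and xc1b ... xc4b are hypothetical and chosen only for illustration.

```stata
* Center the running variable so that the cutoff sits at zero
gen xc = demvoteshare - 0.5

* Indicator for a Democratic win; left missing when xc is missing
gen byte D = xc >= 0 if !missing(xc)

* Separate quartic terms on each side, built from the centered variable
forvalues k = 1/4 {
    gen xc`k'a = (1 - D) * xc^`k'
    gen xc`k'b = D * xc^`k'
}

* With xc = 0 at the cutoff, the coefficient on D is the estimated jump there
regress realincome D xc1a xc2a xc3a xc4a xc1b xc2b xc3b xc4b
```

If x is kept uncentered instead, the implied jump at 0.5 could be requested after the original regression with a linear combination such as lincom D + 0.5*xb - 0.5*xa + 0.25*x2b - 0.25*x2a + 0.125*x3b - 0.125*x3a + 0.0625*x4b - 0.0625*x4a. Centering is still the simpler choice, and it also keeps the powers of x on a less extreme scale.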
> On Wed, Aug 31, 2011 at 9:54 PM, Patrick Button <[email protected]> wrote:
>> Hello Stata users,
>>
>> I've been getting some unexpected Stata output when fitting polynomials
>> using a pretty simple OLS regression.
>>
>> I am replicating a regression discontinuity design paper (Lee, Moretti and
>> Butler 2004). The paper is here:
>> http://emlab.berkeley.edu/~moretti/final.pdf Code and data are here:
>> http://emlab.berkeley.edu/~moretti/data3.html (I am using enricoall2.dta).
>>
>> I need to run a regression that fits a 4th degree polynomial separately
>> for values of the running variable, x, below 0.5 and above 0.5. The
>> regression also includes a dummy variable for whether x >= 0.5. If
>> there is a discontinuity at 0.5, then this is picked up in the coefficient
>> on that dummy variable.
>>
>> In this case the running variable is the vote share that the Democratic
>> candidate got in U.S. House of Representatives elections, including just
>> the Democratic and Republican votes. So x < 0.5 means a Republican won,
>> and x >= 0.5 means a Democrat won.
>>
>> I would like to pool the data instead of running a separate regression for
>> each side. This is one of the recommended methods in the RD literature.
>> For some reason this method does not appear in the authors' code so I need
>> to do it myself.
>>
>> I'm running and setting up the regression as follows:
>>
>> ***
>> gen x = demvoteshare
>>
>> gen D = 1 if x >=0.5
>> replace D = 0 if x < 0.5
>>
>> *Left Side Polynomial
>> gen xa = (1-D)*x
>> gen x2a = (1-D)*x^2
>> gen x3a = (1-D)*x^3
>> gen x4a = (1-D)*x^4
>>
>> *Right Side Polynomial
>> gen xb = D*x
>> gen x2b = D*x^2
>> gen x3b = D*x^3
>> gen x4b = D*x^4
>>
>> regress realincome D xa x2a x3a x4a xb x2b x3b x4b
>>
>> ***
>>
>> Based on what the authors of the paper got, graphical analysis, and logic,
>> there should be no jump in realincome at 0.5. There is no reason why
>> income should be suddenly much different for districts that Democrats just
>> barely won or just barely lost. If there is a jump, this invalidates the
>> regression discontinuity design. So the coefficient on D should be
>> statistically insignificant. However, I get the following results:
>>
>> ------------------------------------------------------------------------------
>>   realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>            D |   497414.5   94802.12     5.25   0.000       311589    683240.1
>>           xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
>>          x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
>>          x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
>>          x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
>>           xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
>>          x2b |    5828381    1112850     5.24   0.000      3647038     8009724
>>          x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
>>          x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
>>        _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
>> ------------------------------------------------------------------------------
>>
>> I have no idea why D is statistically significant, and why only the
>> polynomial on the right side is statistically significant. This is not
>> just a problem with this regression. I get messed up results for every
>> regression I run that has a 4th degree polynomial on each side of 0.5.
>>
>> However, I do not get weird results like this when I use just one 4th
>> degree polynomial (one for the entire thing) with the D dummy.
>>
>> Does anyone know what I am doing wrong? I have no idea, but I have a
>> feeling that I'm missing something obvious.
>>
>> Thank you very much for your time and consideration.
>>
>> --