Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: Re: st: Polynomial Fitting and RD Design
From: "Patrick Button" <[email protected]>
To: [email protected]
Subject: Re: Re: st: Polynomial Fitting and RD Design
Date: Mon, 5 Sep 2011 12:40:50 -0700
Thank you for the feedback everyone. It has been extremely useful and now
I am not freaking out as much.
First, I've changed x to x - 0.5 as per Austin Nichols' suggestion. This
makes interpretation easier. I should have done this earlier.
I was thinking that my replication was going to involve a critique, Nick
Cox, and I agree with you and others that the 4th order polynomials are
somewhat fishy.
The weird thing about the paper is that the authors say that they are
using 4th degree polynomials on either side of the discontinuity, but
their graphs and/or code indicate that they are just using one polynomial
to fit the entire thing. Not sure why that is... So in trying to do the
4th degree polynomial for each side on my own, I've run into this issue of
results being weird. Now that I understand why, it makes perfect sense.
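A minimal sketch of the two specifications being contrasted here, assuming
x (= demvoteshare - 0.5) and D (= 1 if x >= 0) already exist and that the
names x2-x4, xa-x4a and xb-x4b are still free. It is an illustration, not
the paper's actual code:

```stata
* Sketch only: raw powers of the centered running variable
forvalues k = 2/4 {
    gen double x`k' = x^`k'
}

* (1) One 4th-degree polynomial over the whole range, plus the jump D
regress realincome D x x2 x3 x4

* (2) Separate 4th-degree polynomials on each side of the cutoff
gen double xa = (1-D)*x
gen double xb = D*x
forvalues k = 2/4 {
    gen double x`k'a = (1-D)*x^`k'
    gen double x`k'b = D*x^`k'
}
regress realincome D xa x2a x3a x4a xb x2b x3b x4b
```

Because x is centered at the cutoff, every polynomial term in (2) is zero at
x = 0, so the coefficient on D is the estimated jump at the discontinuity.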
As for whether the 4th degree polynomial is ideal, I would agree with all
of you that it probably is not. If one is going to go with polynomials, the
ideal degree depends on the bandwidth you use. Ariel Linden described this
really well earlier.
Larger bandwidths mean more precision, but more bias. Smaller bandwidths
(say only using data within +/- 2 percentage points of 50%) lead to the
opposite. Lee and Lemieux (2010)
(http://faculty.arts.ubc.ca/tlemieux/papers/RD_JEL.pdf) discuss how the
optimal polynomial degree is a function of the bandwidth, and suggest
choosing the degree with the Akaike Information Criterion (AIC).
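A rough sketch of comparing polynomial degrees by AIC within a given
bandwidth. The bandwidth h = 0.10 and the variable names pl1-pl4 and pr1-pr4
are hypothetical; x and D are assumed to exist as defined in the code
further down:

```stata
* Sketch only: AIC for degrees 1 to 4 within a hypothetical bandwidth
local h = 0.10
forvalues p = 1/4 {
    local rhs "D"
    forvalues k = 1/`p' {
        capture drop pl`k' pr`k'
        gen double pl`k' = (1-D)*x^`k'      // left-side term of degree k
        gen double pr`k' = D*x^`k'          // right-side term of degree k
        local rhs "`rhs' pl`k' pr`k'"
    }
    quietly regress realincome `rhs' if abs(x) <= `h'
    quietly estat ic
    matrix S = r(S)                         // columns: N ll0 ll df AIC BIC
    display "degree `p': AIC = " S[1,5]
}
```

The degree with the smallest AIC would be the candidate for that bandwidth;
repeating the loop for other values of h shows how the choice moves with
the bandwidth.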
I'm going to stick with the 4th degree polynomial (and the entire
dataset), then I'll try other polynomials and bandwidths, and then kernel
after that. I need to do the replication first, THEN I will critique that
by going with something more realistic. The -rd- package should be really
useful for that. Thanks so much for all the discussion about a more
realistic model. The key thing is that results should be robust to several
different types of fitting and bandwidths, so long as they are realistic
in the first place.
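For the local linear step, a hedged sketch of what a call to the
user-written -rd- command might look like. It assumes the package is
installed from SSC and that x is already centered, so the default cutoff of
zero applies; the bandwidth multiples are only an example:

```stata
* Sketch only: -rd- is user-written (Austin Nichols), installed from SSC
ssc install rd

* Local linear estimates at 50%, 100% and 200% of the default bandwidth
rd realincome x, mbw(50 100 200)
```

Seeing whether the estimate is stable across the three bandwidths is one
simple robustness check.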
As for using orthog/orthpoly to generate orthogonal polynomials, I gave
that a shot. Thank you very much for the suggestion, Martin Buis.
I've done the orthogonalization two different ways. Both give different
results, neither of which mirrors the results where I create the
polynomials in the regular fashion. I'm not sure which method is
"correct". I'm also unsure why the results are significantly different.
Any suggestions would be very helpful.
Orthpoly # 1 uses orthpoly separately on each side of the discontinuity.
Orthpoly # 2 does it for all the data at once.
The code and output are below:
*****
drop if demvoteshare==.
keep if realincome~=.
drop demvs2 demvs3 demvs4
gen double x = demvoteshare - 0.5
gen D = 1 if x >= 0
replace D = 0 if x < 0
*Orthpoly #1
*Creating orthogonal polynomials separately for each side.
orthpoly x if x < 0, deg(4) generate(demvsa demvs2a demvs3a demvs4a)
orthpoly x if x >= 0, deg(4) generate(demvsb demvs2b demvs3b demvs4b)
replace demvsa = 0 if demvsa==.
replace demvsb = 0 if demvsb==.
replace demvs2a = 0 if demvs2a==.
replace demvs2b = 0 if demvs2b==.
replace demvs3a = 0 if demvs3a==.
replace demvs3b = 0 if demvs3b==.
replace demvs4a = 0 if demvs4a==.
replace demvs4b = 0 if demvs4b==.
replace demvsa = (1-D)*demvsa
replace demvs2a = (1-D)*demvs2a
replace demvs3a = (1-D)*demvs3a
replace demvs4a = (1-D)*demvs4a
replace demvsb = D*demvsb
replace demvs2b = D*demvs2b
replace demvs3b = D*demvs3b
replace demvs4b = D*demvs4b
regress realincome D demvsa demvs2a demvs3a demvs4a demvsb demvs2b demvs3b ///
    demvs4b
*Orthpoly #2
orthpoly x, deg(4) generate(demvs demvs2 demvs3 demvs4)
replace demvsa = (1-D)*demvs
replace demvs2a = (1-D)*demvs2
replace demvs3a = (1-D)*demvs3
replace demvs4a = (1-D)*demvs4
replace demvsb = D*demvs
replace demvs2b = D*demvs2
replace demvs3b = D*demvs3
replace demvs4b = D*demvs4
regress realincome D demvsa demvs2a demvs3a demvs4a demvsb demvs2b demvs3b ///
    demvs4b
*****
And the results are:
Orthpoly # 1
------------------------------------------------------------------------------
  realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |  -2597.064   140.5829   -18.47   0.000    -2872.626   -2321.502
      demvsa |  -853.4396   109.0927    -7.82   0.000    -1067.277   -639.6025
     demvs2a |  -941.1276   109.0927    -8.63   0.000    -1154.965   -727.2905
     demvs3a |   593.9881   109.0927     5.44   0.000      380.151    807.8252
     demvs4a |   121.7433   109.0927     1.12   0.264    -92.09384    335.5804
      demvsb |  -2006.552   88.66978   -22.63   0.000    -2180.357   -1832.747
     demvs2b |  -620.1632   88.66978    -6.99   0.000    -793.9685   -446.3579
     demvs3b |  -134.2237   88.66978    -1.51   0.130     -308.029    39.58156
     demvs4b |   457.7355   88.66978     5.16   0.000     283.9302    631.5407
       _cons |    32210.1   109.0927   295.25   0.000     31996.26    32423.93
------------------------------------------------------------------------------
Orthpoly # 2
------------------------------------------------------------------------------
  realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |  -15904.18   22026.78    -0.72   0.470    -59079.79    27271.42
      demvsa |   56141.35   33816.59     1.66   0.097    -10143.95    122426.6
     demvs2a |   42328.68   25413.63     1.67   0.096    -7485.616    92142.98
     demvs3a |   19367.81   11950.96     1.62   0.105    -4057.754    42793.37
     demvs4a |   3038.492   2722.757     1.12   0.264    -2298.496    8375.481
      demvsb |  -40636.36   7469.378    -5.44   0.000     -55277.4   -25995.32
     demvs2b |   47190.86   9181.907     5.14   0.000     29193.03     65188.7
     demvs3b |  -33596.74   6331.021    -5.31   0.000    -46006.43   -21187.04
     demvs4b |   7983.823   1546.578     5.16   0.000      4952.31    11015.33
       _cons |   68128.44   21623.63     3.15   0.002     25743.08    110513.8
------------------------------------------------------------------------------
The results using the earlier method (generating polynomials normally)
are the following after I change x to x - 0.5:
------------------------------------------------------------------------------
  realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |   1616.347   605.9781     2.67   0.008     428.5441    2804.149
          xa |   23487.01   15519.78     1.51   0.130    -6933.964    53907.98
         x2a |   334659.2   153845.2     2.18   0.030     33100.93    636217.5
         x3a |   905964.7     546408     1.66   0.097    -165072.1     1977001
         x4a |   667809.6   598416.3     1.12   0.264    -505170.9     1840790
          xb |  -60833.88   12050.57    -5.05   0.000    -84454.71   -37213.06
         x2b |   538597.3   105340.2     5.11   0.000     332115.5      745079
         x3b |   -1771874   334373.4    -5.30   0.000     -2427293    -1116455
         x4b |    1754710     339912     5.16   0.000      1088435     2420986
       _cons |   31122.81   454.4263    68.49   0.000     30232.07    32013.55
------------------------------------------------------------------------------
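One thing worth checking: in all three regressions the regressors span the
same space (a constant, D, and polynomials up to degree 4 on each side), so
they may well be the same fit written in different coordinates. The
identical t statistics on the 4th-degree terms (1.12 and 5.16) in all three
tables are consistent with that. What would differ is the meaning of the
coefficient on D: it is the jump at the cutoff only when every polynomial
term equals zero at x = 0, which holds for raw powers of the centered x but
not in general for orthpoly output. A sketch for comparing fitted values,
assuming realincome, x and D exist and using hypothetical variable names
throughout:

```stata
* Sketch only: all new variable names are hypothetical

* (a) raw powers of x, split by side
forvalues k = 1/4 {
    gen double rawl`k' = (1-D)*x^`k'
    gen double rawr`k' = D*x^`k'
}
regress realincome D rawl1 rawl2 rawl3 rawl4 rawr1 rawr2 rawr3 rawr4
predict double yhat_raw, xb

* (b) orthogonal polynomials from the full sample, split by side
orthpoly x, deg(4) generate(op1 op2 op3 op4)
forvalues k = 1/4 {
    gen double opl`k' = (1-D)*op`k'
    gen double opr`k' = D*op`k'
}
regress realincome D opl1 opl2 opl3 opl4 opr1 opr2 opr3 opr4
predict double yhat_op, xb

* If (a) and (b) are the same fit, the gap is zero up to rounding error
gen double gap = abs(yhat_raw - yhat_op)
summarize gap
```

If the gap is essentially zero, the discontinuity estimate from (b) would
have to be recovered by evaluating both fitted polynomials at x = 0 rather
than read off the coefficient on D.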
Any ideas would be great and I greatly appreciate everyone's assistance.
--
Patrick Button
Ph.D. Student
Department of Economics
University of California, Irvine