Re: st: Local Linear Regression for Regression Discontinuity Designs
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Local Linear Regression for Regression Discontinuity Designs
Date: Mon, 23 May 2011 11:00:25 -0400
Alex Olssen <[email protected]>:
You are right that I removed the kernel option from the new -rd- dated 20 March
(that update renamed the previous version to -rd_obs- and defined a new -rd-
command) and introduced a bug.
The older version, -rd_obs-, still works as expected and allows a
rectangular kernel.
The bug introduced in the last update is that treatment is defined as
I(Z>0) instead of I(Z>=0).
A new ado file dated today has been submitted to SSC; compare:
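* Running variable x is centered at the cutoff (length = 193)
sysuse auto, clear
ren price y
gen x = length - 193
* z is the intended treatment indicator I(x>=0); zs reproduces the bug, I(x>0)
gen z = (x >= 0)
gen zs = (x > 0)
gen z_x = z*x
gen xlow=x*(1-z)
gen xhigh=x*z
* One-sided local linear fits with a triangle kernel, evaluated at each x
lpoly y x if x<0, deg(1) ker(tri) bw(10) gen(L2) at(x) nogr
lpoly y x if x>=0, deg(1) ker(tri) bw(10) gen(R2) at(x) nogr
* Difference between the right and left fits at the cutoff
gen diff2 = R2 - L2
su diff2 if x == 0
* Weighted OLS for comparison: kwt is proportional to a triangle kernel with bandwidth 10
g kwt=max(0,10-abs(x))
reg y z x z_x [pw=kwt]
reg y z xlow xhigh [pw=kwt]
* Using zs in place of z mimics the I(Z>0) treatment definition
reg y zs xlow xhigh [pw=kwt]
rd y x, bwidth(10)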
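A minimal sketch of how the comparison above could be checked numerically rather than by eye. It assumes that the triangle kernel in -lpoly- with bandwidth 10 weights observations at the cutoff in proportion to max(0, 10-|x|); the scalar names are hypothetical. If that assumption holds, the coefficient on z from the weighted regression should agree with the -lpoly- difference at x == 0, while the coefficient on zs shows what the I(Z>0) definition produces.

```stata
sysuse auto, clear
ren price y
gen x = length - 193
gen z  = (x >= 0)
gen zs = (x > 0)
gen xlow  = x*(1-z)
gen xhigh = x*z
gen kwt = max(0, 10 - abs(x))

* Boundary difference from two one-sided local linear fits
lpoly y x if x<0,  deg(1) ker(tri) bw(10) gen(L2) at(x) nogr
lpoly y x if x>=0, deg(1) ker(tri) bw(10) gen(R2) at(x) nogr
gen diff2 = R2 - L2
su diff2 if x == 0, meanonly
scalar jump_lpoly = r(mean)

* Weighted OLS with the intended indicator I(x>=0)
reg y z xlow xhigh [pw=kwt]
scalar jump_ge = _b[z]

* Weighted OLS with the indicator I(x>0), for contrast
reg y zs xlow xhigh [pw=kwt]
scalar jump_gt = _b[zs]

display "lpoly: " jump_lpoly "  I(x>=0): " jump_ge "  I(x>0): " jump_gt
```

Observations with x == 0 are the only ones classified differently by z and zs, so any gap between the last two numbers comes from how those observations are assigned.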
On Mon, May 23, 2011 at 7:40 AM, Alex Olssen <[email protected]> wrote:
> Dear Andreas,
>
> Estimation of the local linear regression model can be implemented by
> OLS (restricting the subset of observations appropriately) IF you are
> using the rectangular kernel. However, Austin Nichols's latest version
> of -rd- only allows estimation based on the triangular kernel, which
> is optimal for boundary estimation; see the references in Imbens and
> Lemieux 2009.
>
> As an aside, it would have been tremendously helpful if you had posted
> some example code with your question.
>
> I compared OLS with dummies, -lpoly-, and an older version of Austin
> Nichols's -rd- and got the same result in each case (all used the
> rectangular kernel).
> I tried again tonight, but even after using the triangular kernel I
> couldn't quite get the results from manual -lpoly- to match those of
> Austin Nichols's -rd-.
>
> I present an example using the auto dataset - just to show the code.
>
> * test rd
> sysuse auto, clear
> ren price y
> gen x = length - 193
> gen z = (x >= 0)
> gen z_x = z*x
>
> reg y x if x > -10 & x < 0
> reg y x if x >= 0 & x < 10
> reg y x z z_x if x > -10 & x < 10
> * OLS with dummies produces the same result as
> * OLS on either side when the same bandwidths are used
>
> lpoly y x if x < 0, deg(1) ker(rec) bwidth(10) gen(L) at(x) nogr
> lpoly y x if x >= 0, deg(1) ker(rec) bwidth(10) gen(R) at(x) nogr
> gen diff = R - L
> su diff if x == 0
> * OLS with dummies produces the same result as
> * local linear regression when the rectangular kernel is used
>
> * note Austin Nichols's rd only allows use of the triangle kernel
> * which is boundary optimal - see references in Imbens and Lemieux 2009
> lpoly y x if x < 0, deg(1) ker(tri) bwidth(10) gen(L2) at(x) nogr
> lpoly y x if x >= 0, deg(1) ker(tri) bwidth(10) gen(R2) at(x) nogr
> gen diff2 = R2 - L2
> su diff2 if x == 0
> rd y x, deg(1) bwidth(10)
>
> Perhaps Austin could comment on the difference? I expect I have made
> an oversight somewhere.
>
> Kind regards,
>
> Alex
>
> On 23 May 2011 01:20, andreas nordset <[email protected]> wrote:
>> Dear Statalist members,
>>
>> in a context in which individuals are eligible for a treatment if and
>> only if they are aged above 50, I would like to implement a Regression
>> Discontinuity Design to estimate the effect of the treatment on
>> several outcomes, i.e. the difference between the average outcome just
>> above the threshold and the average outcome just below the threshold,
>> where these averages must be estimated.
>>
>> My impression is that the standard way of doing this is to use "Local
>> Linear Regression".
>>
>> My understanding is that I can hence obtain the Reduced-Form effect by
>> simply estimating: -reg outcome D50 age D50_age if
>> inrange(age,50-h,50+h)-
>> where D50 is a dummy for being aged above 50, D50_age is the
>> interaction of that dummy with age, and h is the bandwidth.
>> Equivalently, I would obtain the Wald estimates with: -ivreg2 outcome
>> age D50_age (treatment=D50) if inrange(age,50-h,50+h)-.
>> Put differently, my understanding of "Local Linear Regression" is to
>> estimate simple linear OLS regressions, but a separate line on each
>> side and only "locally", i.e. using only observations from the
>> interval (50-h,50+h).
>>
>> Yet when I do so, I obtain estimates that differ from those obtained
>> using Austin Nichols's -rd- command, which apparently uses the -lpoly-
>> command for local linear regression. Does that mean that my
>> understanding of LLR is incorrect, maybe because some more
>> sophisticated weighting of observations is needed? In your view, is
>> such a more sophisticated procedure needed, and if so what would be
>> the problems with my very simple procedure?
>>
>> Thank you so much for your advice and best regards!
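
As a hedged illustration of the question above (not code from the thread), the sketch below simulates hypothetical data with a known jump of 2 at age 50 and fits the two-sided linear regression twice: once unweighted within the window (rectangular kernel) and once with triangular weights. The variable names agec, D50_agec, and kwt, the bandwidth of 5, the choice of >= at the threshold, and the simulated data are all assumptions made for illustration. Age is centered at 50 so that the coefficient on D50 measures the gap at the threshold; with uncentered age and separate slopes, that coefficient would instead be the gap extrapolated to age 0.

```stata
* Hypothetical simulated data: true jump of 2 at age 50 (an assumption)
clear
set seed 12345
set obs 2000
gen age  = 40 + 20*runiform()
gen agec = age - 50                 // running variable centered at the cutoff
gen D50  = (agec >= 0)              // treatment-eligibility indicator
gen D50_agec = D50*agec             // lets the slope differ on each side
gen outcome = 1 + 0.3*agec + 2*D50 + rnormal()

local h = 5                         // illustrative bandwidth

* Rectangular kernel: plain OLS on observations inside the window
reg outcome D50 agec D50_agec if abs(agec) < `h'

* Triangular kernel: same regression, weights fall linearly to zero at h
gen kwt = max(0, `h' - abs(agec))
reg outcome D50 agec D50_agec [pw=kwt] if kwt > 0
```

In both regressions the coefficient on D50 is the estimated jump at the threshold. The two fits use the same specification and differ only in how observations inside the window are weighted.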