RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
From: "Schaffer, Mark E" <[email protected]>
To: <[email protected]>
Subject: RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
Date: Fri, 15 Jul 2011 19:54:34 +0100

There is something peculiar going on here...
When I try to replicate Chris' example but using the sample votex
dataset Austin provides with -rd-, I get no sensitivity to scaling. But
when I do it using the auto dataset as Chris does, I get the same
sensitivity to scaling that he does. In fact, if price is rescaled by a
factor of 1,000,000 instead of Chris' 1,000, -rd- exits with an
"insufficient observations" error! Very curious....
--Mark
**************************************
votex example:
Code:
use votex, clear
gen double LNE=lne/1000
sum lne LNE d
rd lne d, mbw(100)
rd LNE d, mbw(100)
Output:
. sum lne LNE d
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         lne |       349    21.32478    .4329206   19.65047    23.1144
         LNE |       349    .0213248    .0004329   .0196505   .0231144
           d |       349    .0502933    .1604194  -.2756163   .4696784
. rd lne d, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is d
Treatment variable X_T unspecified
Outcome variable y is lne
Estimating for bandwidth .29287775925349
------------------------------------------------------------------------------
         lne |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0773955   .1056062    -0.73   0.464      -.28438    .1295889
------------------------------------------------------------------------------
. rd LNE d, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is d
Treatment variable X_T unspecified
Outcome variable y is LNE
Estimating for bandwidth .2928777592534422
------------------------------------------------------------------------------
         LNE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0000774   .0001056    -0.73   0.464    -.0002844    .0001296
**************************************
auto example:
Code:
sysuse auto, clear
gen double Price = price/1000
gen double PRICE = price/1000000
gen double z = length - 193
sum price Price PRICE z
rd price z, mbw(100)
rd Price z, mbw(100)
rd PRICE z, mbw(100)
Output:
. sum price Price PRICE z
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
       Price |        74    6.165257    2.949496      3.291     15.906
       PRICE |        74    .0061653    .0029495    .003291    .015906
           z |        74   -5.067568    22.26634        -51         40
. rd price z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is price
Estimating for bandwidth 24.98807626042474
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   -5198.13   2230.786    -2.33   0.020    -9570.391   -825.8697
------------------------------------------------------------------------------
. rd Price z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is Price
Estimating for bandwidth 8.731619909031293
------------------------------------------------------------------------------
       Price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -7.547781   2.493275    -3.03   0.002    -12.43451   -2.661051
------------------------------------------------------------------------------
. rd PRICE z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is PRICE
insufficient observations
r(2001);
**************************************
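One way to separate the bandwidth choice from the estimator itself is to hold the bandwidth fixed across the rescaled outcomes. The lines below are only a sketch: they assume -rd- is installed from SSC and that its bwidth() option overrides the automatic bandwidth, and they reuse the bandwidth reported for price in the output above. The local macro h is just an illustrative name.

```stata
* Sketch: hold the bandwidth fixed so that only the outcome's scale changes.
* Assumes Austin Nichols' -rd- is installed (ssc install rd) and that
* bwidth() overrides the automatically selected bandwidth.
sysuse auto, clear
gen double Price = price/1000
gen double z = length - 193

* Bandwidth copied from the "rd price z, mbw(100)" output above
local h 24.98807626042474

* Same bandwidth, outcome in dollars and in thousands of dollars
rd price z, bwidth(`h') mbw(100)
rd Price z, bwidth(`h') mbw(100)
```

If the two estimates then differ only by the factor of 1,000, that would point to the bandwidth selection step rather than the local linear regression as the source of the scale sensitivity.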
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Austin Nichols
> Sent: 15 July 2011 15:23
> To: [email protected]
> Subject: Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
>
> Chris--
> I agree it is an undesirable "feature" of the optimal
> bandwidth calculation, but some problem of this sort is
> probably unavoidable--in this case it arises from estimating
> local curvature using squared deviations of the outcome,
> which is evidently sensitive to scale.
> There are alternative approaches which would not face this
> exact problem, but there would almost surely be other
> problems, or other ways of breaking the estimator. The
> sensitivity of bandwidth to scale is particularly
> undesirable, but also serves to illustrate what I have said
> elsewhere: bandwidth selection is more art than science, and
> at a minimum you should assess the sensitivity of your
> estimates to bandwidth, which is why graphs for multiple
> bandwidths are produced by default in -rd-, and there is an
> option -bdep- to assess the dependence graphically.
>
> On Fri, Jul 15, 2011 at 9:45 AM, Stata Chris
> <[email protected]> wrote:
> > Dear list members,
> >
> > I am using Austin Nichols' -rd- command
> > (http://ideas.repec.org/c/boc/bocode/s456888.html), as well as the
> > related -rdob- by Fuji-Imbens-Kalyanaraman
> > (http://www.economics.harvard.edu/faculty/imbens/software_imbens).
> >
> > Now I've discovered that the optimal bandwidth chosen, and hence the
> > resulting estimates, are sensitive to the scaling of the outcome
> > variable. To demonstrate this, I make use of an example discussed in
> > this context in an earlier post:
> >
> > sysuse auto, clear
> > gen Price = price/1000
> > gen z = length - 193
> > rd price z
> > rd Price z
> >
> >
> > As you can check, when I use as outcome the price in 1000 dollars
> > ("Price") rather than in dollars ("price"), I get a different
> > bandwidth and hence a very different estimate, whereas I think I would
> > wish to get the previous estimate just divided by 1000.
> >
> > This does not seem a very desirable property to me, but I'm not sure
> > where in the optimal bandwidth algorithm (see
> > http://www.nber.org/papers/w14726 ) this comes from and whether it
> > would be possible to avoid this. Probably some of you can say more
> > about this?
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>