
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?


From   "Schaffer, Mark E" <[email protected]>
To   <[email protected]>
Subject   RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
Date   Fri, 15 Jul 2011 19:54:34 +0100

There is something peculiar going on here...

When I try to replicate Chris' example but using the sample votex
dataset Austin provides with -rd-, I get no sensitivity to scaling.  But
when I do it using the auto dataset as Chris does, I get the same
sensitivity to scaling that he does.  In fact, if price is divided by
1,000,000 instead of Chris' 1,000, -rd- exits with an "insufficient
observations" error!  Very curious...

--Mark

**************************************

votex example:

use votex, clear
gen double LNE=lne/1000
sum lne LNE d
rd lne d, mbw(100)
rd LNE d, mbw(100)

Output:


. sum lne LNE d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lne |       349    21.32478    .4329206   19.65047    23.1144
         LNE |       349    .0213248    .0004329   .0196505   .0231144
           d |       349    .0502933    .1604194  -.2756163   .4696784

. rd lne d, mbw(100)
Two variables specified; treatment is 
assumed to jump from zero to one at Z=0. 

 Assignment variable Z is d
 Treatment variable X_T unspecified
 Outcome variable y is lne

Estimating for bandwidth .29287775925349
------------------------------------------------------------------------------
         lne |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0773955   .1056062    -0.73   0.464      -.28438    .1295889
------------------------------------------------------------------------------

. rd LNE d, mbw(100)
Two variables specified; treatment is 
assumed to jump from zero to one at Z=0. 

 Assignment variable Z is d
 Treatment variable X_T unspecified
 Outcome variable y is LNE

Estimating for bandwidth .2928777592534422
------------------------------------------------------------------------------
         LNE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0000774   .0001056    -0.73   0.464    -.0002844    .0001296
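
The votex runs show the behaviour Chris expected: with the bandwidth unchanged (up to display precision), the lwald coefficient, standard error and confidence limits are 1/1000 of the originals. The reason is that, for a fixed bandwidth, a kernel-weighted local linear estimate of the jump at the cutoff is linear in the outcome. A minimal base-Stata sketch, independent of -rd-, illustrates this on the auto data. The fixed bandwidth h = 20, the triangular kernel and the convention that the right-hand side is z >= 0 are assumptions chosen for illustration; the printed jump is not meant to reproduce -rd-'s lwald value.

```stata
* Illustrative check (not part of -rd-): local linear estimate of the jump
* in the outcome at z = 0 with a FIXED, hypothetical bandwidth h.
sysuse auto, clear
gen double z = length - 193
gen double Price = price/1000
local h = 20                                  // hypothetical fixed bandwidth
gen double w = max(0, 1 - abs(z)/`h')         // triangular kernel weight, 0 for |z| >= h
foreach y in price Price {
    quietly regress `y' z if z >= 0 & w > 0 [aw=w]
    local right = _b[_cons]                   // fitted value at z = 0 from the right
    quietly regress `y' z if z < 0 & w > 0 [aw=w]
    local left = _b[_cons]                    // fitted value at z = 0 from the left
    local jump_`y' = (`right') - (`left')
    display "`y': estimated jump at z = 0 = " (`jump_`y'')
}
* Same bandwidth for both outcomes, so the ratio should be 1000 up to rounding
display "ratio price/Price = " (`jump_price')/(`jump_Price')
```

The auto runs below differ from this only because -rd- selects a different bandwidth for each scaling of the outcome.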


**************************************

auto example: 

Code:

sysuse auto, clear
gen double Price = price/1000
gen double PRICE = price/1000000
gen double z = length - 193
sum price Price PRICE z
rd price z, mbw(100)
rd Price z, mbw(100)
rd PRICE z, mbw(100)

Output:

. sum price Price PRICE z

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
       Price |        74    6.165257    2.949496      3.291     15.906
       PRICE |        74    .0061653    .0029495    .003291    .015906
           z |        74   -5.067568    22.26634        -51         40

. rd price z, mbw(100)
Two variables specified; treatment is 
assumed to jump from zero to one at Z=0. 

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is price

Estimating for bandwidth 24.98807626042474
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   -5198.13   2230.786    -2.33   0.020    -9570.391   -825.8697
------------------------------------------------------------------------------

. rd Price z, mbw(100)
Two variables specified; treatment is 
assumed to jump from zero to one at Z=0. 

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is Price

Estimating for bandwidth 8.731619909031293
------------------------------------------------------------------------------
       Price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -7.547781   2.493275    -3.03   0.002    -12.43451   -2.661051
------------------------------------------------------------------------------

. rd PRICE z, mbw(100)
Two variables specified; treatment is 
assumed to jump from zero to one at Z=0. 

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is PRICE

insufficient observations
r(2001);
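
-rd- never prints a bandwidth for PRICE, but the two it does print fall from about 25 (price) to about 8.7 (Price) as the outcome is divided by 1,000. One plausible reading of the "insufficient observations" error, not confirmed by this output, is that the selected bandwidth keeps shrinking with the outcome's scale until too few observations remain on one side of the cutoff. A minimal base-Stata check counts the auto observations that would get positive weight at each printed bandwidth, assuming a triangular kernel (zero weight when |z| >= h) and that the treated side is z >= 0; both are assumptions for this sketch, not documented -rd- internals.

```stata
* Count observations with positive triangular-kernel weight on each side of
* the cutoff at the two bandwidths -rd- printed for price and Price.
* Assumption for this check: treated side is z >= 0.
sysuse auto, clear
gen double z = length - 193
foreach h in 24.98807626042474 8.731619909031293 {
    quietly count if z < 0 & z > -`h'
    local nleft = r(N)
    quietly count if z >= 0 & z < `h'
    local nright = r(N)
    display "h = " %9.4f `h' "   left: `nleft'   right: `nright'"
}
```

Comparing the two lines shows how quickly the usable sample thins out as h falls; it does not identify which step inside -rd- raises r(2001).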

**************************************

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Austin Nichols
> Sent: 15 July 2011 15:23
> To: [email protected]
> Subject: Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
> 
> Chris--
> I agree it is an undesirable "feature" of the optimal 
> bandwidth calculation, but some problem of this sort is 
> probably unavoidable--in this case it arises from estimating 
> local curvature using squared deviations of the outcome, 
> which is evidently sensitive to scale.
> There are alternative approaches which would not face this 
> exact problem, but there would almost surely be other 
> problems, or other ways of breaking the estimator.  The 
> sensitivity of bandwidth to scale is particularly 
> undesirable, but also serves to illustrate what I have said 
> elsewhere: bandwidth selection is more art than science, and 
> at a minimum you should assess the sensitivity of your 
> estimates to bandwidth, which is why graphs for multiple 
> bandwidths are produced by default in -rd-, and there is an 
> option -bdep- to assess the dependence graphically.
> 
> On Fri, Jul 15, 2011 at 9:45 AM, Stata Chris 
> <[email protected]> wrote:
> > Dear list members,
> >
> > I am using Austin Nichols' -rd-
> > (http://ideas.repec.org/c/boc/bocode/s456888.html) command, as well as
> > the related -rdob- by Fuji-Imbens-Kalyanaraman
> > (http://www.economics.harvard.edu/faculty/imbens/software_imbens)
> >
> > Now I've discovered that the optimal bandwidth chosen, and hence the
> > resulting estimates, are sensitive to the scaling of the outcome variable.
> > To demonstrate this, I make use of an example discussed in this 
> > context in an earlier post:
> >
> > sysuse auto, clear
> > gen Price = price/1000
> > gen z = length - 193
> > rd price z
> > rd Price z
> >
> >
> > As you can check, when I use as outcome the price in thousands of dollars
> > ("Price") rather than in dollars ("price"), I get a different
> > bandwidth and hence a very different estimate, whereas I would
> > expect to get the previous estimate divided by 1000.
> >
> > This does not seem a very desirable property to me, but I'm not sure
> > where in the optimal bandwidth algorithm (see
> > http://www.nber.org/papers/w14726) it comes from, or whether it
> > can be avoided. Can any of you say more about this?
> >
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
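
Chris's question about where in the algorithm the scale dependence enters can be framed using the plug-in bandwidth from the Imbens-Kalyanaraman paper he cites (NBER w14726). What follows is a simplified statement of that formula's structure, not a transcription of the -rd- or -rdob- code. Here c is the cutoff (0 in these examples), N the sample size, C_K a kernel constant, \hat f(c) the estimated density of the assignment variable at c, \hat\sigma^2_\pm and \hat m^{(2)}_\pm the estimated conditional variance and second derivative of the outcome on each side, and \hat r_\pm regularization terms, assumed (as in the paper) to be proportional to estimated variances, with pilot bandwidths that are themselves unaffected by rescaling the outcome.

```latex
\[
\hat h = C_K \left( \frac{\hat\sigma^2_-(c) + \hat\sigma^2_+(c)}
  {\hat f(c)\left[\bigl(\hat m^{(2)}_+(c) - \hat m^{(2)}_-(c)\bigr)^2 + \hat r_+ + \hat r_-\right]} \right)^{1/5} N^{-1/5}
\]
\[
y_i \mapsto a\,y_i:\qquad
\hat\sigma^2_\pm \mapsto a^2\hat\sigma^2_\pm,\quad
\hat m^{(2)}_\pm \mapsto a\,\hat m^{(2)}_\pm,\quad
\hat r_\pm \mapsto a^2\hat r_\pm,\quad
\hat f(c) \mapsto \hat f(c)
\quad\Longrightarrow\quad \hat h \mapsto \hat h
\]
```

If every estimated piece scales as shown, numerator and denominator both pick up a factor a^2, so \hat h does not depend on the outcome's units, and at that common bandwidth the local linear estimate scales by exactly a, which is what the votex runs display. Scale sensitivity like that in the auto runs therefore has to come from a step that does not scale this way (for example a fixed constant, tolerance or floor applied to one of the estimated quantities). That is consistent with Austin's remark about the curvature estimate, but the thread does not pin down which step it is.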


-- 
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.




© Copyright 1996–2018 StataCorp LLC