Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
From
Austin Nichols <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
Date
Fri, 15 Jul 2011 15:27:32 -0400
Mark-
I'm at the Stata conference in Chicago and not back at work until Tuesday,
but I suspect this is related to the number of observations: the fifth root
of N plays a big role, and auto.dta has 74 obs.
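
To put rough numbers on the fifth-root point: a minimal sketch, assuming
(as in the Imbens-Kalyanaraman paper linked further down the thread) that
the optimal bandwidth shrinks in proportion to N^(-1/5). This is only
arithmetic on that rate for the two sample sizes reported in this thread,
not a reproduction of what -rd- computes internally.

```stata
* Sample-size factor N^(-1/5) for the two datasets discussed in this thread
* (74 obs in auto.dta, 349 obs in votex; both counts are from the -sum- output)
display "auto.dta (N = 74):  " %6.4f 74^(-1/5)
display "votex    (N = 349): " %6.4f 349^(-1/5)
```

The smaller the sample, the larger this factor, so any noise in the other
ingredients of the bandwidth formula is scaled up more with 74 observations
than with 349.
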
On Friday, July 15, 2011, Schaffer, Mark E <[email protected]> wrote:
> There is something peculiar going on here...
>
> When I try to replicate Chris' example but using the sample votex
> dataset Austin provides with -rd-, I get no sensitivity to scaling. But
> when I do it using the auto dataset as Chris does, I get the same
> sensitivity to scaling that he does. In fact, if price is rescaled by a
> factor of 1,000,000 instead of Chris' 1,000, -rd- exits with an
> "insufficient observations" error! Very curious....
>
> --Mark
>
> **************************************
>
> votex example:
>
> use votex, clear
> gen double LNE=lne/1000
> sum lne LNE d
> rd lne d, mbw(100)
> rd LNE d, mbw(100)
>
> Output:
>
>
> . sum lne LNE d
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          lne |       349    21.32478     .4329206  19.65047    23.1144
>          LNE |       349    .0213248     .0004329  .0196505   .0231144
>            d |       349    .0502933     .1604194 -.2756163   .4696784
>
> . rd lne d, mbw(100)
> Two variables specified; treatment is
> assumed to jump from zero to one at Z=0.
>
> Assignment variable Z is d
> Treatment variable X_T unspecified
> Outcome variable y is lne
>
> Estimating for bandwidth .29287775925349
> ------------------------------------------------------------------------------
>          lne |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        lwald |  -.0773955    .1056062   -0.73   0.464      -.28438    .1295889
> ------------------------------------------------------------------------------
>
> . rd LNE d, mbw(100)
> Two variables specified; treatment is
> assumed to jump from zero to one at Z=0.
>
> Assignment variable Z is d
> Treatment variable X_T unspecified
> Outcome variable y is LNE
>
> Estimating for bandwidth .2928777592534422
> ------------------------------------------------------------------------------
>          LNE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        lwald |  -.0000774    .0001056   -0.73   0.464    -.0002844    .0001296
>
>
> **************************************
>
> auto example:
>
> Code:
>
> sysuse auto, clear
> gen double Price = price/1000
> gen double PRICE = price/1000000
> gen double z = length - 193
> sum price Price PRICE z
> rd price z, mbw(100)
> rd Price z, mbw(100)
> rd PRICE z, mbw(100)
>
> Output:
>
> . sum price Price PRICE z
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        price |        74    6165.257     2949.496      3291      15906
>        Price |        74    6.165257     2.949496     3.291     15.906
>        PRICE |        74    .0061653     .0029495   .003291    .015906
>            z |        74   -5.067568     22.26634       -51         40
>
> . rd price z, mbw(100)
> Two variables specified; treatment is
> assumed to jump from zero to one at Z=0.
>
> Assignment variable Z is z
> Treatment variable X_T unspecified
> Outcome variable y is price
>
> Estimating for bandwidth 24.98807626042474
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        lwald |   -5198.13    2230.786   -2.33   0.020    -9570.391   -825.8697
> ------------------------------------------------------------------------------
>
> . rd Price z, mbw(100)
> Two variables specified; treatment is
> assumed to jump from zero to one at Z=0.
>
> Assignment variable Z is z
> Treatment variable X_T unspecified
> Outcome variable y is Price
>
> Estimating for bandwidth 8.731619909031293
> ------------------------------------------------------------------------------
>        Price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        lwald |  -7.547781    2.493275   -3.03   0.002    -12.43451   -2.661051
> ------------------------------------------------------------------------------
>
> . rd PRICE z, mbw(100)
> Two variables specified; treatment is
> assumed to jump from zero to one at Z=0.
>
> Assignment variable Z is z
> Treatment variable X_T unspecified
> Outcome variable y is PRICE
>
> insufficient observations
> r(2001);
>
> **************************************
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Austin Nichols
>> Sent: 15 July 2011 15:23
>> To: [email protected]
>> Subject: Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
>>
>> Chris--
>> I agree it is an undesirable "feature" of the optimal
>> bandwidth calculation, but some problem of this sort is
>> probably unavoidable--in this case it arises from estimating
>> local curvature using squared deviations of the outcome,
>> which is evidently sensitive to scale.
>> There are alternative approaches which would not face this
>> exact problem, but there would almost surely be other
>> problems, or other ways of breaking the estimator. The
>> sensitivity of bandwidth to scale is particularly
>> undesirable, but also serves to illustrate what I have said
>> elsewhere: bandwidth selection is more art than science, and
>> at a minimum you should assess the sensitivity of your
>> estimates to bandwidth, which is why graphs for multiple
>> bandwidths are produced by default in -rd-, and there is an
>> option -bdep- to assess the dependence graphically.
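
A minimal sketch of the sensitivity check recommended above, using the auto
example from this thread. It assumes -rd- is installed from SSC and uses
only the -mbw()- and -bdep- options named in the thread; the particular grid
of bandwidth multiples is an arbitrary choice for illustration.

```stata
* Requires the user-written -rd- package: ssc install rd
sysuse auto, clear
gen double z = length - 193

* Estimate at 50%, 100%, and 200% of the automatically chosen bandwidth
rd price z, mbw(50 100 200)

* Plot how the estimate depends on bandwidth over a finer grid of multiples
rd price z, mbw(50(25)200) bdep
```

If the estimates swing widely across these multiples, the result at any
single "optimal" bandwidth should be read with caution.
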
>>
>> On Fri, Jul 15, 2011 at 9:45 AM, Stata Chris
>> <[email protected]> wrote:
>> > Dear list members,
>> >
>> > I am using Austin Nichols' -rd- command
>> > (http://ideas.repec.org/c/boc/bocode/s456888.html), as well as
>> > the related -rdob- by Fuji-Imbens-Kalyanaraman
>> > (http://www.economics.harvard.edu/faculty/imbens/software_imbens).
>> >
>> > Now I've discovered that the optimal bandwidth chosen and hence the
>> > resulting estimates are sensitive to the scaling of the outcome
>> > variable. To demonstrate this, I make use of an example discussed
>> > in this context in an earlier post:
>> >
>> > sysuse auto, clear
>> > gen Price = price/1000
>> > gen z = length - 193
>> > rd price z
>> > rd Price z
>> >
>> >
>> > As you can check, when I use as outcome the price in 1000 dollars
>> > ("Price") rather than in dollars ("price"), I get a different
>> > bandwidth and hence a very different estimate, whereas I think I
>> > would wish to get the previous estimate just divided by 1000.
>> >
>> > This does not seem a very desirable property to me, but I'm not
>> > sure where in the optimal bandwidth algorithm (see
>> > http://www.nber.org/papers/w14726 ) this comes from and whether it
>> > would be possible to avoid this. Probably some of you can say more
>> > about this?
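
One way to check the expectation stated above (the same estimate, just
divided by 1000) is to take bandwidth selection out of the picture by fixing
the bandwidth by hand. A minimal sketch, assuming the -bwidth()- option of
-rd- behaves as described in its help file (please confirm with -help rd-
for the installed version); the value 25 is arbitrary and is not a
recommended bandwidth.

```stata
* Requires the user-written -rd- package: ssc install rd
sysuse auto, clear
gen double Price = price/1000
gen double z = length - 193

* Same hand-picked bandwidth for both runs (25 is for illustration only)
rd price z, bwidth(25) mbw(100)
rd Price z, bwidth(25) mbw(100)
```

With the bandwidth held fixed, a local linear estimate is linear in the
outcome, so the second lwald should equal the first divided by 1000. Any
remaining discrepancy would point to something other than the bandwidth
choice.
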
>> >
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
> --
> Heriot-Watt University is a Scottish charity
> registered under charity number SC000278.
>
>