Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling? |
Date | Fri, 15 Jul 2011 19:54:34 +0100 |
There is something peculiar going on here...

When I try to replicate Chris' example but using the sample votex dataset Austin provides with -rd-, I get no sensitivity to scaling. But when I do it using the auto dataset as Chris does, I get the same sensitivity to scaling that he does. In fact, if price is rescaled by a factor of 1,000,000 instead of Chris' 1,000, -rd- exits with an "insufficient observations" error!

Very curious....

--Mark

**************************************

votex example:

use votex, clear
gen double LNE=lne/1000
sum lne LNE d
rd lne d, mbw(100)
rd LNE d, mbw(100)

Output:

. sum lne LNE d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lne |       349    21.32478     .4329206  19.65047    23.1144
         LNE |       349    .0213248     .0004329  .0196505   .0231144
           d |       349    .0502933     .1604194 -.2756163   .4696784

. rd lne d, mbw(100)
Two variables specified; treatment is assumed to jump from zero to one at Z=0.

 Assignment variable Z is d
 Treatment variable X_T unspecified
 Outcome variable y is lne

Estimating for bandwidth .29287775925349
------------------------------------------------------------------------------
         lne |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0773955   .1056062    -0.73   0.464      -.28438    .1295889
------------------------------------------------------------------------------

. rd LNE d, mbw(100)
Two variables specified; treatment is assumed to jump from zero to one at Z=0.

 Assignment variable Z is d
 Treatment variable X_T unspecified
 Outcome variable y is LNE

Estimating for bandwidth .2928777592534422
------------------------------------------------------------------------------
         LNE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0000774   .0001056    -0.73   0.464    -.0002844    .0001296

**************************************

auto example:

Code:

sysuse auto, clear
gen double Price = price/1000
gen double PRICE = price/1000000
gen double z = length - 193
sum price Price PRICE z
rd price z, mbw(100)
rd Price z, mbw(100)
rd PRICE z, mbw(100)

Output:

. sum price Price PRICE z

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257     2949.496      3291      15906
       Price |        74    6.165257     2.949496     3.291     15.906
       PRICE |        74    .0061653     .0029495   .003291    .015906
           z |        74   -5.067568     22.26634       -51         40

. rd price z, mbw(100)
Two variables specified; treatment is assumed to jump from zero to one at Z=0.

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is price

Estimating for bandwidth 24.98807626042474
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   -5198.13   2230.786    -2.33   0.020    -9570.391   -825.8697
------------------------------------------------------------------------------

. rd Price z, mbw(100)
Two variables specified; treatment is assumed to jump from zero to one at Z=0.

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is Price

Estimating for bandwidth 8.731619909031293
------------------------------------------------------------------------------
       Price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -7.547781   2.493275    -3.03   0.002    -12.43451   -2.661051
------------------------------------------------------------------------------

. rd PRICE z, mbw(100)
Two variables specified; treatment is assumed to jump from zero to one at Z=0.

 Assignment variable Z is z
 Treatment variable X_T unspecified
 Outcome variable y is PRICE

insufficient observations
r(2001);

**************************************

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Austin Nichols
> Sent: 15 July 2011 15:23
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
>
> Chris--
> I agree it is an undesirable "feature" of the optimal bandwidth
> calculation, but some problem of this sort is probably
> unavoidable--in this case it arises from estimating local curvature
> using squared deviations of the outcome, which is evidently
> sensitive to scale.
> There are alternative approaches which would not face this exact
> problem, but there would almost surely be other problems, or other
> ways of breaking the estimator. The sensitivity of bandwidth to
> scale is particularly undesirable, but also serves to illustrate
> what I have said elsewhere: bandwidth selection is more art than
> science, and at a minimum you should assess the sensitivity of your
> estimates to bandwidth, which is why graphs for multiple bandwidths
> are produced by default in -rd-, and there is an option -bdep- to
> assess the dependence graphically.
>
> On Fri, Jul 15, 2011 at 9:45 AM, Stata Chris
> <statachris@gmail.com> wrote:
> > Dear list members,
> >
> > I am using Austin Nichols' -rd-
> > (http://ideas.repec.org/c/boc/bocode/s456888.html) command, as well as
> > the related -rdob- by Fuji-Imbens-Kalyanaraman-Fuji
> > (http://www.economics.harvard.edu/faculty/imbens/software_imbens)
> >
> > Now I've discovered that the optimal bandwidth chosen and hence the
> > resulting estimates are sensitive to the scaling of the outcome variable.
> >
> > To demonstrate this, I make use of an example discussed in this
> > context in an earlier post:
> >
> > sysuse auto, clear
> > gen Price = price/1000
> > gen z = length - 193
> > rd price z
> > rd Price z
> >
> >
> > As you can check, when I use as outcome the price in 1000 dollars
> > ("Price") rather than in dollars ("price"), I get a different
> > bandwidth and hence a very different estimate, whereas I think I would
> > wish to get the previous estimate just divided by 1000.
> >
> > This does not seem a very desirable property to me, but I'm not sure
> > where in the optimal bandwidth algorithm (see
> > http://www.nber.org/papers/w14726 ) this comes from and whether it
> > would be possible to avoid this. Probably some of you can say more
> > about this?
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.
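
A possible follow-up check, offered only as a sketch: in the auto output above the selected bandwidth changes with the scale of the outcome (24.99 for price, 8.73 for Price), so one way to see whether the scale sensitivity sits in the bandwidth selection rather than in the local linear estimate itself is to hold the bandwidth fixed for both scalings. The code below assumes that -rd- is installed from SSC and that its bwidth() option overrides the estimated optimal bandwidth; the value 25 and the scalar names are arbitrary, illustrative choices that do not come from the thread.

```stata
* Sketch only. Assumes -rd- is installed (ssc install rd) and that
* bwidth() fixes the bandwidth instead of estimating an optimal one.
sysuse auto, clear
* price in thousands of dollars, stored in double precision
gen double Price = price/1000
* assignment variable with the cutoff at zero, as in the thread
gen double z = length - 193

* Same fixed bandwidth for both scalings (25 is an arbitrary value)
rd price z, bwidth(25) mbw(100)
scalar b_dollars = _b[lwald]

rd Price z, bwidth(25) mbw(100)
scalar b_thousands = _b[lwald]

* With the bandwidth held fixed, a weighted local linear fit is linear
* in the outcome, so the two estimates should differ only by the
* factor 1000, up to rounding error.
display b_dollars/1000 "   " b_thousands
assert reldif(b_dollars/1000, b_thousands) < 1e-6
```

If the assertion holds, the different estimates reported above would be attributable to the bandwidth choice alone. It would not explain why the votex data show no such sensitivity.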