Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Jarque-Bera test
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: Jarque-Bera test
Date
Thu, 27 Sep 2012 14:17:40 +0100
Excellent. Many thanks for pushing this forward.
Nick
On Thu, Sep 27, 2012 at 2:00 PM, Maarten Buis <[email protected]> wrote:
> On Thu, Sep 27, 2012 at 3:44 AM, Nick Cox wrote:
>> The essence of the matter is that Jarque-Bera uses asymptotic results
>> regardless of sample size for a problem in which convergence to those
>> results is very slow. This approach is decades out of date and I am
>> surprised that StataCorp support the test without a warning. The
>> Doornik-Hansen test, for example, looks much more satisfactory.
>
> I took up this challenge and ran a simulation comparing the
> performance of the Jarque-Bera test with that of the Doornik-Hansen
> test. In particular, I focused on whether the p-values follow a
> uniform distribution, i.e. whether the nominal rejection rates
> correspond to the proportion of simulations in which the null
> hypothesis was rejected at those nominal rates. In essence, both
> tests perform badly at sample sizes of 100 and 1,000. As Nick
> suggested, the Jarque-Bera test performs worse than the
> Doornik-Hansen test, but for both tests my conclusion would be that
> 1,000 observations is just not enough. At 10,000 and 100,000
> observations both tests seem to perform acceptably. However, at such
> large sample sizes you need to worry about whether a rejection of the
> null hypothesis actually represents a substantively meaningful
> deviation from the normal/Gaussian distribution.
>
> So the bottom line is: at small sample sizes graphs are the only
> reliable way of judging whether a variable comes from a
> normal/Gaussian distribution because the tests just don't perform
> well enough. At large sample sizes graphs are still the only reliable
> way of judging whether a variable comes from a normal/Gaussian
> distribution because in large samples the tests will pick up
> substantively meaningless deviations from the null hypothesis.
>
> *------------------- begin simulation -------------------
> clear all
>
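> program define sim, rclass
>     // one draw of 100,000 standard normal values; the smaller
>     // samples are the first 100, 1,000 and 10,000 of these
>     drop _all
>     set obs `=1e5'
>     gen x = rnormal()
>     tempname jb jbp
>     // i = 2,...,5 corresponds to sample sizes 100,...,100,000
>     forvalues i = 2/5 {
>         // Jarque-Bera statistic and its asymptotic chi-squared(2) p-value
>         sum x in 1/`=1e`i'', detail
>         scalar `jb' = (r(N)/6) * ///
>             (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
>         scalar `jbp' = chi2tail(2,`jb')
>         return scalar jb`i' = `jb'
>         return scalar jbp`i' = `jbp'
>
>         // Doornik-Hansen statistic and p-value from -mvtest normality-
>         mvtest norm x in 1/`=1e`i''
>         return scalar dh`i' = r(chi2_dh)
>         return scalar dhp`i' = r(p_dh)
>     }
> end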
>
> simulate jb2=r(jb2) jbp2=r(jbp2) ///
> jb3=r(jb3) jbp3=r(jbp3) ///
> jb4=r(jb4) jbp4=r(jbp4) ///
> jb5=r(jb5) jbp5=r(jbp5) ///
> dh2=r(dh2) dhp2=r(dhp2) ///
> dh3=r(dh3) dhp3=r(dhp3) ///
> dh4=r(dh4) dhp4=r(dhp4) ///
> dh5=r(dh5) dhp5=r(dhp5) ///
> , reps(2e4): sim
>
> rename jbp2 p2jb
> rename jbp3 p3jb
> rename jbp4 p4jb
> rename jbp5 p5jb
> rename dhp2 p2dh
> rename dhp3 p3dh
> rename dhp4 p4dh
> rename dhp5 p5dh
>
> gen id = _n
>
> reshape long p2 p3 p4 p5, i(id) j(dist) string
>
> label var p2 "N=100"
> label var p3 "N=1,000"
> label var p4 "N=10,000"
> label var p5 "N=100,000"
>
> encode dist, gen(distr)
> label define distr 2 "Jarque-Bera" ///
> 1 "Doornik-Hansen", replace
> label value distr distr
>
> simpplot p?, by(distr) scheme(s2color) legend(cols(4))
> *-------------------- end simulation --------------------
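
For readers following the code, the Jarque-Bera part of the program can be written out as a formula. This is only a restatement of the two -scalar- lines in the quoted simulation, with symbols chosen here for illustration: n is r(N), S is r(skewness) and K is r(kurtosis) as returned by -summarize, detail- (so K is about 3 under normality).

```latex
\[
JB = \frac{n}{6}\left( S^2 + \frac{(K - 3)^2}{4} \right),
\qquad
p = \Pr\left( \chi^2_2 > JB \right)
\]
```

The p-value is the upper tail of a chi-squared distribution with 2 degrees of freedom, which is what chi2tail(2, .) computes. It is this asymptotic reference distribution whose slow convergence is at issue in the thread.
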
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> This simulation requires the -simpplot- package available at SSC and
> described here: <http://www.maartenbuis.nl/software/simpplot.html>
>
>
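
As a numerical companion to the graph, the empirical rejection rates at one nominal level can be tabulated directly. The lines below are a minimal sketch, not part of Maarten's code: they assume the long-format data left in memory by the simulation above (p2-p5 and distr), the 5% level is just one convenient choice, and the rej_* variable names are made up for this illustration.

```stata
* Assumes the simulation above has just been run, so that p2-p5 hold the
* p-values for N = 100, ..., 100,000 and distr identifies the test.
* The rej_* variables are hypothetical names introduced for this sketch.
foreach v of varlist p2 p3 p4 p5 {
    generate byte rej_`v' = (`v' < 0.05) if !missing(`v')
}

* Share of replications rejecting at the nominal 5% level, by test.
* The data are drawn from a normal distribution, so a well-calibrated
* test should give means close to 0.05.
tabstat rej_p2 rej_p3 rej_p4 rej_p5, by(distr) statistics(mean) format(%5.3f)
```

This looks at a single nominal level only; the -simpplot- graph remains the better summary because it covers the whole range of nominal rates at once.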