Re: st: Jarque-Bera test
From: Maarten Buis <[email protected]>
To: [email protected]
Date: Thu, 27 Sep 2012 15:00:57 +0200
On Thu, Sep 27, 2012 at 3:44 AM, Nick Cox wrote:
> The essence of the matter is that Jarque-Bera uses asymptotic results
> regardless of sample size for a problem in which convergence to those
> results is very slow. This approach is decades out of date and I am
> surprised that StataCorp support the test without a warning. The
> Doornik-Hansen test, for example, looks much more satisfactory.
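For reference, the Jarque-Bera statistic as computed in the simulation code in this post is shown below, with n the sample size and S and K the sample skewness and kurtosis (r(skewness) and r(kurtosis) after -summarize, detail-; note that K is the raw kurtosis, which equals 3 for a normal distribution):

$$
JB = \frac{n}{6}\left[ S^2 + \frac{(K-3)^2}{4} \right], \qquad
p = \Pr\left( \chi^2_2 > JB \right)
$$

Under the null hypothesis of normality JB is only asymptotically chi-squared with 2 degrees of freedom, so the p-value chi2tail(2, JB) relies on exactly the large-sample approximation Nick refers to.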
I took up this challenge and ran a simulation comparing the
performance of the Jarque-Bera test with that of the Doornik-Hansen
test. Because the data are drawn from a standard normal distribution,
the null hypothesis is true, so for a well-behaved test the p-values
should follow a uniform distribution. In particular I focused on
whether they do, i.e. whether the nominal rejection rates correspond
to the proportion of simulations in which the null hypothesis was
rejected at those nominal levels. In essence, both tests perform badly
at sample sizes of 100 and 1,000. As Nick suggested, the Jarque-Bera
test performs even worse than the Doornik-Hansen test, but for both
tests my conclusion would be that 1,000 observations is just not
enough. At 10,000 and 100,000 observations both tests seem to perform
acceptably. However, at such large sample sizes you need to worry
about whether a rejection of the null hypothesis actually represents a
substantively meaningful deviation from the normal/Gaussian
distribution.
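A minimal sketch of the check described above, for the Jarque-Bera test at N=100 only: when the null is true, the proportion of replications with p < alpha should be close to alpha. The statistic is computed exactly as in the simulation code in this post; the program name -jbsim-, the seed, and the 2,000 replications are arbitrary illustrative choices, not part of the original.

*------------------- begin example -------------------
clear all
set seed 12345                  // arbitrary seed, for reproducibility

// jbsim (hypothetical name): one sample of 100 standard normal draws,
// returns the asymptotic Jarque-Bera p-value
program define jbsim, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    quietly summarize x, detail
    tempname jb
    scalar `jb' = (r(N)/6) * (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
    return scalar p = chi2tail(2, `jb')
end

simulate p = r(p), reps(2000) nodots: jbsim

// empirical rejection rate at each nominal level
foreach a in .01 .05 .10 {
    quietly count if p < `a'
    display "nominal " %4.2f `a' "   empirical " %6.4f r(N)/_N
}
*-------------------- end example --------------------

If the test behaved well, the three empirical rates would be close to .01, .05 and .10; the -simpplot- graph in the full simulation makes the same comparison over the whole range of nominal levels.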
So the bottom line is: at small sample sizes graphs are the only
reliable way of judging whether a variable comes from a
normal/Gaussian distribution, because the tests just don't perform
well enough. At large sample sizes graphs are still the only reliable
way of judging this, because in large samples the tests will pick up
substantively meaningless deviations from the null hypothesis.
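To illustrate the large-sample point, a sketch (my own example, not from the post) draws 100,000 observations from a t distribution with 30 degrees of freedom, which is symmetric but has slightly heavier tails than the normal (excess kurtosis 6/(30-4), about 0.23). With this many observations one would expect -mvtest normality- (used for the Doornik-Hansen test, as in the simulation) to reject, while the normal quantile plot and the histogram with an overlaid normal density let you judge whether the deviation matters for your purpose. The seed and graph names are arbitrary.

*------------------- begin example -------------------
clear
set seed 2012                   // arbitrary seed
set obs 100000
generate x = rt(30)             // t(30): close to, but not exactly, normal

// formal test: Doornik-Hansen
mvtest normality x

// graphical checks
qnorm x, name(qq_t30, replace)
histogram x, normal name(hist_t30, replace)
*-------------------- end example --------------------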
*------------------- begin simulation -------------------
clear all

// sim: draws one sample of 100,000 standard normal observations and,
// for the first 100, 1,000, 10,000 and 100,000 of them, returns the
// Jarque-Bera and Doornik-Hansen statistics and their p-values
program define sim, rclass
    drop _all
    set obs `=1e5'
    gen x = rnormal()
    tempname jb jbp
    forvalues i = 2/5 {
        // Jarque-Bera: (N/6)*(S^2 + (K-3)^2/4), compared with chi2(2)
        sum x in 1/`=1e`i'', detail
        scalar `jb' = (r(N)/6) * ///
            (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
        scalar `jbp' = chi2tail(2,`jb')
        return scalar jb`i' = `jb'
        return scalar jbp`i' = `jbp'
        // Doornik-Hansen, as implemented in -mvtest normality-
        mvtest norm x in 1/`=1e`i''
        return scalar dh`i' = r(chi2_dh)
        return scalar dhp`i' = r(p_dh)
    }
end

// 20,000 replications
simulate jb2=r(jb2) jbp2=r(jbp2) ///
         jb3=r(jb3) jbp3=r(jbp3) ///
         jb4=r(jb4) jbp4=r(jbp4) ///
         jb5=r(jb5) jbp5=r(jbp5) ///
         dh2=r(dh2) dhp2=r(dhp2) ///
         dh3=r(dh3) dhp3=r(dhp3) ///
         dh4=r(dh4) dhp4=r(dhp4) ///
         dh5=r(dh5) dhp5=r(dhp5) ///
         , reps(2e4): sim

// rename the p-values to stub + test suffix (p2jb, p2dh, ...)
// so that -reshape- can stack the two tests
rename jbp2 p2jb
rename jbp3 p3jb
rename jbp4 p4jb
rename jbp5 p5jb
rename dhp2 p2dh
rename dhp3 p3dh
rename dhp4 p4dh
rename dhp5 p5dh
gen id = _n
reshape long p2 p3 p4 p5, i(id) j(dist) string
label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

// -encode- codes the strings in sorted order: "dh" = 1, "jb" = 2
encode dist, gen(distr)
label define distr 2 "Jarque-Bera" ///
                   1 "Doornik-Hansen", replace
label values distr distr

// compare the distribution of the p-values with the uniform distribution
simpplot p?, by(distr) scheme(s2color) legend(cols(4))
*-------------------- end simulation --------------------
(For more on the examples I send to Statalist, see:
http://www.maartenbuis.nl/example_faq )
This simulation requires the -simpplot- package, available from SSC
and described here: <http://www.maartenbuis.nl/software/simpplot.html>
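From within Stata (with an internet connection), the package can be installed from SSC and its help file opened with:

*------------------- begin example -------------------
ssc install simpplot
help simpplot
*-------------------- end example --------------------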
-- Maarten
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany
http://www.maartenbuis.nl
---------------------------------