Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Jarque-Bera test

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Jarque-Bera test
Date	Thu, 27 Sep 2012 14:17:40 +0100

Excellent. Many thanks for pushing this forward.

Nick

On Thu, Sep 27, 2012 at 2:00 PM, Maarten Buis <[email protected]> wrote:
> On Thu, Sep 27, 2012 at 3:44 AM, Nick Cox wrote:
>> The essence of the matter is that Jarque-Bera uses asymptotic results
>> regardless of sample size for a problem in which convergence to those
>> results is very slow. This approach is decades out of date and I am
>> surprised that StataCorp support the test without a warning. The
>> Doornik-Hansen test, for example, looks much more satisfactory.
>
> I took up this challenge and ran a simulation comparing the
> performance of the Jarque-Bera test with that of the Doornik-Hansen
> test. In particular, I focused on whether the p-values follow a
> uniform distribution when the data are in fact normal, i.e. whether
> the nominal rejection rates correspond to the proportion of
> simulations in which the null hypothesis was rejected at those
> nominal rates. In essence, both tests perform badly at sample sizes
> of 100 and 1,000. As Nick suggested, the Jarque-Bera test performs
> worse than the Doornik-Hansen test, but my conclusion would be that
> 1,000 observations is just not enough for either test. At 10,000 and
> 100,000 observations both tests seem to perform acceptably. However,
> at such large sample sizes you need to worry about whether a
> rejection of the null hypothesis actually represents a substantively
> meaningful deviation from the normal/Gaussian distribution.
>
> So the bottom line is: at small sample sizes graphs are the only
> reliable way of judging whether a variable comes from a
> normal/Gaussian distribution, because tests just don't perform well
> enough. At large sample sizes graphs are still the only reliable way
> of judging whether a variable comes from a normal/Gaussian
> distribution, because at large sample sizes tests will pick up
> substantively meaningless deviations from the null hypothesis.
>
> *------------------- begin simulation -------------------
> clear all
>
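> program define sim, rclass
>         * draw one sample of 100,000 standard-normal values
>         drop _all
>         set obs `=1e5'
>         gen x = rnormal()
>         tempname jb jbp
>         * evaluate both tests on the first 10^i observations, i = 2,...,5
>         forvalues i = 2/5 {
>                 * Jarque-Bera statistic and its asymptotic chi2(2) p-value
>                 sum x in 1/`=1e`i'', detail
>                 scalar `jb' = (r(N)/6) * ///
>                        (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
>                 scalar `jbp' = chi2tail(2,`jb')
>                 return scalar jb`i' = `jb'
>                 return scalar jbp`i' = `jbp'
>
>                 * Doornik-Hansen statistic and p-value from -mvtest-
>                 mvtest norm x in 1/`=1e`i''
>                 return scalar dh`i' = r(chi2_dh)
>                 return scalar dhp`i' = r(p_dh)
>         }
> end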
>
> simulate jb2=r(jb2) jbp2=r(jbp2) ///
>          jb3=r(jb3) jbp3=r(jbp3) ///
>          jb4=r(jb4) jbp4=r(jbp4) ///
>          jb5=r(jb5) jbp5=r(jbp5) ///
>          dh2=r(dh2) dhp2=r(dhp2) ///
>          dh3=r(dh3) dhp3=r(dhp3) ///
>          dh4=r(dh4) dhp4=r(dhp4) ///
>          dh5=r(dh5) dhp5=r(dhp5) ///
>          , reps(2e4): sim
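>
> * Added illustration, not part of the original simulation: tabulate the
> * empirical rejection rate at the nominal 5% level for each test and
> * sample size. For a well-calibrated test each rate should be near 0.05.
> foreach v of varlist jbp? dhp? {
>         quietly count if `v' < 0.05
>         display "`v'" _col(10) %6.4f r(N)/_N
> }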
>
> rename jbp2 p2jb
> rename jbp3 p3jb
> rename jbp4 p4jb
> rename jbp5 p5jb
> rename dhp2 p2dh
> rename dhp3 p3dh
> rename dhp4 p4dh
> rename dhp5 p5dh
>
> gen id = _n
>
> reshape long p2 p3 p4 p5, i(id) j(dist) string
>
> label var p2 "N=100"
> label var p3 "N=1,000"
> label var p4 "N=10,000"
> label var p5 "N=100,000"
>
> encode dist, gen(distr)
> label define distr 2 "Jarque-Bera" ///
>                    1 "Doornik-Hansen", replace
> label value distr distr
>
> simpplot p?, by(distr) scheme(s2color) legend(cols(4))
> *-------------------- end simulation --------------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> This simulation requires the -simpplot- package available at SSC and
> described here: <http://www.maartenbuis.nl/software/simpplot.html>
>
>
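
A rough way to see the large-sample point made above is to test data that are close to, but not exactly, normal. The sketch below is an illustration only and was not part of the original exchange: the t distribution with 20 degrees of freedom, the seed, and the sample size are all assumptions chosen for the example.

```stata
clear all
set seed 2012                 // arbitrary seed, for reproducibility only
set obs 100000
generate x = rt(20)           // assumed example: mildly heavy-tailed data

* Doornik-Hansen test, as used in the simulation above
mvtest normality x

* quantile-normal plot for a visual check of the same variable
qnorm x
```

With this many observations the test should reject decisively, while the plot shows how far the tails actually depart from the reference line; whether that departure matters is a substantive judgement rather than a statistical one.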