I think I can summarize the situation.
I have to agree that these variations from theoretic perfection
probably don't make much of a difference in most contexts.
Nonetheless, here are some options.
corr2data gives us perfect variance, but imperfect skew, unlike
invnorm. The kurtosis is also imperfect. The imperfect skew is
visible in a histogram, FWIW.
invnorm gives perfect skew, imperfect variance and imperfect kurtosis.
The variance is easy to fix: just divide the variable by its own
standard deviation. Kurtosis can be addressed by increasing the
sample size sufficiently, though hitting 3.00000 might require
500,000,000 observations, as Jeph Herrin has shown (thanks Jeph!).
The late Gunnar Blom's recommendation of (_n - 3/8) / (_N + 3/4),
yields perfect variance, but imperfect kurtosis and skew. (_n/_N+1)
gives (almost) perfect skew, imperfect kurtosis and imperfect variance
that can be easily corrected.
Here's another approach for smaller sample sizes. The inflection
point of a Gaussian variable is 1 sigma. So increasing the kurtosis
involves making the central data points smaller and the points beyond
+/-1 sigma larger. Consider the following code:
version 8.2
set memory 10m
set obs 20000
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
gen double perfnorm5=perfnorm
scalar ep = .00001
scalar kurt = 0
while kurt<2.999999 {
quietly summarize perfnorm5
scalar sdnorm5=r(sd)
* Increase kurtosis:
quietly replace perfnorm5 = perfnorm5*(1-ep) ///
if perfnorm5<sdnorm5 & perfnorm5>-sdnorm5
quietly replace perfnorm5 = perfnorm5*(1+ep) ///
if perfnorm5>sdnorm5 & perfnorm5<-sdnorm5
quietly sum perfnorm5, detail
scalar kurt=r(kurtosis)
di "kurtosis: " kurt " ep: " ep "
}
di "kurtosis: " kurt " ep: " ep "
sum perfnorm5, detail
scalar ep = .000001
while kurt>3.000001 {
quietly summarize perfnorm5
scalar sdnorm5=r(sd)
* Decrease kurtosis:
quietly replace perfnorm5 = perfnorm5*(1+ep) ///
if perfnorm5<sdnorm5 & perfnorm5>-sdnorm5
quietly replace perfnorm5 = perfnorm5*(1-ep) ///
if perfnorm5>sdnorm5 & perfnorm5<-sdnorm5
quietly sum perfnorm5, detail
scalar kurt=r(kurtosis)
}
di "kurtosis: " kurt " ep: " ep "
quietly sum perfnorm5
replace perfnorm5 = perfnorm5/r(sd)
sum perfnorm5, detail
Here's what we get:
. sum perfnorm5, detail
perfnorm5
-------------------------------------------------------------
Percentiles Smallest
1% -2.329282 -3.897047
5% -1.647359 -3.725188
10% -1.28356 -3.6213 Obs 20000
25% -.6713475 -3.54596 Sum of Wgt. 20000
50% -6.93e-17 Mean -1.38e-17
Largest Std. Dev. 1
75% .6713475 3.54596
90% 1.28356 3.6213 Variance 1
95% 1.647359 3.725188 Skewness -8.25e-16
99% 2.329282 3.897047 Kurtosis 3
Perfection! Or so I hope. The histogram looks ok to my eyes.
I should disclose that I have not read about kurtosis adjustment
anywhere, and my technique (which occurred to me this morning) may
have problems that I have not considered.
Does this matter? Probably not in most cases. But consider a smaller
sample size:
set obs 200
generate double perfuniform = _n/(_N+1)
generate double perfnorm = invnorm(perfuniform)
sum perfnorm, det
perfnorm
-------------------------------------------------------------
Percentiles Smallest
1% -2.250142 -2.577553
5% -1.623964 -2.328219
10% -1.270418 -2.172065 Obs 200
25% -.6706013 -2.055808 Sum of Wgt. 200
50% 6.94e-17 Mean 1.43e-17
Largest Std. Dev. .9797342
75% .6706013 2.055808
90% 1.270418 2.172065 Variance .9598791
95% 1.623964 2.328219 Skewness 4.38e-16
99% 2.250142 2.577553 Kurtosis 2.746629
Conceivably, it might make sense to improve upon a kurtosis of 2.747
and a variance of .96.
Great thanks to Jeph Herrin, Nick Cox and Richard Williams for their comments.
On 5/10/07, Nick Cox <[email protected]> wrote:
I can't take the minute differences in this thread very seriously
but I think I recall, for what it's worth, that the late Gunnar Blom
recommended (in this notation) (_n - 3/8) / (_N + 3/4)
as a plotting position for (near) Gaussian data.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/