st: RE: Simulate a skewed variable in stata, sample vs. population skewness
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: Simulate a skewed variable in stata, sample vs. population skewness
Date: Mon, 7 Dec 2009 13:11:12 -0000
First, there is a presumption here that the generic property of skewness
is best measured by a particular moment-based measure. That's dubious at
best and far from cogent for sample sizes of 10.
Second, although I am not completely clear on what you are trying to do,
it appears that your interest is in sampling from a skewed population.
It's everybody's problem that small samples from such a population
differ enormously.
Third, what regression-type analysis is it that you are doing for samples
of 10?
Nick
[email protected]
Karl-Oskar Lindgren
I have a question that I guess is partly statistical and partly
philosophical. In a paper that uses Monte Carlo simulations to study
the small-sample performance of an estimator, I was asked by a referee
to investigate how the estimator performs when the error terms are
skewed.
When trying to implement this suggestion I realized that the sample
skewness reported by Stata can differ considerably from the
skewness of the underlying population (although both the sample mean
and the sample variance remain close to their population
counterparts). My question is therefore whether it is the sample skewness
or the population skewness that should be kept constant when
examining the small-sample performance of a statistical estimator.
In case my question is unclear, the following simple example may help
illustrate the gist of my problem. Let's assume that we want to study
how the OLS estimator performs in small samples when the error terms
are skewed. In order to do this we decide to generate 10 error terms
from a chi-square distribution with 1 degree of freedom. The
population skewness should then be 2^(3/2), i.e., about 2.8. But if I
generate 1000 samples from such a distribution in Stata, the average
skewness across these 1000 samples turns out to be about 1.3 (see the
example code below). I understand that the reason for the discrepancy
is that measures of skewness tend to be biased in small samples when
the variables are non-normal (indeed, the sample skewness
approaches its theoretical level as we increase the number of
observations in the example below).
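The figures in this example can be checked with a short derivation. It assumes that r(skewness) is the plain moment ratio with divisor n, as in Stata's documented formula for summarize; the symbols k, n, m_r and g_1 are notation introduced here. Under that assumption the moment ratio is also bounded by a function of n alone, and for n = 10 the bound lies below the population value, so no single sample of 10 can reach 2.8.

```latex
% Population skewness of a chi-square variable with k degrees of freedom:
% mean k, variance 2k, third central moment 8k.
\[
  \gamma_1 = \frac{8k}{(2k)^{3/2}} = \sqrt{\frac{8}{k}},
  \qquad k = 1 \;\Rightarrow\; \gamma_1 = 2^{3/2} \approx 2.83 .
\]
% Sample skewness as a ratio of sample moments (assumed definition):
\[
  g_1 = \frac{m_3}{m_2^{3/2}},
  \qquad m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r .
\]
% Upper limit of |g_1| for a sample of size n, attained when n - 1 values
% coincide and one value differs:
\[
  |g_1| \le \frac{n-2}{\sqrt{n-1}},
  \qquad n = 10 \;\Rightarrow\; |g_1| \le \frac{8}{3} \approx 2.67 < 2.83 .
\]
```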
My question, however, concerns whether it is the sample skewness or
the population skewness that I should keep constant in my
replications when I vary the other parameters of the model. If it is
the population skewness, the implementation is straightforward, since
the skewness in the population is known. But if it is the sample
skewness that should be kept constant, I would appreciate any hints
on appropriate methods to accomplish this.
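* Example code to illustrate the bias of r(skewness)
program define skewchi, rclass
    version 9.2
    drop _all
    set obs 10
    * a squared standard normal draw is chi-square with 1 df
    gen double x = invnorm(uniform())
    gen double x2 = x^2
    summarize x2, detail
    return scalar mean = r(mean)
    return scalar var = r(Var)
    return scalar skew = r(skewness)
end
* 1000 replications, each storing the sample mean, variance and skewness
simulate mean=r(mean) var=r(var) skew=r(skew), ///
    reps(1000) seed(1) dots: skewchi
* averages of the three statistics across replications
summarize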
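The remark that the sample skewness approaches its theoretical level as the number of observations grows could be checked with a variant of the program that takes the sample size as an argument. This is only a sketch: the program name skewchin and the sample sizes in the loop are hypothetical choices, not part of the original example.

```stata
* Sketch only: skewchin and the sample sizes below are illustrative choices
capture program drop skewchin
program define skewchin, rclass
    version 9.2
    args n
    drop _all
    set obs `n'
    * chi-square(1) draws as squared standard normal draws
    gen double x2 = invnorm(uniform())^2
    summarize x2, detail
    return scalar skew = r(skewness)
end

* repeat the simulation for increasing sample sizes
foreach n of numlist 10 100 1000 10000 {
    quietly simulate skew=r(skew), reps(1000) seed(1): skewchin `n'
    display "n = `n'"
    summarize skew
}
```

The mean of skew in each block can then be compared with the population value 2^(3/2).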