Dear Gary:
By accident or design you reply to my reply, but
you don't focus on the kind of issue it raises.
As I understand it, you can reduce your problem of
non-normality by attacking the parts of the
data you find least convenient and changing
them! The ancient myth of the hotelier Procrustes
who chopped and stretched his unfortunate guests to fit
the beds on offer springs to mind. What's uppermost
here, jumping through hoops to attain respectable
P-values, or trying to promote statistical science?
Put in more conventional and less histrionic terms,
what precisely is the non-normality "problem" you have?
A simple example, nothing to do with residuals or time
series, but illustrative of the key difficulty,
is provided by the auto data. If you go
sysuse auto, clear
foreach v of var price-gear_ratio {
    swilk `v'
}
you will see that various variables qualify as non-normal
according to conventional significance levels. But
this means mostly that the sample size is large enough
to detect some non-normality, not that the non-normality
is large enough to be problematic for any purpose
of data analysis. (In other words, the results exemplify
a standard limitation of significance tests.) In fact,
to pick out one example, a careful look at -gear_ratio-
by e.g.
qnorm gear_ratio
shows that despite the P-value of 0.01525 this
variable has a distribution which in practice
would not be problematic if it were a distribution
of residuals. (The P-value I put down partly to some
granularity, certainly not outliers or fat tails.)
And the n = 74 of the auto data is pretty modest
by most people's standards: the issue will be
compounded in larger datasets. My guess is
that with your kind of data you have a much
larger n.
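To see the same point without the auto data (a made-up
illustration, not your series): generate something that is
normal apart from a little granularity, and -swilk- may well
still reject at a sample size like yours, even though a
quantile plot looks fine.

clear
set obs 2000
set seed 12345
gen x = round(rnormal(), 0.1)   // normal apart from mild granularity
swilk x                         // P-value may well be small at this n
qnorm x                         // yet the plot looks essentially straight

(As I recall, -swilk- itself caps the sample size it will accept,
so with a really large n you are thrown back on graphical checks
such as -qnorm- in any case.)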
Incidentally, chopping according to
a multiple of the SD is not Winsorization,
as I pointed out on Sunday in reply
to a previous posting of yours. More
importantly, replacing a distribution
longer-tailed than normal with one
shorter-tailed than normal may well lead
to rejections of normality too, depending
precisely on what test you are using...
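To make that concrete (again a made-up illustration, and the
multiplier of 2 is arbitrary): clip a long-tailed variable at
mean +/- 2 SD and a normality test may well reject both the
original and the clipped version.

clear
set obs 2000
set seed 2024
gen y = rt(5)                   // longer-tailed than normal
quietly summarize y
gen y_clip = max(min(y, r(mean) + 2*r(sd)), r(mean) - 2*r(sd))
swilk y y_clip                  // both may well be rejected

Genuine Winsorizing, by contrast, replaces a fixed fraction of
observations at each end (say, those beyond the 1st and 99th
percentiles), rather than everything beyond some multiple of
the SD.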
Nick
[email protected]
gary tian
>
> Further to John's question regarding trimming, I would like
> to raise the following question to seek your help.
> I am testing cointegration and causality for daily returns of
> share index time series (first log differences) based on a VAR
> model. Whatever lag I use for each variable, I still find
> non-normality in the residual tests. I applied a sort of
> winsorization in which the returns are winsorized by replacing
> all returns outside the range [mean +/- standard deviations]
> with these boundary values. The non-normality problem is much
> reduced but still present. The second method, which I found more
> effective, is to use monthly or quarterly data, but the problem
> then is losing the original meaning of integration over a precise
> number of days. Are these standard ways to treat the problem, or
> is there a better way?
Nick Cox
> I guess there's a literature on this somewhere,
> but it doesn't seem that trimming of tails
> before regression ever caught on as standard practice
> (unless there's a subdiscipline that does it all the
> time, as a living refutation of this guess).
>
> The key question to me is what is your underlying
> problem? Worrying about long tails is often
> best met by quantile or robust regression or using
> transformations or non-identity link functions.
> Far simpler and better supported than tinkering
> with the tails...
>
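(For what it's worth, all of those alternatives are essentially
one-liners in Stata. A minimal sketch, with the auto data standing
in for long-tailed outcomes:

sysuse auto, clear
qreg price weight length        // median (quantile) regression
rreg price weight length        // robust regression, downweights outliers
glm price weight length, family(gaussian) link(log)   // non-identity link

None of this is specific to your data; it just shows where to look.)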
Rijo John
> > I have a data set with quite a few outliers. Suppose I am
> > trimming my dependent variable 1% each from top and bottom,
> > using the 1st and 99th percentiles, and I have the regression
> > estimates before and after trimming. Let us also suppose that
> > some of the variables that were significant before trimming
> > turned out to be insignificant after trimming, and/or vice versa.
> >
> > Is there a standard way to decide what percentage of the data
> > should be trimmed? Is a Chow test for the equality of
> > coefficients enough for this? I mean, trim up to the point where
> > the changes in the coefficients become insignificant? Or is
> > there some other standard way to do this?
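(For concreteness, the kind of trimming described there might be
sketched like this, with the auto data standing in; this is my
illustration, not a recommendation:

sysuse auto, clear
quietly summarize price, detail
gen byte untrimmed = inrange(price, r(p1), r(p99))
regress price weight length                 // before trimming
regress price weight length if untrimmed    // trimmed at 1st/99th percentiles

Whether the change in coefficients is meaningful is the substantive
question raised above.)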