In general, I've found that severe skewness/asymmetry distorts
significance tests more than heavy tails do. I know I read this somewhere
long ago, and it has held up pretty well in my experience.
When you have skewness, looking for transformations is a good idea.
Where you can run into trouble is when the skewness is caused by a lump
at a single value, e.g. the number of subjects who have 0 days of
hospitalization. Then no transformation will help, and it's probably
better to fit separate models for whether there is any response and for
the response given that it's greater than 0. This might be a two-part or
hurdle model, or a mixture of distributions (such as zero-inflated
Poisson, -zip-, or zero-inflated negative binomial, -zinb-).
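As a rough illustration of the two routes just mentioned, here is a
minimal sketch on made-up data. The variables sick, any, days and x, the
seed, and all coefficients are assumptions for illustration only, and the
gamma GLM for the positive part is just one common choice, not the only
one.
*-------------- begin example --------------------
// hypothetical data: hospital days with a lump at 0
clear
set seed 12345
set obs 1000
gen x = rnormal()
gen byte sick = runiform() < invlogit(-0.5 + 0.8*x)
gen days = cond(sick, rpoisson(exp(1 + 0.3*x)), 0)
gen byte any = days > 0

// two-part model, part 1: any hospitalization at all
logit any x
// two-part model, part 2: days, given that days > 0
glm days x if any, family(gamma) link(log)

// alternative: zero-inflated Poisson as a mixture model
zip days x, inflate(x)
*--------------- end example --------------------
In the two-part version the two equations are estimated and interpreted
separately; in -zip- the inflate() equation models the probability of an
excess zero.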
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten buis
Sent: Wednesday, November 26, 2008 6:49 AM
To: [email protected]
Subject: Re: RE: st: significance of mean and median
--- Bastian Steingros <[email protected]> wrote:
> using
> sysuse auto, clear
> reg mpg, nohe
> mean mpg
> ttest mpg==0
>
> displays the same results. However, how do these tests deal with the
> assumption that mpg has to be normally distributed?
> More precisely, how important is it that mpg is normally
> distributed? Most of the variables in my sample are left or right
> skewed...
> Is ttest still reliable in this case?
One way to find that out is to use simulation, with -simulate-. You
declare your data to be the population and
repeatedly test a true hypothesis on a random sample from your
"population" (N out of N with replacement, just like the bootstrap),
and then you look at whether the p-values follow a uniform distribution,
and whether you reject the null in only 5% of the samples. See the
example below and http://ideas.repec.org/p/boc/nsug08/14.html .
*-------------- begin example --------------------
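capture program drop sim
program define sim, rclass
    // treat the auto data as the "population"
    sysuse auto, clear
    // center mpg so that the null hypothesis (mean = 0) is true
    sum mpg, meanonly
    replace mpg = mpg - r(mean)
    // draw N out of N observations with replacement
    bsample
    ttest mpg == 0
    return scalar p = r(p)
end
simulate p=r(p), reps(5000): sim
hist p // should be a uniform distribution
gen sig = p < .05
sum sig // mean should be .05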
*--------------- end example --------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
> By the way, median mpg requires an option. So, how can I test if the
> median of a var. is significant without using this command? Because I
> have no idea which by-option would make sense in my sample.
I think that the term "significant" has done more harm than good
because it hides the null hypothesis. As a consequence too many
nonsensical hypotheses are being tested. What you need to do is to
specify a null hypothesis and justify why anyone should care about this
hypothesis. The hypothesis that the mean or the median of a variable is
zero is almost never of interest, and thus should almost never be
tested. It is usually much more interesting to compare the mean/median
between groups, for example men and women. So this is probably why it
never occurred to anyone (or no one thought it was worth their time) to
implement a test whether or not the median is equal to a certain fixed
value.
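A note on the fixed-value case: if such a null can be justified, official
Stata's -signtest- does accept a comparison against a constant, and
-centile- reports a confidence interval for the median. A minimal sketch,
assuming the auto data and a hypothesized median of 20 (an arbitrary
value chosen only for illustration):
*-------------- begin example --------------------
sysuse auto, clear
// sign test of H0: median of mpg equals 20
signtest mpg = 20
// point estimate and confidence interval for the median
centile mpg, centile(50)
*--------------- end example --------------------
Whether 20 is a hypothesis anyone should care about is a separate
question, and the harder one.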
> Nick Cox seems not to fully agree with LAD/qreg...
Nick can speak for himself, but I got the impression that he wasn't
negative about -qreg-, but just noted that -qreg- did not have a neat
test equivalent like -regress- and -ttest-.
-- Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/