Sometimes one sends an email too early; I did so earlier today. Apologies
for that.
I managed to find time to do a small search on the topic of
"outlier/outside value" definition in relation to box plots.
"Frigge, M., D. C. Hoaglin, and B. Iglewicz. 1989. Some implementations of
the box plot. American Statistician 43: 50-54." documents eight different
implementations of the quartile in software/algorithms.
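
As a small illustration of why the quartile definition matters, here is a
sketch in Python (not taken from the paper). It assumes the standard
library's statistics.quantiles, which offers two of the many quartile
definitions ("exclusive" and "inclusive"), and the usual 1.5 x IQR rule for
the fences. The sample values and the helper name tukey_fences are made up
for this example.

```python
import statistics

def tukey_fences(data, method):
    """Return (q1, q3, lower_fence, upper_fence) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical sample, chosen only so that the two definitions disagree.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15]

for method in ("exclusive", "inclusive"):
    q1, q3, lower, upper = tukey_fences(sample, method)
    outside = [x for x in sample if x < lower or x > upper]
    print(method, "Q1 =", q1, "Q3 =", q3, "fences =", (lower, upper),
          "outside values =", outside)
```

With this sample the value 15 lies inside the upper fence under one
definition and outside it under the other, so the same data can give
different "outside values" depending on the software.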
A recent paper also discusses this: "Outlier Labeling With Boxplot
Procedures", C. H. Sim, F. F. Gan, and T. C. Chang. Journal of the American
Statistical Association 100 (470): 642-652, June 2005.
The authors ran large-scale simulations and give tables of suggested
outlier detection rules depending on the supposed underlying
distribution.
And they say "This article shows that the commonly constructed boxplot is
in general inappropriate for detecting outliers in the normal and
especially the exponential samples. We recommend that the graphical
boxplot be constructed based on the knowledge of the underlying
distribution of the dataset and by controlling the risk of labeling regular
observations as outliers."
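
The risk mentioned in the quote can be sketched with a small simulation.
This is only a rough illustration in Python, not a reproduction of the
authors' study: the sample size, the number of replications, the constant
1.5, the "inclusive" quartile definition, and the function name
some_outside_rate are all my own assumptions. It estimates how often a
sample with no contamination at all still has at least one point beyond the
fences.

```python
import random
import statistics

def some_outside_rate(draw, size=20, reps=2000, k=1.5, seed=12345):
    """Estimate the share of clean samples with at least one point beyond
    the k * IQR fences. `draw` takes a random.Random and returns one value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(size)]
        # Assumed quartile definition; other definitions shift the fences.
        q1, _median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        if any(x < lower or x > upper for x in sample):
            hits += 1
    return hits / reps

normal_rate = some_outside_rate(lambda rng: rng.gauss(0.0, 1.0))
expo_rate = some_outside_rate(lambda rng: rng.expovariate(1.0))
print(f"normal: {normal_rate:.3f}  exponential: {expo_rate:.3f}")
```

Changing size, k, or the quartile method shows how strongly the labeling
risk depends on choices that are rarely reported alongside the plot.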
Certainly this recommendation further underlines the need to state which
type of whiskers is shown in a given box plot. In a quick search through a
number of publications, most did not include any definition.
Jens Lauritsen
Consultant MD, ph.d. Associate professor
Denmark