I think you are referring to -iqr- from STB-3. The FAQ
advises
"Say what command(s) you are using. If they are not part of
official Stata, say where they come from: the STB/SJ, SSC, or
other archives."
As you did not follow this advice, I had to puzzle out
which command you were referring to. No doubt various
other members of the list fell at the first fence.
Now as to your question: I do not understand what you
do not understand. The help for -iqr- looks very helpful
to me. It includes these definitions:
============================================================
IQR (Interquartile Range)  = 75th percentile - 25th percentile
Pseudo standard deviation  = IQR/1.349
10% trim mean              = Average of cases between 10th and
                             90th percentiles
Inner fences               = Q(25)-1.5IQR and Q(75)+1.5IQR
Outer fences               = Q(25)-3IQR and Q(75)+3IQR
Mild outlier               = Q(25)-3IQR <= x < Q(25)-1.5IQR or
                             Q(75)+1.5IQR < x <= Q(75)+3IQR
Severe outlier             = x < Q(25)-3IQR or x > Q(75)+3IQR
============================================================
Thus a "severe outlier" lies more than 3 IQR away from the nearer
quartile and a "mild outlier" lies more than 1.5 (but not more than 3)
IQR away from the nearer quartile.
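If it helps to see the arithmetic, here is a minimal sketch that recomputes the fences by hand using only official Stata. It is not the code of -iqr- itself. The auto data, the variable price, and the scalar and variable names (q25, lo_in, severe and so on) are my own choices for illustration.

```stata
* Sketch only: rebuild the fences from the quartiles that -summarize- returns.
sysuse auto, clear
quietly summarize price, detail

* Quartiles as defined in [R] summarize
scalar q25 = r(p25)
scalar q75 = r(p75)
scalar iqrange = q75 - q25

* Inner fences at 1.5 IQR, outer fences at 3 IQR from the nearer quartile
scalar lo_in  = q25 - 1.5*iqrange
scalar hi_in  = q75 + 1.5*iqrange
scalar lo_out = q25 - 3*iqrange
scalar hi_out = q75 + 3*iqrange

display "IQR = " iqrange "   pseudo SD = " iqrange/1.349
display "inner fences: " lo_in " and " hi_in
display "outer fences: " lo_out " and " hi_out

* Flag (rather than drop) values beyond each pair of fences
generate byte severe = (price < scalar(lo_out) | price > scalar(hi_out)) if !missing(price)
generate byte mild = (price < scalar(lo_in) | price > scalar(hi_in)) & !severe if !missing(price)
tabulate mild severe
```

Flagging with indicator variables leaves the data intact, so the flagged values can be listed and inspected before any decision is made about them.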
These definitions go back to J.W. Tukey. 1977. Exploratory data
analysis. Reading, MA: Addison-Wesley. The one difference is that
the quartiles are calculated as Stata defines them, which is
documented at [R] summarize.
These are arbitrary limits. Their main interest is that they are
sometimes used in boxplots to determine which data points should
be shown individually.
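As a small illustration of that use, and assuming the auto data again, official -graph box- draws whiskers to the most extreme values within 1.5 IQR of the quartiles and plots anything beyond them as individual points. The choice of variable and of make as the label is mine.

```stata
* Sketch only: a boxplot in which points beyond 1.5 IQR from the
* nearer quartile are plotted individually and labelled.
sysuse auto, clear
graph box price, marker(1, mlabel(make))
```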
That said, "getting rid" of severe outliers is, in my view, not
usually a good idea unless there is independent evidence that
the data are wholly untrustworthy (e.g. a laboratory record that
the experiment was grossly disturbed). Dropping values more than
3 IQR away from the nearer quartile will in most instances throw
out important information. Applied to city populations, for example,
it would throw away most major cities, which are far larger than
the other cities in their country.
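To see how easily this happens with skewed data, here is a small sketch with simulated values. Everything in it is an assumption made for illustration: the lognormal shape, the seed, the sample size and the names size and upper_out. No value here is an error, yet some will usually lie beyond the upper outer fence.

```stata
* Sketch only: simulated, strongly right-skewed "sizes".
clear
set seed 2024
set obs 1000
generate double size = exp(2*rnormal())

quietly summarize size, detail
scalar upper_out = r(p75) + 3*(r(p75) - r(p25))

* How many perfectly genuine values would be called severe outliers?
count if size > scalar(upper_out) & !missing(size)
```

Working on a logarithmic scale is often a better response to skewness of this kind than dropping the largest values.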
Nick
[email protected]
[email protected]
> The description of IQR in Stata help is a little confusing. I am using
> this command to get rid of severe outliers but I am not quite sure how
> the iqr command calculates them. The notation is a bit confusing. Can
> someone explain this to me or direct me to other sources? I have
> Statistics with Stata (for version 7), a book published by Lawrence
> Hamilton, the person who wrote the IQR program, but I am still a bit
> baffled.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/