For what it's worth, a version of Nick's Method Class 7 is provided in
Stata by the -somersd- package, downloadable from SSC using the -ssc-
command. The -somersd- package calculates confidence intervals for a
large family of median slopes, differences and ratios. More
documentation is available from my website (see my signature below). A
case involving a horrendous outlier can be found on Page 12 of the .pdf
document
http://www.imperial.ac.uk/nhli/r.newson/papers/censlope.pdf
which is also distributed with the -somersd- package as an ancillary
file.
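For anyone who wants to try it, a minimal sketch would be something
like the following, where yvar and xvar are hypothetical variable
names and I am assuming -censlope-'s basic syntax of outcome followed
by predictor:

    ssc install somersd
    ssc describe somersd
    censlope yvar xvar

The first command installs the package from SSC, the second describes
the package and its files (including censlope.pdf), and the third
estimates a confidence interval for the median (Theil-Sen type) slope
of yvar on xvar.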
I hope this helps.
Roger
Roger Newson
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
www.imperial.ac.uk/nhli/r.newson/
Opinions expressed are those of the author, not of the institution.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: 07 June 2007 20:26
To: [email protected]
Subject: st: RE: RE: Re: RE: Re: RE: RE: IQR
Sure, there is a -winsor- ado on SSC, which I wrote, and according
to Kit Baum's reports it is quite heavily used. I have never used it
myself, bar in development.
I cannot recall the details, but I think someone wrote in to
Statalist reporting that Stata did not seem to support Winsorizing
and that this was a black mark against Stata. The best reply to that
was a program: concrete evidence that you can easily do Winsorizing
in Stata, and here is one way to do it.
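A minimal sketch of its use, assuming the auto data and, purely for
illustration, Winsorizing 5% in each tail:

    ssc install winsor
    sysuse auto, clear
    winsor price, gen(price_w) p(0.05)
    summarize price price_w

Here -gen()- names the new Winsorized variable and -p()- is the
fraction pulled in at each end; 0.05 is not a recommendation, just a
number to show the syntax.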
But let us look at the wider picture. There is no
one way to deal with outliers. There are many ways
to deal with outliers, including
1. Going out "into the field" and doing the measurement
again.
2. Testing whether they are genuine. Most of the
tests look pretty contrived to me, but you might find one
that you can believe fits your situation. Irrational
faith that a test is appropriate is always needed
to apply a test that is then presented as quintessentially
rational.
3. Throwing them out as a matter of judgement, i.e.
in Stata terms -drop-ping them from the data.
4. Throwing them out using some more-or-less
automated (usually not "objective") rule.
5. Ignoring them, along the lines of either 3 or 4.
This could be formal (e.g. trimming) or just leaving
them in the dataset, but omitting them from analyses
as too hot to handle.
6. Pulling them in using some kind of adjustment,
e.g. Winsorizing.
7. Downplaying them by using some other robust estimation
method.
8. Downplaying them by working on a transformed
scale.
9. Downplaying them by using a non-identity link
function.
10. Accommodating them by fitting some appropriate
fat-, long-, or heavy-tailed distribution, without
or with predictors.
11. Sidestepping the issue by using some non-parametric
(e.g. rank-based) procedure.
12. Getting a handle on the implied uncertainty
using a bootstrap, jackknife or permutation-based
procedure.
13. Editing to replace an outlier with some more
likely value, based on deterministic logic. "An 18-
year-old grandmother is unlikely, but the person
in question was born in 1926, so presumably is
really 81."
14. Editing to replace an impossible or implausible
outlier using some imputation method that currently
passes as acceptable, not-quite-white magic.
15. Analysing with and without, and seeing how much
difference the outlier(s) make(s), statistically,
scientifically or practically (a sketch appears below).
16. Something Bayesian. My prior ignorance of quite
what forbids me from giving any details.
Naturally, these categories intergrade in some
cases, and I can well believe that I have forgotten,
or am unaware of, yet other approaches.
What is quite striking to me -- as with many
other areas of statistical science -- is how much
preferred solutions vary between investigators
and disciplines, despite the broad similarity
of the problems that outliers pose.
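To make 15 concrete, a minimal sketch using the auto data and a crude
3-standard-deviation flagging rule, which is purely illustrative and
not a recommendation:

    sysuse auto, clear
    regress price weight
    quietly summarize price
    generate byte outlier = abs(price - r(mean)) > 3 * r(sd)
    regress price weight if !outlier

Comparing the two sets of results shows how much difference the
flagged observations make.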
Nick
[email protected]
Rajesh Tharyan
> Isn't there a -winsor- ado (written by Nick) which can be
> used to deal with outliers? In some cases that may be
> preferable to throwing out the observations.
Rodrigo A. Alfaro
> It seems to be a 'common' practice when COMPUSTAT
> data are used. The dataset is composed of the balance-sheet
> reports of US firms. It is difficult to identify in the data
> the mergers, splits or any other change in ownership that
> implies a huge change in the composition of a firm (in terms
> of assets, fixed capital, etc.), so dropping extreme values
> of the change in assets allows you to 'delete' the unexplained
> firms. Also, a similar problem affects prices, where
> sometimes a change in dividend policy can produce a jump
> that makes sense only when the researcher knows about the
> change in policy. Usually, researchers do not know about
> these policies, or it is a titanic (and maybe useless) job
> to try to include them in the analysis.
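In Stata, the kind of rule Rodrigo describes might look something
like this sketch, where d_assets is a hypothetical variable holding
the change in assets and the 1st and 99th percentiles are arbitrary
cutoffs:

    _pctile d_assets, percentiles(1 99)
    drop if (d_assets < r(r1) | d_assets > r(r2)) & !missing(d_assets)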
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/