Re: st: RE getting rid of the outliners
Maarten, I had written in earlier suggesting -lv- (output below) or -iqr-. (I just checked and, for some reason, my
response went to Vora N and not to the list.) Your response, however, is more true to the original posting.
That said, I have a follow-up question for you.
Using the fences created by
local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))
would flag "mild" outliers (more than 1.5 IQR beyond the quartiles), not only severe ones (more than 3 IQR beyond).
So my question is: how does this sit with the discussion in, for example, Hamilton's Statistics with Stata, which
distinguishes between mild and severe outliers and points out that it is the severe outliers that create problems for
many statistical techniques?
Many thanks
Ronnie
. lv mpg
         #   74     Mileage (mpg)
-------------------------------------
  M    37.5 |          20          |   spread  pseudosigma
  F      19 |    18   21.5     25  |        7     5.216359
  E      10 |    15   21.5     28  |       13     5.771728
  D     5.5 |    14  22.25   30.5  |     16.5     5.576303
  C       3 |    14   24.5     35  |       21     5.831039
  B       2 |    12   23.5     35  |       23     5.732448
  A     1.5 |    12     25     38  |       26     6.040635
          1 |    12   26.5     41  |       29      6.16562
            |                      |
            |                      |  # below     # above
inner fence |   7.5          35.5  |        0           1
outer fence |    -3            46  |        0           0
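The fence values reported by -lv- can be checked by hand from the fourths in the F row (18 and 25, spread 7). The inner fences are the same 1.5 IQR bounds as the local u / local l definitions above; the outer fences use 3 IQR:

```latex
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 = 25 - 18 = 7\\
\text{inner fences} &= [\,Q_1 - 1.5\,\text{IQR},\; Q_3 + 1.5\,\text{IQR}\,] = [\,18 - 10.5,\; 25 + 10.5\,] = [\,7.5,\; 35.5\,]\\
\text{outer fences} &= [\,Q_1 - 3\,\text{IQR},\; Q_3 + 3\,\text{IQR}\,] = [\,18 - 21,\; 25 + 21\,] = [\,-3,\; 46\,]
\end{aligned}
```

Since the largest mpg value, 41, lies between 35.5 and 46, it counts as a mild but not a severe outlier, which matches the "# above" counts of 1 and 0.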
Maarten Buis wrote:
-findit adjacent value- brings up Nick's module
-adjacent-, which you can install. It will only show
you the adjacent values; it does not store them, so
you cannot use them directly to drop outliers. That
could be an oversight on Nick's part, but I would not
be surprised if it was deliberate, to prevent people
from mechanically dropping outliers.
Below I show how to create a new variable that
is 1 when mpg is an outlier and 0 when it is
not, and how that variable can be used without
dropping cases. For details, have a look at:
http://www.stata.com/support/faqs/data/trueorfalse.html
*----------------begin example-----------------
sysuse auto, clear
sum mpg, detail
* fences at 1.5 times the interquartile range beyond the quartiles
local u = r(p75) + (3/2) * (r(p75) - r(p25))
local l = r(p25) - (3/2) * (r(p75) - r(p25))
* out = 1 outside the fences, 0 inside; missing mpg stays missing
gen out = (mpg < `l' | mpg > `u') if !missing(mpg)
hist mpg            /*histogram including outlier*/
hist mpg if !out    /*histogram excluding outlier*/
*---------------end example---------------------
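To connect the example with Ronnie's mild/severe question, here is a minimal sketch that flags the two kinds separately, assuming Tukey-style fences at 1.5 IQR (inner) and 3 IQR (outer) around the quartiles from -summarize, detail-. The variable outcat and the local names inner_l, inner_u, outer_l, outer_u are hypothetical, chosen for illustration only; in other datasets the percentiles from -summarize- need not coincide exactly with the fourths used by -lv-.

```stata
*----------------begin example-----------------
* hypothetical sketch: separate "mild" from "severe" outliers
sysuse auto, clear
quietly summarize mpg, detail
local iqr = r(p75) - r(p25)
local inner_l = r(p25) - 1.5*`iqr'    /* inner fences */
local inner_u = r(p75) + 1.5*`iqr'
local outer_l = r(p25) - 3*`iqr'      /* outer fences */
local outer_u = r(p75) + 3*`iqr'
* outcat: 0 = inside inner fences, 1 = mild, 2 = severe; missing mpg stays missing
gen byte outcat = 0 if !missing(mpg)
replace outcat = 1 if (mpg < `inner_l' | mpg > `inner_u') & !missing(mpg)
replace outcat = 2 if (mpg < `outer_l' | mpg > `outer_u') & !missing(mpg)
tabulate outcat
hist mpg if outcat < 2    /* histogram excluding only severe outliers */
*---------------end example---------------------
```

For mpg this should flag one mild outlier (41) and no severe ones, in line with the -lv- fence counts.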
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
-----Original Message-----
From: vora n [mailto:[email protected]]
Sent: Sunday 30 April 2006 2:47
To: [email protected]
Subject: st: getting rid of the outliners
Is there any Stata command that can drop
the observations that are outliers?
Let's say I graph the box-and-whisker plot
graph box y
and then the graph will show the outliers.
Is there any built-in command that can identify
these outliers and drop them from my data?
Or is there any command that reports the upper
adjacent value and the lower adjacent value
so that I can drop the outliers manually?
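For the last question, the adjacent values can be computed directly as the most extreme observations that still lie within the 1.5 IQR fences. This is a minimal sketch, assuming the box plot uses the same quartiles as -summarize, detail- (the whiskers drawn by -graph box- may differ slightly in some datasets); the local names lav and uav are hypothetical.

```stata
*----------------begin example-----------------
* sketch: lower and upper adjacent values of mpg
sysuse auto, clear
quietly summarize mpg, detail
local iqr = r(p75) - r(p25)
local u = r(p75) + 1.5*`iqr'
local l = r(p25) - 1.5*`iqr'
* largest value not above the upper fence (missing mpg is excluded)
quietly summarize mpg if mpg <= `u'
local uav = r(max)
* smallest value not below the lower fence
quietly summarize mpg if mpg >= `l' & !missing(mpg)
local lav = r(min)
display "lower adjacent value: `lav'   upper adjacent value: `uav'"
*---------------end example---------------------
```

For the auto data this gives 12 and 35, consistent with the -lv- output, where only the value 41 lies beyond the upper inner fence.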
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*