Re: st: RE getting rid of the outliners
Thanks Maarten. My approach is close to yours; however, in the early stages of data cleaning I like to look at severe
outliers and cross-check them against the raw data, to be sure they are not data entry errors. After cleaning, I
deal with outliers much as you do.
That said, my main point was that there will always be mild outliers, and these can be ignored. However, if Vora would
like to drop any outliers, I would think "severe" outliers are the better candidates. But that is for Vora to decide, I suppose.
Thanks anyhow (I didn't know about -adjacent-).
Ronnie
Maarten Buis wrote:
Ronnie:
I had the same problem with sending to Vora instead of the
statalist (so Vora received multiple copies of my email before
I found out what the problem was, sorry about that).
In my not overly humble opinion, determining outliers this way
is nothing more than applying rules of thumb, and it is bad
practice to let your analysis be influenced by a blind
application of a single rule of thumb. I am a regression man,
so when I am looking for outliers I look at scatter plots,
various plots involving residuals, Cook's distances, and
leverages. I then try to identify points that worry me and
try to find out why they are special. Then I decide what I
am going to do about them, and in many cases the answer is
nothing.
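A minimal sketch of the diagnostics listed above, assuming Stata's shipped auto dataset and an arbitrary illustrative model (price on mpg and weight); neither the data nor the model comes from this thread. The 4/N cutoff for Cook's distance is itself only a rule of thumb, used here to pick points to inspect, not points to drop.

```stata
* Illustration only: dataset and model are assumptions, not from the thread
sysuse auto, clear
regress price mpg weight

* Residual-versus-fitted and leverage-versus-squared-residual plots
rvfplot, yline(0)
lvr2plot

* Cook's distance and leverage for each observation
predict double cooksd, cooksd
predict double lev, leverage

* List points worth a closer look (4/N is a conventional, not definitive, cutoff)
list make price cooksd lev if cooksd > 4/e(N) & !missing(cooksd)
```

The listed observations are candidates for checking against the raw data; what to do about them remains a judgment call.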
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
-----Original Message-----
From: [email protected] [mailto:[email protected]]On Behalf Of Ronnie Babigumira
Sent: Monday 1 May 2006 11:24
To: [email protected]
Subject: Re: st: RE getting rid of the outliners
Maarten, I had written in earlier suggesting -lv- (output below) or -iqr- (I just checked and, for some reason, my
response went to Vora N and not to the list). However, your response is more true to the original posting.
That said, I have a follow-up question for you.
Using the fences created by
    local u = r(p75) + (3/2) * (r(p75) - r(p25))
    local l = r(p25) - (3/2) * (r(p75) - r(p25))
would capture "mild" outliers. So my question is: how does this sit with the discussion in, for example, Hamilton,
Statistics with Stata, which distinguishes between mild and severe outliers and points out that it is severe outliers that
create problems for many statistical techniques?
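To make the mild/severe distinction concrete, here is a rough sketch that computes both sets of fences. It assumes the usual boxplot convention (inner fences at 1.5 IQR beyond the quartiles, outer fences at 3 IQR), and uses mpg from Stata's shipped auto dataset purely as a stand-in variable; the variable name outlier is made up for the example.

```stata
* Illustration only: auto data and mpg are assumptions, not from the thread
sysuse auto, clear
quietly summarize mpg, detail

* Inner fences (1.5 IQR) and outer fences (3 IQR)
local iqr     = r(p75) - r(p25)
local inner_u = r(p75) + 1.5 * `iqr'
local inner_l = r(p25) - 1.5 * `iqr'
local outer_u = r(p75) + 3 * `iqr'
local outer_l = r(p25) - 3 * `iqr'

* 0 = inside inner fences, 1 = mild, 2 = severe
generate byte outlier = 0 if !missing(mpg)
replace outlier = 1 if (mpg < `inner_l' | mpg > `inner_u') & !missing(mpg)
replace outlier = 2 if (mpg < `outer_l' | mpg > `outer_u') & !missing(mpg)
label define outlier 0 "none" 1 "mild" 2 "severe"
label values outlier outlier
tabulate outlier
```

Flagging rather than dropping keeps the observations available for cross-checking against the raw data.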
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/