st: How to detect outliers
From: Xixi Lin <[email protected]>
To: statalist <[email protected]>
Subject: st: How to detect outliers
Date: Mon, 11 Feb 2013 13:50:34 -0500
Hi,
I tried two ways to detect outliers. One flags observations whose
Cook's distance is greater than 4/n; the other flags observations
whose standardized residuals are greater than 2 in magnitude. Here
is my code:
// studentized residuals (predict, rstudent), one regression per period
gen residual = .
tempvar temp
foreach z of numlist 2/120 {
    capture reg Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        predict `temp', rstudent
        replace residual = `temp' if Period == `z'
        drop `temp'
    }
}
// Cook's distance, one regression per period
gen di_bench = 4/165979
gen distance = .
tempvar temp1
foreach z of numlist 2/120 {
    capture reg Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        predict `temp1', cooksd
        replace distance = `temp1' if Period == `z'
        drop `temp1'
    }
}
// number of outliers under each rule
count if abs(residual) > 2      // 7922
count if distance > di_bench    // 111879
My question is: did I make a mistake in the code? Why are the two
results so different? One shows 7922 outliers; the other shows 111879.
If I compare Cook's distance with 1 instead, the number of outliers
is 133. Can anyone tell me which method I should choose? Or is there
a better way to detect outliers? Thanks a lot.
Best,
Xixi Lin