Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How to detect outliers
From: Xixi Lin <[email protected]>
To: [email protected]
Subject: Re: st: How to detect outliers
Date: Tue, 12 Feb 2013 13:22:27 -0500
Hi Steve,
About the robust regression, I have a question: after running
-mmregress-, is it possible to predict residuals? I get an error:
xi: mmregress Y X1 X2 X3
predict r,residual
error message: option residual not allowed
My question is whether it is possible to test residual normality and
heteroskedasticity after robust regression, or whether robust
regression already corrects for those.
Best,
Xixi Lin
On Mon, Feb 11, 2013 at 5:51 PM, Steve Samuels <[email protected]> wrote:
> Identifying outliers on the basis of a least squares fit is a very bad
> idea, however popular (Hampel et al., 1986). A far superior approach in
> Stata is the robust regression package -mmregress- by Verardi and Croux
> (-findit-). In providing a resistant fit, -mmregress- also identifies
> outliers and high leverage points.
>
>
> Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata
> Journal 9, no. 3: 439-453.
>
> Hampel, Frank, Elvezio Ronchetti, Peter Rousseeuw, and Werner Stahel.
> 1986. Robust Statistics: The Approach Based on Influence Functions
> (Wiley Series in Probability and Mathematical Statistics). New York:
> John Wiley and Sons.
>
>
> Steve
>
> On Feb 11, 2013, at 2:37 PM, Xixi Lin wrote:
>
> Hi Nick,
>
> You are absolutely right! I messed up the obs numbers; it should be
> the number of obs in each period instead. After fixing that, the
> results from these two methods are pretty close.
>
> Thanks again. You are so helpful! ^_^
>
> Best,
> Xixi Lin
>
> On Mon, Feb 11, 2013 at 2:24 PM, Nick Cox <[email protected]> wrote:
>> I wouldn't regard any kind of large residual as indicating outliers
>> unequivocally. On the contrary, a really marked outlier is likely to
>> pull the regression towards it, with the result of a small residual.
>>
>> Your criterion here for Cook is 4/n, but evidently you are fitting
>> regressions separately for each period. The total dataset size of
>> 165779 is not pertinent to regressions fitted individually. The
>> relevant criterion is the number of observations used in each
>> regression.
>>
>> I think you'd learn more from residual vs fitted plots, even all 119 of them.
>>
>> Whether you would be better off with a different model depends on your
>> research problem.
>>
>> Nick
>>
>> On Mon, Feb 11, 2013 at 6:50 PM, Xixi Lin <[email protected]> wrote:
>>> Hi,
>>> I tried two ways to detect outliers: one is to regard observations
>>> with Cook's Distance greater than 4/n as outliers; the other is to
>>> regard those with standardized residuals greater than 2 in magnitude
>>> as outliers. Here is my code:
>>>
>>> gen residual=.
>>> tempvar temp
>>> foreach z of numlist 2/120 {
>>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>     if !_rc {
>>>         predict temp,rstu
>>>         replace residual=temp if Period==`z'
>>>         drop temp
>>>     }
>>> }
>>>
>>> //cook's distance
>>> gen di_bench=4/165979
>>> gen distance=.
>>> tempvar temp1
>>> foreach z of numlist 2/120 {
>>>     capture reg Y X1 X2 X3 X4 if Period==`z', noconstant
>>>     if !_rc {
>>>         predict temp1,cook
>>>         replace distance=temp1 if Period==`z'
>>>         drop temp1
>>>     }
>>> }
>>> //outlier numbers
>>> count if abs(residual) > 2 // 7922
>>> count if distance > di_bench //111879
>>>
>>> My question is: did I mess up the code? Why are the two results so
>>> different? One shows 7922 outliers, the other shows 111879 outliers.
>>> If I compare Cook's Distance with 1, then the number of outliers is 133.
>>>
>>> Can anyone tell me which method I should choose? Or are there any
>>> better ways to detect outliers? Thanks a lot.
>>>
>>> Best,
>>> Xixi Lin
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
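
Nick's point about the 4/n criterion can be sketched as follows: a minimal version that takes n from each period's own regression (e(N)) rather than from the whole dataset. It assumes the variable names from the original post (Y, X1-X4, Period); cook_d, cutoff, and tmp_cook are illustrative names only, and 4/n is just the rule of thumb used in the thread, not a definitive test. The loop also draws the residual-versus-fitted plot Nick suggests for each period.

```stata
* Sketch only: per-period Cook's distance with a per-regression cutoff.
* Y, X1-X4, and Period come from the original post; cook_d, cutoff, and
* tmp_cook are hypothetical names chosen for this illustration.
gen double cook_d = .
gen double cutoff = .

forvalues z = 2/120 {
    capture regress Y X1 X2 X3 X4 if Period == `z', noconstant
    if !_rc {
        * Cook's distance is computed for the estimation sample only
        predict double tmp_cook if e(sample), cooksd
        replace cook_d = tmp_cook if e(sample)

        * e(N) is the number of observations used in THIS regression
        replace cutoff = 4 / e(N) if e(sample)
        drop tmp_cook

        * Residual-versus-fitted plot for this period, kept in memory
        rvfplot, yline(0) title("Period `z'") name(rvf`z', replace)
    }
}

* Missing values sort above every number in Stata, so exclude them
count if cook_d > cutoff & !missing(cook_d)
```

Because each period gets its own cutoff, the count is no longer driven by the total dataset size. With 119 periods the loop leaves 119 named graphs in memory, so the rvfplot line can be dropped if only the counts are needed.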