Hi Stata users,
I am trying to run a Stata program to detect outliers in my data set. I found two Grubbs-test programs written in Stata. Here they are:
Program #1.
_______________________________program begins_____________________________________________
capture program drop grubbs
program define grubbs
* this is a revised version of the original command: it no longer deletes missing obs or outliers
* instead, it sets "tag_grubbs" =1 if it believes the obs to be an outlier
* usage: "grubbs myvar .05 10"
version 8.0
*arguments:
* 1= Name of variable
* 2= Significance level alpha (e.g., 0.05 or 0.01)
* 3= Max number of iterations
args xvar conf maxit
tempvar dev missx
gen byte tag_grubbs = 0
local i = 1
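di "flagging missing values (they are not deleted)"
* mark missing values in the temporary variable missx used by gsort below
gen byte `missx' = missing(`xvar')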
* initial guess for critical value
scalar Gcrit = 10
* start with G > Gcrit (otherwise loop will not begin)
scalar G = Gcrit +1
di "maxit = " `maxit'
di "G= " G
di "Gcrit = " Gcrit
while G > Gcrit & `i'<= `maxit' {
sum `xvar' if tag_grubbs == 0
local nobs = r(N)
gen `dev' = (abs(`xvar' -r(mean)))/r(sd)
gsort -`missx' -tag_grubbs `dev'
scalar G = `dev'[_N]
local ct = `conf'/(2*`nobs')
local ts = invttail(`nobs'-2,`ct')
scalar Gcrit = (`nobs'-1)*sqrt(`ts'^2/(`nobs'*(`nobs'-2+`ts'^2)))
di "Iteration = " `i' " Critical G = " Gcrit " Current G = " G
if (G > Gcrit) di `xvar'[_N] " is an outlier, so tag_grubbs = 1"
replace tag_grubbs = 1 if `dev' == G & G > Gcrit
local i = `i'+1
drop `dev'
}
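* the loop stops either because G <= Gcrit (no further outlier) or because
* all maxit iterations were used while the last one still found an outlier
if (G <= Gcrit) di "Grubbs procedure terminated: no more outliers"
else di "Maximum iterations reached: there may be more outliers; use a larger maxit"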
end
____________________________________end of program__________________________________
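Written out, the quantities computed in each pass of the loop are shown below, in the standard two-sided Grubbs form. Here N is the number of non-missing, not-yet-tagged observations (`nobs`), x-bar and s are their mean and standard deviation, alpha is the `conf` argument, and t is what invttail(N-2, alpha/(2N)) returns, i.e. the upper alpha/(2N) critical value of Student's t with N-2 degrees of freedom. The code's expression (N-1)*sqrt(t^2/(N*(N-2+t^2))) is the same critical value with the square root of N moved inside.

```latex
G = \frac{\max_i \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\mathrm{crit}} = \frac{N-1}{\sqrt{N}}
  \sqrt{\frac{t^2}{N - 2 + t^2}},
\qquad
t = t_{\alpha/(2N),\,N-2}
```

An observation is tagged when G > G_crit; the mean and s are then recomputed without it, and the test is repeated.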
Program #2.
(As posted, Program #2 is identical, line for line, to Program #1, so it is not repeated here.)
Could anyone suggest which program is better to use? I would appreciate it if you could use the auto data, with the variable price, as an example of how to run these programs.
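For the auto example requested above, here is a minimal sketch (not either posted program) of a single pass of the same two-sided Grubbs test on price, computed directly with summarize and invttail(). It assumes alpha = 0.05, and the variable name zdev is a hypothetical choice for illustration. The iterative programs repeat this pass after each tagged outlier, so this only shows whether the single most extreme price is flagged.

```stata
* One pass of the two-sided Grubbs test on price (auto data), computed
* directly; the iterative programs repeat this after tagging each outlier.
sysuse auto, clear
quietly summarize price
local n = r(N)
* Grubbs deviation: absolute distance from the mean in SD units
* (zdev is a hypothetical variable name chosen for this sketch)
generate double zdev = abs(price - r(mean)) / r(sd)
quietly summarize zdev
scalar G = r(max)
* two-sided critical value at an assumed alpha of 0.05,
* same formula as in the programs
local alpha = 0.05
local t = invttail(`n' - 2, `alpha' / (2 * `n'))
scalar Gcrit = (`n' - 1) / sqrt(`n') * sqrt(`t'^2 / (`n' - 2 + `t'^2))
display "G = " G "   Gcrit = " Gcrit
* list the most extreme observation only if G exceeds Gcrit
list make price zdev if zdev == G & G > Gcrit
```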
Thanks.
Badri Prasad
Policy, Reporting and Data Development
Labour Standards and Workplace Equity
National Labour Operations Directorate
HRSDC
(819) 956 - 8146