Sergiy <[email protected]> and [email protected]
If the issue is that there are more distinct values than -levelsof-
can handle, that is easily resolved, unless I am missing some finer
point:
prog grub2, rclass sortpreserve
syntax [varlist] [if] [in] [, Level(int 95)]
marksample touse
foreach v of local varlist {
    tempvar c Z
    local outliers
    qui {
        * tag one observation per distinct value in the estimation sample
        bys `v' `touse': g `c'=0 if (_N-_n==0)&`touse'
        * n is the number of distinct values, not of observations
        count if `c'==0 & `touse'
        local n=r(N)
        * two-sided Grubbs critical value: t quantile at alpha/(2n) with n-2 df
        local t2=(invttail(`n'-2,(1-`level'/100)/(2*`n')))^2
        local G_cr=((`n'-1)/sqrt(`n'))*sqrt(`t2'/(`n'-2+`t2'))
        * mean and sample SD over the distinct values only
        su `v' if `c'<.
        local mean=r(mean)
        local levsdev=r(sd)
        g `Z'=abs(`mean'-`v')/`levsdev' if `c'<.
        levelsof `v' if `Z'>`G_cr' & `c'<., local(outliers)
    }
    di as txt "Outliers in `v': " as res "`outliers'"
}
* r(outliers) holds the list for the last variable only
return local outliers "`outliers'"
end
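For readers who want to check the arithmetic outside Stata, here is a minimal Python sketch of the same idea: collapse to distinct values, take the mean and sample (n-1) standard deviation over those values, and flag every distinct value whose standardized distance exceeds the two-sided Grubbs critical value in a single pass. It assumes the caller supplies the Student-t quantile t (Python's standard library has no t distribution; in Stata it would be invttail(n-2, alpha/(2n))). The function names are hypothetical.

```python
import math


def grubbs_critical(n, t):
    """Two-sided Grubbs critical value for n values, given t = the upper
    alpha/(2n) quantile of Student's t with n-2 df (supplied by the caller)."""
    t2 = t * t
    return ((n - 1) / math.sqrt(n)) * math.sqrt(t2 / (n - 2 + t2))


def grubbs_outliers_distinct(values, t):
    """Flag distinct values with |value - mean| / sd above the critical value.

    Mean and sample SD are computed over distinct values only, so repeated
    values count once. Returns (flagged values, critical value)."""
    levels = sorted(set(values))  # one entry per distinct value
    n = len(levels)
    mean = math.fsum(levels) / n
    sd = math.sqrt(math.fsum((x - mean) ** 2 for x in levels) / (n - 1))
    g_cr = grubbs_critical(n, t)
    flagged = [x for x in levels if abs(x - mean) / sd > g_cr]
    return flagged, g_cr


# n = 10 distinct values, alpha = 0.05: t(8) upper 0.0025 quantile is about 3.833
flagged, g_cr = grubbs_outliers_distinct([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], t=3.833)
print(flagged, round(g_cr, 2))  # [50] 2.29
```

The critical value of about 2.29 for n = 10 and alpha = 0.05 agrees with published two-sided Grubbs tables, which is a quick sanity check on the formula.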
clear
range n 1 190717 190717
g x=invnorm(uniform())
replace x=6 in 1
grub2
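The simulated check above can also be mirrored in Python as a rough cross-check. This sketch assumes a large-sample shortcut: it replaces the t(n-2) quantile with the standard normal quantile, which is reasonable only because n is about 190,000 and would be wrong for small samples. The seed is arbitrary. With this many N(0,1) draws the critical value comes out a little above 5, so the planted value 6 should be flagged. Whether any other draw is also flagged depends on the random sample.

```python
import math
import random
from statistics import NormalDist


def grubbs_critical_large_n(n, alpha=0.05):
    """Approximate two-sided Grubbs critical value for large n.

    Assumption: the t(n-2) quantile is replaced by the normal quantile."""
    t = NormalDist().inv_cdf(1 - alpha / (2 * n))
    t2 = t * t
    return ((n - 1) / math.sqrt(n)) * math.sqrt(t2 / (n - 2 + t2))


random.seed(12345)  # arbitrary seed, for reproducibility only
x = [random.gauss(0, 1) for _ in range(190717)]
x[0] = 6.0  # plant one outlier, as "replace x=6 in 1" does

levels = sorted(set(x))  # distinct values only
n = len(levels)
mean = math.fsum(levels) / n
sd = math.sqrt(math.fsum((v - mean) ** 2 for v in levels) / (n - 1))
g_cr = grubbs_critical_large_n(n)
outliers = [v for v in levels if abs(v - mean) / sd > g_cr]
print(n, round(g_cr, 3), outliers)
```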
No doubt the above program could be cleaned up a bit...
On Feb 20, 2008 12:48 PM, Sergiy Radyakin <[email protected]> wrote:
> On 2/20/08, [email protected]
> <[email protected]> wrote:
> > Hi Austin,
> > I ran your program with my data set of 190717 observations and found the following result.
> >
> > . Grubbs2 lnwage, lev(95)
> > macro length exceeded
> > r(1000);
>
> That is the answer to Austin's question regarding why we need to limit
> the number of unique values.