In your first complicated command, p(75) is evidently a typo for r(p75).
In your second complicated command, p(25) is evidently a typo for
r(p25).
I didn't look further.
There is no gain here, and much fiddly extra typing, in writing e.g.
`r(p75)' rather than r(p75).
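
To spell that out, here is a minimal sketch of the corrected commands,
assuming the same modified auto data as in your post. The local macro
names lower and upper are my own choice, and I create a 0/1 indicator
rather than the 1/missing variable in your code:

    sysuse auto, clear
    set obs 75
    replace weight = 8000 in 75
    quietly summarize weight, detail
    * fences at 2 IQR beyond the quartiles, from the saved r() results
    local lower = r(p25) - 2 * (r(p75) - r(p25))
    local upper = r(p75) + 2 * (r(p75) - r(p25))
    * 1 if outside the fences, 0 otherwise, missing if weight is missing
    gen byte weight_outlier = (weight < `lower' | weight > `upper') if weight < .

Computing the fences immediately after -summarize- means that the r()
results are picked up before any later r-class command can overwrite
them.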
Note that -extremes- from SSC and the -egen- functions -adjl()- and
-adju()- from -egenmore- from SSC already incorporate similar
functionality.
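
A hedged sketch with the -egenmore- functions, assuming the data are
already in memory as in your post. I am writing the factor() option from
memory, so check -help egenmore- after installing; the variable names
adj_lower, adj_upper and weight_outlier3 are hypothetical:

    ssc install egenmore
    * adjacent values: the most extreme data values within 2 IQR of the quartiles
    egen adj_lower = adjl(weight), factor(2)
    egen adj_upper = adju(weight), factor(2)
    gen byte weight_outlier3 = (weight < adj_lower | weight > adj_upper) if weight < .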
Nick
Thomas Speidel
I am trying to use the results saved by the summarize command to flag
outliers/influential observations that fall outside this range:
p25 - 2*IQR <= var <= p75 + 2*IQR
Suppose I try this on the weight variable from the auto.dta dataset
(with a little modification):
sysuse auto, clear
set obs 75
replace weight = 8000 in 75
qui: summ weight, d
gen weight_outlier=1 if (weight>`p(75)'+2*(`r(p75)'-`r(p25)') &
(weight<.))
replace weight_outlier=1 if (weight<`p(25)'-2*(`r(p75)'-`r(p25)'))
If I were to do it by hand:
. di 3*2240-2*3670
-620
. di 3*3670-2*2240
6530
gen weight_outlier2=1 if weight>6530 & weight <.
There is something I am doing wrong in the first approach - read: poor
macro programming :-) - but I can't quite grasp what the problem is.