Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: How to find extreme values
From: "Nakelse, Tebila (AfricaRice)" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: How to find extreme values
Date: Tue, 27 Mar 2012 08:19:37 +0000
Hi Sandy Y. Zhu,
Below is an example of how to identify and correct extreme values.
*** correction of the variable price
sysuse auto, clear
/* box plot to visualize the extreme values */
graph box price
/* we can distinguish 8 extreme values */
*** quartiles and interquartile range of price
egen Q1_price= pctile(price), p(25)
egen Q3_price= pctile(price), p(75)
egen IC_price= iqr(price)
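*** Identification of extreme values: touse = 1 flags an extreme value, 0 otherwise
* note: -graph box- marks points beyond 1.5*IQR; the 1.25 multiplier used here is narrower
gen byte touse = (price < Q1_price-1.25*IC_price | price > Q3_price+1.25*IC_price) & !missing(price)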
tab touse
*** Correction of price: extreme values are capped at the fences
gen pricec = price
replace pricec = Q1_price-1.25*IC_price if price < Q1_price-1.25*IC_price & touse==1
replace pricec = Q3_price+1.25*IC_price if price > Q3_price+1.25*IC_price & touse==1
/* box plot of the corrected price, to check whether any extreme values remain */
*graph box pricec
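The same fences can also be computed from the results that -summarize, detail- leaves in r(), which avoids adding three constant variables to the dataset. This is only a sketch of that alternative, not part of the original example: the names extreme and price_cap and the local macros are made up for illustration, and the 1.25 multiplier is simply carried over from the code above.

```stata
* Sketch: fences from r() results instead of egen-generated constants
sysuse auto, clear
quietly summarize price, detail
local iqr   = r(p75) - r(p25)
local lower = r(p25) - 1.25*`iqr'
local upper = r(p75) + 1.25*`iqr'

* extreme = 1 flags an extreme value; missing prices are never flagged
generate byte extreme = (price < `lower' | price > `upper') & !missing(price)

* cap price at the fences; the -if- keeps missing prices missing,
* because min() and max() ignore missing arguments
generate double price_cap = min(max(price, `lower'), `upper') if !missing(price)

tabulate extreme
```

The original price variable is left untouched, so both versions can be compared side by side.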
Hope this helps,
Tebila
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Maarten Buis
Sent: Tuesday, March 27, 2012 8:27 AM
To: [email protected]
Subject: Re: st: How to find extreme values
On Tue, Mar 27, 2012 at 5:24 AM, Barth Riley <[email protected]> wrote:
> To remove outliers, you could:
>
> preserve
> replace var = . if abs(var) >= 1000000 (or some other value)
> [perform analyses]
> restore
>
> preserve and restore are added if you want to make a temporary change
> to these values
If I were to exclude such observations I would probably do something like:
gen byte touse = abs(var) <= 1e6
reg y var x if touse
-reg- could be any command; the key is the -if touse- part. The variable touse contains 0s and 1s, such that non-extreme values get 1 (true) and extreme values get 0 (false), see:
<http://www.stata.com/support/faqs/data/trueorfalse.html>. I prefer this because it does not destroy any information in my dataset.
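As a concrete illustration of the -if touse- idea, here is a small sketch on the auto data. The cutoff of 10000 and the regression of mpg on weight and price are arbitrary choices made for this example only; they are not from the original question.

```stata
* Sketch: mark non-extreme observations and restrict commands with -if touse-
sysuse auto, clear
generate byte touse = abs(price) <= 10000

* a missing value counts as larger than any number, so it would get touse == 0
tabulate touse

* estimate on the marked observations only
regress mpg weight price if touse

* full sample for comparison: nothing was dropped or overwritten
regress mpg weight price
```

Because touse is just another variable, the same restriction can be reused with -summarize-, -tabulate-, or any other command that accepts -if-.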
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/