
RE: st: Dropping the largest and smallest 1% of observations

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	RE: st: Dropping the largest and smallest 1% of observations
Date	Thu, 13 Feb 2003 12:00:40 -0000

FUKUGAWA, Nobuya 
> 
> > I want to cut off extraordinarily large and small values from variables
> > used in regression analysis.
> > What is the easiest way to drop the largest and smallest 1% of
> > observations from variables in STATA-7?

Ronan Conroy

> These values are potentially very informative. You can try other
> approaches such as
> - median regression
> - intreg
> - robust regression
>
> Very large and very small values can indicate problems with measurement.
> -intreg- can be used to specify that these values are not known precisely
> but are bigger/smaller than some threshold.
> Robust regression is useful to confirm that substantive conclusions from
> your analysis are not being 'driven' by influential observations.
>
> I hate discarding data. These strange values are trying to tell us
> something. We ignore them at our peril. I am analysing some microbiology
> data at the moment. There is a tradition of discarding any measurements
> where there were so many bugs that the plate was unreadable. You can
> imagine the havoc that this has played with results!
> 
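
For concreteness, the three alternatives listed above might be tried
roughly as follows. This is only a sketch: y, x1 and x2 are hypothetical
variable names, and using the 1st and 99th percentiles as the -intreg-
thresholds is an assumption borrowed from the original question, not a
recommendation. In practice the thresholds should come from what is known
about the measurement process.

```stata
* Sketch only: y, x1 and x2 are hypothetical variable names

* Median regression
qreg y x1 x2

* Robust regression
rreg y x1 x2

* Interval regression: treat the extremes as known only to lie
* beyond a threshold (here, assumed, the 1st and 99th percentiles)
summarize y, detail
scalar cut_lo = r(p1)
scalar cut_hi = r(p99)

generate double y_lo = y
generate double y_hi = y

* At or below the lower threshold: lower bound unknown
replace y_lo = .      if y <= cut_lo
replace y_hi = cut_lo if y <= cut_lo

* At or above the upper threshold: upper bound unknown
replace y_lo = cut_hi if y >= cut_hi & !missing(y)
replace y_hi = .      if y >= cut_hi & !missing(y)

intreg y_lo y_hi x1 x2
```

Observations with y missing end up with both bounds missing and are left
out of the -intreg- fit.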

I'd echo this strongly. Two other points: 

1. Identifying points that are extreme within their 
univariate distributions is not guaranteed to identify 
multivariate outliers. -hadimvo- is one command 
in this area. 

2. Tagging outliers and comparing results with and 
without them is another relatively simple strategy. 
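
A minimal sketch of both points, assuming hypothetical variables y, x1
and x2, and assuming (from the original question) that "extreme" means
outside the 1st and 99th percentiles of y. The new variable names extreme
and hadi_out are likewise made up for illustration.

```stata
* Sketch only: y, x1 and x2 are hypothetical variable names

* Tag, rather than drop, the univariate extremes of y
summarize y, detail
generate byte extreme = (y < r(p1) | y > r(p99)) if !missing(y)

* Fit with all observations, then without the tagged ones, and compare
regress y x1 x2
regress y x1 x2 if extreme == 0

* Multivariate outliers: hadi_out is 1 for observations flagged
* at the chosen significance level
hadimvo y x1 x2, generate(hadi_out) p(0.05)

* The two sets of tagged observations need not coincide
tabulate extreme hadi_out

regress y x1 x2 if hadi_out == 0
```

Because nothing is dropped, the tagged observations remain available for
listing and inspection, and any difference between the fits can be traced
back to particular data points.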

Nick 
[email protected] 

