FUKUGAWA, Nobuya
>
> > I want to cut off extraordinarily large and small values from
> > variables used in regression analysis.
> > What is the easiest way to drop the largest and smallest 1% of
> > observations from variables in STATA-7?
Ronan Conroy
> These values are potentially very informative. You can try other
> approaches such as
> - median regression
> - intreg
> - robust regression
>
> Very large and very small values can indicate problems with
> measurement. -intreg- can be used to specify that these values are
> not known precisely but are bigger/smaller than some threshold.
> Robust regression is useful to confirm that substantive conclusions
> from your analysis are not being 'driven' by influential
> observations.
>
> I hate discarding data. These strange values are trying to tell us
> something. We ignore them at our peril. I am analysing some
> microbiology data at the moment. There is a tradition of discarding
> any measurements where there were so many bugs that the plate was
> unreadable. You can imagine the havoc that this has played with
> results!
>
I'd echo this strongly. Two other points:

1. Identifying points that are extreme within each univariate
distribution is not guaranteed to identify multivariate outliers.
-hadimvo- is one command in this area.

2. Tagging outliers and comparing results with and without them is
another relatively simple strategy.
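
A minimal sketch of both points, assuming a dataset is already in
memory with a response y and predictors x1 and x2. Those names, the
scalars p1 and p99, and the flag variables uni_out and multi_out are
hypothetical, chosen only for illustration. Tagging on y alone is
likewise just an example; the same steps apply to any variable.

```stata
* Assumed (hypothetical) variables already in memory: y, x1, x2.

* Point 2: tag, rather than drop, values of y outside its
* 1st and 99th percentiles.
_pctile y, percentiles(1 99)
scalar p1  = r(r1)
scalar p99 = r(r2)
generate byte uni_out = (y < p1 | y > p99) if y < .

* Compare the fit with and without the tagged observations.
regress y x1 x2
regress y x1 x2 if uni_out == 0

* Point 1: flag multivariate outliers with -hadimvo- and see
* how far they overlap with the univariate tags.
hadimvo y x1 x2, generate(multi_out)
tabulate uni_out multi_out
```

If the two regressions tell much the same story, the extreme values
are not driving the conclusions. If they differ, that difference is
itself worth reporting.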
Nick