Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Elimination of outliers
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Elimination of outliers
Date: Mon, 6 Jun 2011 15:17:32 +0100
1. Transformation means using a transformed scale (e.g. logarithms) for one or more of your variables.
2. A non-identity link function in a generalized linear model means what it says: the help for -glm- is the place to start and points to other documentation.
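As a minimal sketch of points 1 and 2, using the example data posted in the original question: treating at as the response and lt as the predictor is an assumption made purely for illustration, as are the new variable names ln_at and ln_lt and the choice of the gamma family. Logarithms also require strictly positive values, which holds for these example rows but may not hold for the full dataset.

```stata
* Example data copied from the original question (12 observations)
clear
input gvkey at lt
1001 1120 231
1001 1230 312
1210 57 32
1210 67 25
1354 789 560
1368 650 500
1481 1230 900
2930 21 30
3201 234 213
3201 256 220
3210 267 320
4510 4335 3214
end

* 1. Transformation: model both variables on a logarithmic scale
*    (ln_at and ln_lt are illustrative names; at as response is an assumption)
generate ln_at = ln(at)
generate ln_lt = ln(lt)
regress ln_at ln_lt

* 2. Non-identity link: keep at on its original scale, but let -glm-
*    model the log of its mean (the gamma family is an illustrative choice)
glm at ln_lt, family(gamma) link(log)
```

In both cases every observation stays in the dataset; the large values simply carry less leverage on the logarithmic scale than on the raw scale.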
Otherwise, I assert that elimination of outliers is a very bad idea _unless_ you know from independent evidence that they arise from serious and irremediable problems of measurement, in which case chopping the tails of the distribution is _not_ the way to do it. In most fields I know, the outliers that stick out are genuine and important (the Amazon in hydrology, the USA or China, whichever it is, in economics, and so on), and leaving them out is in my view lousy science and lousy statistics.
If you disagree, well, we disagree, but I am not going to tell you how to do this in Stata.
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Achmed Aldai
Sent: 06 June 2011 15:07
To: [email protected]
Subject: Re: st: Elimination of outliers
Hi
Sorry, I cannot really understand why it is a bad idea. I want to eliminate the outliers because I think they cause a bias in my results.
How can I transform my predictors and what do you mean by that?
What is a non-identity link function?
Thank you
Felix
-------- Original Message --------
> Date: Mon, 6 Jun 2011 13:39:20 +0100
> From: Nick Cox <[email protected]>
> To: "[email protected]" <[email protected]>
> Subject: Re: st: Elimination of outliers
> In general, a very bad idea. Consider transforming your response or
> predictors or using a non-identity link function in a generalized
> linear model or some flavour of robust regression as more measured
> tactics.
>
> Nick
>
> On 6 Jun 2011, at 12:46, "Achmed Aldai" <[email protected]> wrote:
>
> > Hi
> >
> > I am currently working on a do-file in which I want to eliminate
> > outliers, meaning the observations with the highest and the lowest
> > values of certain variables, here e.g. at and lt. In total I have
> > 150000 observations, and out of these I want to delete the 25
> > highest and the 25 lowest. But it might be better to do this in
> > relative terms, meaning that I don't drop the highest and lowest
> > 25 but the lower and upper 1% of the corresponding variables.
> >
> > gvkey    at    lt
> >  1001  1120   231
> >  1001  1230   312
> >  1210    57    32
> >  1210    67    25
> >  1354   789   560
> >  1368   650   500
> >  1481  1230   900
> >  2930    21    30
> >  3201   234   213
> >  3201   256   220
> >  3210   267   320
> >  4510  4335  3214
> >
> > I hope this became clear.