Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: RE: st: Elimination of outliers
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: RE: st: Elimination of outliers
Date: Mon, 6 Jun 2011 15:48:43 +0100
I said I was not going to do this, but Austin Nichols gave you a gun.
Nick
[email protected]
Achmed Aldai
Hi Nick,
Can you please tell me how to eliminate the top and bottom 2% of each variable? So far my regression is not giving proper results, and I want to check whether these observations are causing the problem.
Thank you!
-------- Original Message --------
> Date: Mon, 6 Jun 2011 15:17:32 +0100
> From: Nick Cox <[email protected]>
> To: "'[email protected]'" <[email protected]>
> Subject: RE: st: Elimination of outliers
> 1. Transformation means using a transformed scale (e.g. logarithms) for
> one or more of your variables.
>
> 2. A non-identity link function in a generalized linear model means what
> it says: the help for -glm- is the place to start and points to other
> documentation.
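
As a rough illustration of point 1, the sketch below applies a natural-log transform to the `at` values from the sample data in the original question. It is written in Python rather than Stata, the variable names are illustrative assumptions, and it shows only how the transformed scale compresses the spread; it is not a recommendation for any particular model.

```python
import math

# `at` values copied from the sample data in the original question.
at = [1120, 1230, 57, 67, 789, 650, 1230, 21, 234, 256, 267, 4335]

# Natural-log transform; every value must be strictly positive.
log_at = [math.log(v) for v in at]

raw_ratio = max(at) / min(at)            # largest value relative to smallest
log_range = max(log_at) - min(log_at)    # the same spread on the log scale

print(f"raw max/min ratio: {raw_ratio:.1f}")
print(f"log-scale range:   {log_range:.2f}")
```

On the log scale the largest observation stays in the data but no longer sits hundreds of times further out than the smallest.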
>
> Otherwise, I assert that elimination of outliers is a very bad idea
> _unless_ you know from independent evidence that they arise from serious and
> irremediable problems of measurement, in which case chopping the tails of the
> distribution is _not_ the way to do it. In most fields I know, the outliers
> that stick out are genuine and important (the Amazon in hydrology, USA or
> China wherever it is in economics, and so on, and so on) and leaving them
> out is in my view lousy science and lousy statistics.
>
> If you disagree, well, we disagree, but I am not going to tell you how to
> do this in Stata.
>
> Nick
> [email protected]
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Achmed Aldai
> Sent: 06 June 2011 15:07
> To: [email protected]
> Subject: Re: st: Elimination of outliers
>
> Hi
>
> Sorry, I cannot really understand why it is a bad idea. I want to eliminate
> the outliers because I think they bias my results.
>
> How can I transform my predictors and what do you mean by that?
>
> What is a non-identity link function?
>
> Thank you
>
> Felix
> -------- Original Message --------
> > Date: Mon, 6 Jun 2011 13:39:20 +0100
> > From: Nick Cox <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Subject: Re: st: Elimination of outliers
>
> > In general, a very bad idea. Consider transforming your response or
> > predictors or using a non-identity link function in a generalized
> > linear model or some flavour of robust regression as more measured
> > tactics.
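
To make "some flavour of robust regression" concrete, here is a minimal sketch comparing an ordinary least-squares slope with a Theil-Sen slope (the median of all pairwise slopes) for `lt` on `at`, using the sample data from the original question. Theil-Sen is only one robust estimator, chosen here because it fits in a few lines of standard-library Python; the thread does not name a specific method, and the function names are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, median

# (at, lt) pairs copied from the sample data in the original question.
data = [(1120, 231), (1230, 312), (57, 32), (67, 25), (789, 560), (650, 500),
        (1230, 900), (21, 30), (234, 213), (256, 220), (267, 320), (4335, 3214)]

def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    xbar = mean(x for x, _ in points)
    ybar = mean(y for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    return sxy / sxx

def theil_sen_slope(points):
    """Median of the slopes over all pairs of points with distinct x."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x1 != x2]  # pairs with tied x have no defined slope
    return median(slopes)

print("OLS slope:      ", round(ols_slope(data), 3))
print("Theil-Sen slope:", round(theil_sen_slope(data), 3))
```

Every observation stays in the data; the median simply limits how much any single extreme point can pull the estimated slope.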
> >
> > Nick
> >
> > On 6 Jun 2011, at 12:46, "Achmed Aldai" <[email protected]> wrote:
> >
> > > Hi
> > >
> > > I am currently working on a do-file in which I want to eliminate
> > > outliers, that is, the observations with the highest and lowest
> > > values of certain variables, here for example at and lt. I have
> > > 150000 observations in total, and I want to delete 25 observations
> > > from each of the upper and lower ends. It might be better to do
> > > this in relative terms instead, so that I drop not the highest and
> > > lowest 25 observations but the upper and lower 1% of the
> > > corresponding variables.
> > >
> > > gvkey at lt
> > > 1001 1120 231
> > > 1001 1230 312
> > > 1210 57 32
> > > 1210 67 25
> > > 1354 789 560
> > > 1368 650 500
> > > 1481 1230 900
> > > 2930 21 30
> > > 3201 234 213
> > > 3201 256 220
> > > 3210 267 320
> > > 4510 4335 3214
> > >
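
For reference, the relative cut-offs described in the question can be sketched as follows, in Python rather than Stata. The sketch only flags observations of `at` outside the 1st and 99th percentiles instead of deleting them, so that a model can be compared with and without them, in keeping with the caution voiced in the replies. The interpolation rule (`method="inclusive"`) is an assumption; other percentile definitions give slightly different cut-offs, especially on a sample this small.

```python
from statistics import quantiles

# (gvkey, at) pairs copied from the sample data in the question.
rows = [(1001, 1120), (1001, 1230), (1210, 57), (1210, 67), (1354, 789),
        (1368, 650), (1481, 1230), (2930, 21), (3201, 234), (3201, 256),
        (3210, 267), (4510, 4335)]

values = [at for _, at in rows]

# 99 cut points: index 0 is the 1st percentile, index 98 the 99th.
cuts = quantiles(values, n=100, method="inclusive")
low, high = cuts[0], cuts[98]

# Flag rather than delete, so results can be compared both ways.
flagged = [(gvkey, at) for gvkey, at in rows if at < low or at > high]

print(f"1st percentile: {low:.2f}, 99th percentile: {high:.2f}")
print("flagged:", flagged)
```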