The Stata listserver

RE: st: Outliers in correlation analysis

From   Chris Roebuck <[email protected]>
To   "'[email protected] '" <[email protected]>
Subject   RE: st: Outliers in correlation analysis
Date   Sat, 31 Dec 2005 07:56:08 -0600

I just read a great article on the effects of cleaning data (using trimming,
winsorizing, etc.). You may want to have a look at it:
Bollinger, C.R. and A. Chandra. 2005. "Iatrogenic Specification Error: A
Cautionary Tale of Cleaning Data."  Journal of Labor Economics 23(2):

Also, just curious: have you tried the non-parametric Spearman rank
correlation?
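
To make the Spearman suggestion concrete, here is a minimal sketch of how
a rank-based correlation reacts to a single extreme point compared with
Pearson's r. In Stata itself the natural tool would be the -spearman-
command; the Python below is only an illustration. The data are made up
(35 points on a straight line plus one outlier) and are not Siddharth's,
and the helper names pearson, ranks and spearman are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustrative data: 35 points on a line plus one extreme outlier.
x = [float(v) for v in range(1, 37)]
y = [2.0 * v for v in range(1, 36)] + [-500.0]

r_all = pearson(x, y)              # Pearson with the outlier included
r_trim = pearson(x[:-1], y[:-1])   # Pearson with the outlier dropped
rho_all = spearman(x, y)           # Spearman with the outlier included

print(f"Pearson, all 36 obs:  {r_all:.3f}")
print(f"Pearson, 35 obs:      {r_trim:.3f}")
print(f"Spearman, all 36 obs: {rho_all:.3f}")
```

In this constructed example the outlier wipes out Pearson's r, while the
rank correlation stays high because the outlier can only move to the
lowest rank, however extreme its value. That says nothing about whether
the outlying observation is valid, only that the rank-based measure is
less sensitive to it.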

-----Original Message-----
From: Maarten buis
To: [email protected]
Sent: 12/31/2005 3:36 AM
Subject: RE: st: Outliers in correlation analysis

Hi Siddharth,

There is a very nice cartoon in (Fox 1991) about how we deal with
outliers: you see a man in front of a blackboard with a scatterplot on
it. He tries to fit a regression line through it, but there are some
obvious outliers. He frowns at the outliers, then he gets an idea, picks
up an eraser, erases the inconvenient points, draws the regression line,
and looks very happy with the result.

This cartoon is probably not the reference you are looking for, but
John Fox's "little green Sage book" on regression diagnostics would be a
good place to start looking. Many ideas about diagnosing outliers can
also be applied to correlation, since the correlation coefficient can be
seen as the standardized regression coefficient in a bivariate
regression.
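
The link between the correlation coefficient and the standardized
regression coefficient in the two-variable case can be checked
numerically. The sketch below is only an illustration: the data are made
up and the helper names (mean, sd, ols_slope, standardize) are
hypothetical. It fits a least-squares slope to the standardized
variables and compares it with the raw slope rescaled by sd(x)/sd(y),
which is Pearson's r.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(v):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(v), sd(v)
    return [(a - m) / s for a in v]


# Made-up illustrative data; the last point is deliberately off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 3.0]

b = ols_slope(x, y)                                   # raw slope
beta_std = ols_slope(standardize(x), standardize(y))  # standardized slope
r = b * sd(x) / sd(y)                                 # Pearson's r

print(f"raw slope          = {b:.4f}")
print(f"standardized slope = {beta_std:.4f}")
print(f"Pearson r          = {r:.4f}")
```

Because the standardized slope and r are the same number, a point that
has a lot of influence on the fitted line has the same influence on the
correlation, which is why regression diagnostics carry over.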

HTH and happy 2006,

John Fox (1991), "Regression Diagnostics, an Introduction". Thousand
Oaks, Sage.

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

Siddharth wrote:
> I am trying to correlate two things using Pearson's correlation.
> The results are non-significant due to one particular outlier
> (total number of observations = 36). If I exclude this outlier,
> there is a strong correlation among the other 35 patients
> (and this result makes biological sense).
> I checked whether there were any biological reasons why this outlier
> should be excluded; there are none.
> These are cognitive tests, and it is possible that the outlying
> patient was distracted and was not able to perform well.
> How should I deal with the problem? Are there any defined
> criteria for dealing with outliers in correlational analysis
> and, if possible, is there a reference I can quote?
