There IS an interpretation of the Spearman correlation for continuous variables in an infinite population. In that case, if the random variables are X and Y, then the Spearman rho(X,Y) is simply the Pearson correlation of F_X(X) and F_Y(Y), where F_X(.) and F_Y(.) are the population cumulative distribution functions of X and Y respectively. And a Pearson correlation, as always, is a measure of linearity.
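As a numerical illustration of this population definition, here is a minimal Python sketch (not Stata). It assumes (X, Y) is standard bivariate normal with Pearson correlation r, so that F_X and F_Y are both the known standard normal CDF. It estimates the Pearson correlation of F_X(X) and F_Y(Y) by Monte Carlo and compares it with the known closed form for the bivariate normal, rho = (6/pi) arcsin(r/2). The helper names (`pearson`, `std_normal_cdf`, `population_spearman_mc`) are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def population_spearman_mc(r, n=50_000, seed=1):
    """Monte Carlo estimate of corr(F_X(X), F_Y(Y)) for a standard
    bivariate normal (X, Y) with Pearson correlation r.
    Assumption: both marginal CDFs are the standard normal CDF."""
    rng = random.Random(seed)
    u, v = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Y = r*X + sqrt(1 - r^2)*E has N(0, 1) margin and corr(X, Y) = r
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        u.append(std_normal_cdf(x))  # F_X(X) is Uniform(0, 1)
        v.append(std_normal_cdf(y))  # F_Y(Y) is Uniform(0, 1)
    return pearson(u, v)

r = 0.5
closed_form = 6.0 / math.pi * math.asin(r / 2.0)  # about 0.4826 at r = 0.5
print(round(closed_form, 4), round(population_spearman_mc(r), 4))
```

The two printed values should agree to roughly two decimal places, up to Monte Carlo error.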
The two main problems with the Spearman rho are that (a) it is ONLY a measure of linearity between two cumulative distribution functions (with no interpretation as a difference between concordance and discordance probabilities), and (b) the Central Limit Theorem takes effect much more slowly for the sample Spearman rho than for the sample Kendall tau-a, especially under the null hypothesis of zero correlation (see Kendall and Gibbons, 1990).
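By contrast with point (a), the sample Kendall tau-a is by construction a difference of concordance and discordance proportions. The plain-Python sketch below (hypothetical helper names, not Stata commands) computes tau-a by counting pairs, with tied pairs counted as neither concordant nor discordant. It also enumerates all n! equally likely orderings to get tau-a's exact distribution under the null of independence with no ties, and checks that its variance equals the standard formula 2(2n+5)/[9n(n-1)], which is the variance a normal approximation under the null would use. It does not by itself measure how fast either statistic approaches normality.

```python
from fractions import Fraction
from itertools import combinations, permutations

def tau_a(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / all pairs.
    Pairs tied in x or in y count as neither concordant nor discordant."""
    n = len(x)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return Fraction(conc - disc, n * (n - 1) // 2)

def exact_null_variance_tau_a(n):
    """Exact variance of tau-a over all n! orderings of y (no ties),
    i.e. its exact distribution when X and Y are independent."""
    x = list(range(n))
    values = [tau_a(x, list(p)) for p in permutations(x)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(exact_null_variance_tau_a(5))  # 1/6, i.e. 2(2n+5)/(9n(n-1)) at n = 5
```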
Best wishes
Roger
References
Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. Oxford, UK: Oxford University Press.
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stas Kolenikov
Sent: 07 October 2009 21:27
To: [email protected]
Subject: Re: st: difference between "Spearman" and "pwcorr / correlate"
> >Inference for Pearson's moment correlation relies on normality of the
> >data. Spearman rank correlation is free of any assumptions, but there
> >is no population characteristic that it estimates, which makes
> >interpretation and asymptotic inference somewhat weird. If one is
> >significant and the other is not, you are making either type I or type
> >II error somewhere.
> In the angels on the head of a pin vein:
> Of possible interest in this regard is that the Spearman coefficient is the
> same as the Pearson calculated on the ranked values of the variables (ties
> getting the average rank). I would agree that this is not a terribly
> interesting population parameter, but isn't this nevertheless an
> estimable/testable population characteristic?
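The equivalence described in the quoted paragraph can be sketched in a few lines of plain Python (hypothetical helper names; this is an illustration, not how Stata's `spearman` is implemented). Ties share the average of the ranks they occupy, and the sample Spearman rho is then just the Pearson correlation of the two rank vectors.

```python
import math

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def spearman_rho(x, y):
    """Sample Spearman rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))               # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```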
If you have a finite population, then of course you will have a Spearman
correlation for it, although if you want to set up any asymptotic
framework, you will be trying to hit a moving target. I don't think
there is a meaningful definition of Spearman correlation for infinite
populations/continuous variables, although I might be mistaken. On the
other hand, Kendall's tau, as Nick Cox quoted from Roger Newson, has
explicit population analogues in probabilities of concordant and
discordant pairs of observations.
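The population analogue mentioned here can be estimated directly. The Python sketch below assumes, purely for illustration, that (X, Y) is standard bivariate normal with Pearson correlation r. It draws independent pairs of observations and estimates Prob[ concordant ] - Prob[ discordant ], which can be compared with the known bivariate-normal result tau = (2/pi) arcsin(r), equal to 1/3 at r = 0.5. The function name `population_tau_mc` is hypothetical.

```python
import math
import random

def population_tau_mc(r, n_pairs=50_000, seed=2):
    """Monte Carlo estimate of Prob[concordant] - Prob[discordant] for two
    independent draws from a standard bivariate normal with correlation r."""
    rng = random.Random(seed)

    def draw():
        x = rng.gauss(0.0, 1.0)
        # Y = r*X + sqrt(1 - r^2)*E gives corr(X, Y) = r
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        return x, y

    conc = disc = 0
    for _ in range(n_pairs):
        (x1, y1), (x2, y2) = draw(), draw()
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / n_pairs

r = 0.5
print(round(2.0 / math.pi * math.asin(r), 4), round(population_tau_mc(r), 4))
```

The two printed values should agree up to Monte Carlo error.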
The question is: if the correlation estimate is 0.5, what does it say?
For Pearson moment correlation, it means that the proportion of
explained variance in a bivariate regression is 0.25. For Kendall's
tau, it means that for every discordant pair of observations, there
are three concordant pairs (i.e., with no ties, Prob[ concordant ] = 3 Prob[
discordant ] = 3/4 ). For Spearman rank correlation, you can only say
that the variables are positively associated, but not much more.
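The arithmetic behind the 0.5 examples is short enough to check directly. This Python sketch assumes no ties, so that Prob[ concordant ] + Prob[ discordant ] = 1 and tau = Prob[ concordant ] - Prob[ discordant ]; the helper name `concordance_probs` is hypothetical.

```python
def concordance_probs(tau):
    """Split Kendall's tau into (Prob[concordant], Prob[discordant]),
    assuming no ties so that the two probabilities sum to 1."""
    p_conc = (1.0 + tau) / 2.0
    p_disc = (1.0 - tau) / 2.0
    return p_conc, p_disc

r = 0.5
print("Pearson r = 0.5 -> explained variance r^2 =", r * r)  # 0.25
p_c, p_d = concordance_probs(0.5)
print("tau = 0.5 -> Prob[c] =", p_c, "Prob[d] =", p_d, "ratio =", p_c / p_d)
# Prob[c] = 0.75, Prob[d] = 0.25, ratio = 3.0
```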
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*