st: Clustermat puzzle
From: [email protected] (Brendan Halpin)
To: [email protected]
Subject: st: Clustermat puzzle
Date: Sat, 24 Mar 2012 19:42:57 +0000
I have a small matrix of pairwise distances (all integers) that I'm
passing to clustermat (Ward's method). I notice that if I multiply the
distances by a constant, I get different results. On investigation, it
seems that scaling by an integer power of two gives one solution, and
scaling by anything else gives another.
Code below demonstrates the problem. Experimentation with the code shows
that a factor of 0.11 times a power of two (e.g. 0.44, 1.76) also
returns the original solution.
While clustering is often vulnerable to small changes in the data, it
shouldn't be affected by a simple scale change. Presumably something
subtle is happening with the internal representations of the distances.
Brendan
Code to download the distance matrix and compare solutions:
use http://teaching.sociology.ul.ie/bhalpin/dist
mkmat d1-d42, mat(D)
clustermat wards D, name(D) add
cluster generate a4=groups(4)
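capture program drop cltest
program define cltest
    args mult
    tempname n4 diff M
    * Rescale the distance matrix and recluster
    matrix `M' = D * `mult'
    clustermat wards `M', name(`M') add
    cluster generate `n4' = groups(4)
    * Compare the new four-group solution with the baseline a4
    tab `n4' a4
    gen `diff' = `n4' - a4
    su `diff'
    * Count mismatches directly: a zero mean could hide offsetting differences
    quietly count if `diff' != 0
    di _newline
    if r(N) != 0 {
        di "Cluster solutions differ, factor " `mult'
    }
    else {
        di "Cluster solutions identical, factor " `mult'
    }
    cluster drop `M'
end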
cltest 2
cltest 3
cltest 1/40
cltest 0.125
cltest 0.44
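
The suspicion about internal representations can be probed outside
clustermat. The lines below are a minimal sketch, not part of the
original test: the scalar names base and scaled and the value 7/3 are
illustrative assumptions, standing in for an intermediate quantity that
cannot be stored exactly in binary. Whether clustermat produces such
intermediates from integer distances is itself an assumption. In IEEE
double precision, multiplying by a power of two changes only the
exponent, so the round trip should be exact for factors such as 2 and
0.125. For other factors the product may be rounded, which could in
principle break ties differently.

```stata
* Illustrative sketch only; names and values here are hypothetical.
* base stands in for an intermediate value with no exact binary form.
scalar base = 7/3
foreach f in 2 0.125 3 0.025 {
    scalar scaled = scalar(base) * `f'
    * %21x shows the stored hexadecimal (binary) representation
    display "factor `f': " %21x scalar(scaled) ///
        "   round trip exact: " (scalar(scaled) / `f' == scalar(base))
}
```

If the hexadecimal digits before the X are unchanged for a given factor,
the scaling was exact; a change there means the product was rounded.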
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-009 x 3147
mailto:[email protected] ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress twitter:@ULSociology