Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Abrams, Judith" <abramsj@karmanos.org> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Clustermat puzzle |
Date | Sat, 24 Mar 2012 11:43:40 -0400 |
Brendan Halpin <brendan.halpin@ul.ie> wrote:

I have a small matrix of pairwise distances (all integers) that I'm passing to clustermat (Ward's method). I notice that if I scale the distances by a constant, I get different results. On investigation it seems that if I scale them by anything other than an integer power of two I get one solution, and by a power of two, another. Code below demonstrates the problem.

Experimentation with the code shows that using a factor of 0.11 times a power of two (e.g. 0.44, 1.76) also returns the original solution.

While clustering is often vulnerable to small changes in the data, it shouldn't be affected by a simple scale change. Presumably something subtle is happening with the internal representations of the distances.

Brendan

Code to download the distance matrix and compare solutions:

use http://teaching.sociology.ul.ie/bhalpin/dist
mkmat d1-d42, mat(D)
clustermat wards D, name(D) add
cluster generate a4=groups(4)

capture program drop cltest
program define cltest
  args mult
  tempname n4 diff M
  matrix `M' = D * `mult'
  clustermat wards `M', name(`M') add
  cluster generate `n4'=groups(4)
  tab `n4' a4
  gen `diff' = `n4' - a4
  su `diff'
  di _newline
  if r(mean)!=0 {
    di "Cluster solutions differ, factor " `mult'
  }
  else {
    di "Cluster solutions identical, factor " `mult'
  }
  cluster drop `M'
end

cltest 2
cltest 3
cltest 1/40
cltest 0.125
cltest 0.44

--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-009 x 3147
mailto:brendan.halpin@ul.ie ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress twitter:@ULSociology

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
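
One way to illustrate the "internal representation" effect suspected in the quoted message, offered as a sketch and not as a diagnosis of clustermat: in binary double precision, multiplying by a power of two only changes the exponent, so every later rounding matches that of the unscaled values, whereas other factors can change how intermediate results round. The division by 3 below is an assumed stand-in for the cluster-size divisions in a Ward-type update; the factor list and the range 1 to 100 are arbitrary illustrative choices and are not taken from the distance matrix in the post.

```stata
* Sketch only: counts how often "scale, then divide" and "divide, then scale"
* disagree in double precision. All values here are illustrative assumptions.
foreach f in 2 0.125 3 0.44 {
    local bad = 0
    forvalues i = 1/100 {
        * compare (f*i)/3 with f*(i/3); both are evaluated as doubles
        if ((`f'*`i')/3 != `f'*(`i'/3)) {
            local ++bad
        }
    }
    display "factor `f': `bad' of 100 scaled quotients differ"
}
```

For the factors 2 and 0.125 the count should be 0, because scaling by a power of two is exact; for 3 and 0.44 the count depends on how each product happens to round. Whether last-bit differences of this kind are what changes the clustermat solution (for example by breaking ties between merge candidates differently) is not established here.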