To close the loop on this, I believe I've figured out how to cluster using Mahalanobis Distance as the similarity measure. Mahalanobis Distance is useful because, as Wikipedia says, "It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. not dependent on the scale of measurements."
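For reference, the usual textbook definition behind that quote can be written as below. The symbols x, y, S and A are my own notation rather than anything from -mahapick-, and I am only assuming (from its name) that the unsq option used further down asks for the distance itself rather than its square.

```latex
% Mahalanobis distance between observations x and y,
% where S is the sample covariance matrix of the clustering variables
d_M(x, y) = \sqrt{(x - y)^{\top} S^{-1} (x - y)}

% With S = I this reduces to the Euclidean distance:
d_E(x, y) = \sqrt{(x - y)^{\top} (x - y)}

% Scale invariance: rescaling the data by an invertible matrix A
% (x -> Ax, y -> Ay) changes the covariance to A S A^T, and
(Ax - Ay)^{\top} (A S A^{\top})^{-1} (Ax - Ay) = (x - y)^{\top} S^{-1} (x - y)
```

So changing the units of any variable leaves the distances, and hence the clustering, unchanged.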
The steps are:
1) Install David Kantor's -mahapick- package from SSC.
2) Confirm there are no missing values in the variables and observations you wish to use in clustering.
3) Use -mahascores- with the -genmat- option to create the dissimilarity matrix.
4) Use -clustermat- on the resulting matrix.
For example:
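*----
* install the mahapick package, which provides mahascores
ssc install mahapick
* build the pairwise distance matrix; replace varlist with your own variables
mahascores varlist, genmat(distance) unsq compute_invcovarmat
* average-linkage hierarchical clustering on that matrix
clustermat averagelinkage distance, add
*----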
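A slightly fuller sketch of the same workflow, including the missing-value check and what one might do after clustering, could look like the following. The auto dataset, the four variables, the cluster name mahaclus and the choice of three groups are all assumptions made for illustration. The -mahascores- line is taken as given from the example above, and I have not checked it against other versions of -mahapick-.

```stata
* Illustrative data and variables (assumptions, not from the original question)
sysuse auto, clear
local vars price mpg weight length

* Step 2: count and drop observations with a missing value in any clustering variable
count if missing(price, mpg, weight, length)
drop if missing(price, mpg, weight, length)

* Step 3: pairwise distance matrix, same call as in the example above
mahascores `vars', genmat(distance) unsq compute_invcovarmat

* Step 4: average-linkage clustering on the matrix; name() labels the analysis
clustermat averagelinkage distance, add name(mahaclus)

* Inspect the tree, then cut it into 3 groups (3 is an arbitrary choice)
cluster dendrogram mahaclus
cluster generate grp = groups(3), name(mahaclus)
tabulate grp
```

Dropping incomplete observations before building the matrix keeps its rows aligned with the observations in memory, which is what the -add- option relies on.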
-----Original Message-----
From: Dan Weitzenfeld [mailto:[email protected]]
Sent: Friday, January 02, 2009 2:55 PM
To: [email protected]
Subject: Mahalanobis Distance and Clustering
Hi All,
I am looking into the possibility of using Mahalanobis Distance as a
similarity/dissimilarity measure in a hierarchical clustering
analysis.
I've done some searching through the archives, and I've found some
Mahalanobis-based programs, but none that do the clustering step.
I'm wondering:
-if this exists, and I just couldn't find it;
-if it doesn't exist, is there a reason why not - some limitation or
reason I'm not aware of.
Thanks in advance,
Dan