Dear Statalisters,
I have a question about applying the Ward linkage method in a cluster
analysis:
Our raw data form a 30 x 1200000 matrix of binary variables.
We then compute the Jaccard similarity coefficient for each pair of
the 30 objects. This gives a 30 x 30 similarity matrix with continuous
values between 0 and 1.
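
For illustration, a minimal Python sketch of this step on a toy
matrix. The function names `jaccard` and `jaccard_matrix` and the data
are hypothetical, and treating two all-zero objects as identical is an
assumption; this is not necessarily how the actual analysis was coded.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # 1 in both objects
    either = sum(1 for x, y in zip(a, b) if x or y)   # 1 in at least one
    # Assumption: two all-zero objects are treated as identical.
    return both / either if either else 1.0


def jaccard_matrix(rows):
    """Symmetric matrix of pairwise Jaccard similarities between rows."""
    n = len(rows)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = jaccard(rows[i], rows[j])
    return S


# Toy stand-in for the 30 x 1200000 data: 4 objects, 6 binary variables.
data = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
]
S = jaccard_matrix(data)
```

Each off-diagonal entry is the number of variables equal to 1 in both
objects divided by the number equal to 1 in at least one of them.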
Next, we apply the single linkage method to this 30 x 30 similarity
matrix to identify outliers among the objects. We then apply both the
average linkage method and the Ward linkage method to the remaining
objects (outliers excluded) to find appropriate clusters. The
resulting dendrograms look quite reasonable, but I have read in the
literature that variables must be measured on a metric scale when
the Ward linkage method is used for clustering.
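
To make the three linkage rules concrete, here is a rough Python
sketch of agglomerative clustering using the Lance-Williams update
formulas. The function `agglomerate`, the 4 x 4 similarity values, and
the conversion dissimilarity = 1 - similarity are all assumptions for
illustration, not the Stata procedure actually used. The Ward update
shown is the one derived for squared Euclidean distances; the sketch
only shows that the arithmetic runs on any dissimilarity matrix, not
that the result is justified for Jaccard-based input.

```python
def agglomerate(D, method):
    """Agglomerative clustering of a dissimilarity matrix (Lance-Williams).

    D is a symmetric list of lists; method is "single", "average" or "ward".
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    members = {i: [i] for i in range(len(D))}   # active cluster id -> objects
    dist = {(i, j): D[i][j] for i in members for j in members if i < j}

    def get(a, b):
        return dist[(a, b) if a < b else (b, a)]

    merges, next_id = [], len(D)
    while len(members) > 1:
        # Merge the closest pair of active clusters.
        (i, j), h = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(members[i]), len(members[j])
        new = {}
        for k in members:
            if k in (i, j):
                continue
            nk, dki, dkj = len(members[k]), get(k, i), get(k, j)
            if method == "single":
                d = min(dki, dkj)
            elif method == "average":
                d = (ni * dki + nj * dkj) / (ni + nj)
            elif method == "ward":
                # Update derived for squared Euclidean distances; here it is
                # applied mechanically to whatever D contains.
                n = ni + nj + nk
                d = ((ni + nk) * dki + (nj + nk) * dkj - nk * h) / n
            else:
                raise ValueError("unknown method: " + method)
            new[k] = d
        merges.append((sorted(members[i]), sorted(members[j]), h))
        merged = members.pop(i) + members.pop(j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        members[next_id] = merged
        for k, d in new.items():
            dist[(k, next_id)] = d   # next_id exceeds every existing id
        next_id += 1
    return merges


# Hypothetical 4 x 4 Jaccard similarity matrix (made-up values).
S = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
D = [[1.0 - s for s in row] for row in S]   # dissimilarity = 1 - similarity
single = agglomerate(D, "single")
average = agglomerate(D, "average")
ward = agglomerate(D, "ward")
```

In this toy example all three methods merge the same pairs and differ
only in the heights at which the merges occur.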
Hence my question: can the Ward linkage method be applied for
clustering in this case, where the raw data matrix is binary?
Thank you very much in advance,
Jochen
---
Jochen Siegele
Universitaet Karlsruhe (TH)
Institut fuer Wirtschaftspolitik und Wirtschaftsforschung (IWW)
Sektion Verkehr und Kommunikation
Postfach 6980, D-76128 Karlsruhe
Tel.: +49-(0)721-608-6043
Fax: +49-(0)721-34613
Mail: [email protected]