

st: Cluster analysis on hand-made distance matrix

From   Ulrich Kohler <[email protected]>
To   [email protected]
Subject   st: Cluster analysis on hand-made distance matrix
Date   Mon, 10 Mar 2008 16:12:30 +0100

I have two "hand-made" distance matrices, SQdist1 and SQdist2. The two
matrices contain the same pairwise distances; the only difference is
that their rows and columns are in a different order.
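
A minimal sketch of how the "same up to ordering" claim could be checked.
It assumes a hypothetical Stata matrix named perm (not part of my actual
setup) that holds, for each row/column of SQdist2, the position of the
corresponding row/column in SQdist1:

```stata
* Sketch only: "perm" is a hypothetical permutation vector such that
* row/column i of SQdist2 corresponds to row/column perm[i] of SQdist1.
mata:
    D1 = st_matrix("SQdist1")
    D2 = st_matrix("SQdist2")
    p  = vec(st_matrix("perm"))   // hypothetical permutation, as a column vector
    // Reorder D1 by p and compare with D2.
    // A result of 0 means the two matrices hold identical entries.
    mreldif(D1[p, p], D2)
end
```

If the displayed value is 0 (or of the order of machine precision), the
matrices differ only in the order of their rows and columns.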

If I perform a cluster analysis using single linkage on the two distance
matrices, I get identical results:

. clustermat single SQdist1, name(cluster1) add
. clustermat single SQdist2, name(cluster2) add
. sum *_hgt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
cluster1_hgt |        53    .2232704     .108128   .1666667   .6666667
cluster2_hgt |        53    .2232704     .108128   .1666667   .6666667

(The same is true for median linkage and centroid linkage.)

However, if I use Ward's linkage, I get different results for the two
distance matrices:

. clustermat wards SQdist1, name(cluster1) add
. clustermat wards SQdist2, name(cluster2) add
. sum *_hgt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
cluster1_hgt |        53    .7051013     .861406   .1666667   4.414418
cluster2_hgt |        53    .7051013    .8751653   .1666667   4.645984

Although the difference does not seem large, it has led to quite
different groupings in a practical application. Unfortunately, I am not
an expert in cluster analysis. So, please, can anybody explain to me why
this happens? If the order of the distance matrix matters for
cluster analysis, what is the "correct" order of the distance matrix?
Many regards

