Agricola Odoi asks:
> I am running k-means cluster analysis. The clusters have already been
> identified but I would like to calculate the distance between each of the
> clusters. Does anyone know how to do this in STATA?
First, I will assume that when you ran -cluster kmeans ...- you
used the -keepcenters- option to append the k cluster means to
the bottom of your data.
The -cluster measures- command (see "[CL] cluster programming
utilities" in the manual or -help clprog-) can compute the
distances.
For example, if there were 100 observations in my dataset and I
ran

    cluster kmeans ... , k(3) keepcenters ...

to obtain the 3-group k-means cluster solution, then observations
101-103 would contain the 3 group means. I could then run

    cluster measures ... in 101/103, compare(101/103) gen(d1 d2 d3) ...

and the new variables d1, d2, and d3 would contain the desired
distances. In particular, d1 (in observations 101/103) would
contain the distances between the mean of group 1 and the means
of each of the three groups, d2 the distances from the mean of
group 2, and so on.
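
As a concrete sketch of the steps above: the following assumes the auto dataset shipped with Stata, an arbitrary choice of clustering variables (price, mpg, weight), and -start(firstk)- so that the run is reproducible. The analysis name km3 and the locals first and last are illustrative only. It also assumes, as described above, that -keepcenters- leaves the k group means in the last k observations.

```stata
* Run as a do-file (the /// line continuation is not allowed interactively).
* Illustrative data: auto.dta, which has no missing values in these variables.
sysuse auto, clear

* 3-group k-means; keepcenters appends the 3 group means to the end of the data.
cluster kmeans price mpg weight, k(3) name(km3) start(firstk) keepcenters

* The group means are assumed to occupy the last 3 observations.
local last  = _N
local first = _N - 2

* Euclidean (L2) distances among the 3 group means only.
cluster measures price mpg weight in `first'/`last', ///
    compare(`first'/`last') generate(d1 d2 d3) L2

list d1 d2 d3 in `first'/`last'
```

The listed 3 x 3 block should have zeros on its diagonal (each mean compared with itself) and should be symmetric, since d2 in the row for the first mean and d1 in the row for the second mean are the same distance.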
If you also wanted the distances between the individual
observations and the group means, you would change that last
command to

    cluster measures ... , compare(101/103) gen(d1 d2 d3) ...

i.e., leave off the -in 101/103-. Then, for example, the 7th
observation of variable d2 would be the distance between the 7th
observation and the 2nd group mean.
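
A self-contained sketch of this second variant, again assuming the auto dataset and the variables price, mpg, and weight purely for illustration. The names km3, grp, dmin, and the locals n, first, and last are hypothetical, and the -egen- step is an optional extra that is not part of the advice above.

```stata
* Illustrative data: auto.dta, which has no missing values in these variables.
sysuse auto, clear
local n = _N    // number of real observations before the means are appended

* 3-group k-means; grp holds the assigned group, keepcenters appends the means.
cluster kmeans price mpg weight, k(3) name(km3) generate(grp) start(firstk) keepcenters

* The group means are assumed to occupy the observations after the original data.
local first = `n' + 1
local last  = _N

* No -in- range: distances from every observation to each of the 3 group means.
cluster measures price mpg weight, compare(`first'/`last') generate(d1 d2 d3) L2

* Optional: distance from each observation to its nearest group mean.
egen double dmin = rowmin(d1 d2 d3)

list make grp d1 d2 d3 dmin in 1/5
```

Here d2[7] is the distance between the 7th car and the second appended mean, matching the description above.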
Ken Higbee
StataCorp 1-800-STATAPC