Thanks for the references. My own prejudices are different: the main
problems with cluster analyses are, in general, being confident that any
kind of cluster analysis is a good idea and, in particular, being
confident that the results you got are not just an artefact of
arbitrary choices. But that's as may be.
Nick
[email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Alp Eren Yurtseven
Sent: 08 June 2009 14:42
To: [email protected]
Subject: st: RE: kmeans clustering -initial starting points
Hi,
Quoting de Jong and Marsili,
The main problem with cluster analysis is to decide
on the number of clusters, to balance the need to represent
the data appropriately and, at the same time,
to keep the results manageable. A priori, we consider
two up to six groups manageable for finding plausible
interpretations and for future applications of the taxonomy.
Indeed previous taxonomies use the same range
of groups (see Table 1). To find a solution within this
range, we combined hierarchical and non-hierarchical
techniques (Milligan and Sokol, 1980; Punj and Stewart,
1983). For each number of groups (k), between two
and six, we perform a k-means "non-hierarchical" cluster
analysis, in which the firms are iteratively classified
based on their distance to some initial starting points
of dimension k. While some k-means methods use randomly
selected starting points, we employ the centroids
of an initial hierarchical solution for this purpose. [4]
[4] To generate the initial solutions we carried out a hierarchical
analysis by using the Ward's method, which is based on squared
Euclidian distances. Ward's method generally provides good results
compared to other clustering methods (Milligan and Cooper, 1987).
Homogeneous groups are built so as to minimise the distance in scores
of firms within a single cluster and to maximise the distance in
scores between companies from the various clusters. A visual
inspection of the dendrogram, plotting the initial solutions of the
hierarchical analysis, suggests a taxonomy with four clusters.
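To make the quoted procedure concrete, here is a minimal sketch of it in
plain Python (not Stata, and not the authors' code): for each k from two
to six, cut a Ward hierarchy into k groups and pass the group centroids
to k-means as starting points. The function names (ward_groups, kmeans)
and the toy data are made up for illustration, and the Ward step is a
naive implementation meant only for small examples.

```python
from itertools import combinations


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_groups(points, k):
    """Naive Ward agglomeration: start from singletons and repeatedly merge
    the pair whose merger least increases the within-cluster sum of
    squares, stopping when k groups remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        def cost(pair):
            a, b = groups[pair[0]], groups[pair[1]]
            na, nb = len(a), len(b)
            return na * nb / (na + nb) * sqdist(mean(a), mean(b))
        # combinations() yields i < j, so deleting index j leaves i valid.
        i, j = min(combinations(range(len(groups)), 2), key=cost)
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers.
    Returns (centers, labels)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return centers, labels


# Hypothetical toy data: three well-separated groups of three points.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
        (9.0, 1.0), (9.1, 1.2), (8.9, 0.8)]

for k in range(2, 7):
    starts = [mean(g) for g in ward_groups(data, k)]  # Ward centroids
    centers, labels = kmeans(data, starts)
    wss = sum(sqdist(p, centers[lab]) for p, lab in zip(data, labels))
    print(k, round(wss, 3))  # within-cluster sum of squares for each k
```

In Stata itself, I believe the same idea can be expressed by saving the
groups from a Ward's-linkage solution in a variable and then passing
that variable to -cluster kmeans- through its start(group(varname))
option, but do check the help for -cluster kmeans- before relying on
that.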
de Jong, J.P.J., Marsili, O., 2006. The fruit flies of innovation:
a taxonomy of innovative small firms. Research Policy 35, 213-229.
Punj, G., Stewart, D.W., 1983. Cluster analysis in marketing research:
review and suggestions for application. Journal of Marketing
Research 20, 134-148.