Hi,
Quoting de Jong and Marsili (2006):
The main problem with cluster analysis is to decide
on the number of clusters, to balance the need to represent
the data appropriately and, at the same time,
to keep the results manageable. A priori, we consider
two up to six groups manageable for finding plausible
interpretations and for future applications of the taxonomy.
Indeed previous taxonomies use the same range
of groups (see Table 1). To find a solution within this
range, we combined hierarchical and non-hierarchical
techniques (Milligan and Sokol, 1980; Punj and Stewart,
1983). For each number of groups (k), between two
and six, we perform a k-means "non-hierarchical" cluster
analysis, in which the firms are iteratively classified
based on their distance to some initial starting points
of dimension k. While some k-means methods use randomly
selected starting points, we employ the centroids
of an initial hierarchical solution for this purpose.[4]

[4] To generate the initial solutions we carried out a hierarchical analysis
by using the Ward's method, which is based on squared Euclidian
distances. Ward's method generally provides good results compared to
other clustering methods (Milligan and Cooper, 1987). Homogeneous
groups are built so as to minimise the distance in scores of firms within
a single cluster and to maximise the distance in scores between companies
from the various clusters. A visual inspection of the dendrogram,
plotting the initial solutions of the hierarchical analysis, suggests a
taxonomy with four clusters.
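
In case it helps to see the two-stage idea spelled out, here is a minimal sketch in plain Python (standard library only). It is not the authors' code, and the paper does not say which software they used. The function names (ward_clusters, kmeans, two_stage) and the toy data are my own assumptions for illustration: Ward's criterion is used to build k initial groups, their centroids seed the k-means step, and this is repeated for k = 2 to 6.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))


def ward_clusters(points, k):
    """Start from singletons and merge the cheapest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers; returns (labels, centers)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels, centers


def two_stage(points, k):
    """Seed k-means with the centroids of a k-cluster Ward solution."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans(points, seeds)


# Made-up toy data: four tight groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (0.0, 10.0), (1.0, 10.0), (0.0, 11.0),
    (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
]

# Mirror the quoted procedure: one seeded k-means run per k from two to six.
for k in range(2, 7):
    labels, centers = two_stage(points, k)
    print(k, labels)
```

The naive pairwise search in ward_clusters is only meant for small examples. In Stata itself the same idea could presumably be expressed with cluster wardslinkage, cluster generate and cluster kmeans with its start(group()) option, but please check the manual for the exact syntax.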
de Jong, J.P.J., Marsili, O., 2006. The fruit flies of innovation:
a taxonomy of innovative small firms. Research Policy 35, 213-229.
Punj, G., Stewart, D.W., 1983. Cluster analysis in marketing research:
review and suggestions for application. Journal of Marketing
Research 20, 134-148.
Alp Eren Yurtseven