Carlos de Los Rios <[email protected]> asks:
> I am performing a "cluster kmedian" analysis, and I am wondering if
> there is any tool that can measure the "goodness of fit" of the number
> of groups predetermined.
The -cluster stop- command provides the Calinski & Harabasz
pseudo-F index. Most people think of using -cluster stop- only
after doing one of the hierarchical cluster analysis methods, but
the default rule(calinski) is also allowed after -cluster kmeans-
and -cluster kmedians-.
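For reference, the index can be written out as below. This is the standard textbook form of the Calinski & Harabasz statistic, not something taken from the original question; the symbols B, W, g, and N are my notation: B is the between-cluster sum of squares and cross-products matrix, W is the within-cluster one, g is the number of clusters, and N is the number of observations.

```latex
\text{pseudo-}F \;=\; \frac{\operatorname{trace}(B)/(g-1)}{\operatorname{trace}(W)/(N-g)}
```

Larger values mean the clusters are better separated relative to their internal spread.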
As a contrived example:
. sysuse auto
. set seed 123123
. cluster kmedian mpg-gear, k(5) name(k5)
. cluster kmedian mpg-gear, k(6) name(k6)
. cluster kmedian mpg-gear, k(7) name(k7)
. cluster stop k5
+---------------------------+
|             |  Calinski/  |
|  Number of  |  Harabasz   |
|  clusters   |  pseudo-F   |
|-------------+-------------|
|      5      |    232.59   |
+---------------------------+
. cluster stop k6
+---------------------------+
|             |  Calinski/  |
|  Number of  |  Harabasz   |
|  clusters   |  pseudo-F   |
|-------------+-------------|
|      6      |    415.37   |
+---------------------------+
. cluster stop k7
+---------------------------+
|             |  Calinski/  |
|  Number of  |  Harabasz   |
|  clusters   |  pseudo-F   |
|-------------+-------------|
|      7      |    598.36   |
+---------------------------+
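This gives the pseudo-F for the kmedian clustering with 5, 6, and 7
clusters. Larger values of the index indicate more distinct
clustering, so of these three solutions the 7-cluster one scores
highest.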
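The three runs above can also be written as a loop, which is handy when you want to scan a wider range of k. What follows is only a sketch under the same assumptions as the example: the auto data, the same seed, and cluster names of the form k5, k6, k7 (my naming choice, nothing -cluster- requires).

```stata
* Sketch: run -cluster kmedians- for several values of k and
* report the Calinski/Harabasz pseudo-F after each run.
sysuse auto, clear
set seed 123123                      // k-medians uses random starting centers

forvalues k = 5/7 {
    * name() stores each solution under its own cluster name: k5, k6, k7
    cluster kmedians mpg-gear_ratio, k(`k') name(k`k')

    * rule(calinski) is the default stopping rule for -cluster stop-
    cluster stop k`k'
}
```

Because the seed is set once before the loop, the runs draw their random starting centers in the same order as the three separate commands. Change the range in -forvalues- to compare other values of k.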
See the "[CL] cluster stop" manual entry (page 94) for a similar
example that you could run starting with
. webuse physed
to obtain the data.
Ken Higbee [email protected]
StataCorp 1-800-STATAPC