Herve STOLOWY asks:
> I have a group of 21 observations with one variable (a score) and
> would like to create three "homogeneous" groups.
>
> I found the -cluster kmeans- command. Here are my command lines:
>
> gsort - finance_aggregate
> cluster kmeans finance_aggregate, k(3)
>
> Each time I run these commands, I get a different result (i.e., a
> different clustering: the three groups are different). I looked
> at the help file but don't understand. (It might be related to
> the start option but I am not sure).
>
> Is there a way to obtain the same result every time?
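You can -set seed 183289- (or any other number you like) before
each call of -cluster kmeans- so that the same random starting
values are selected each time. By default (-start(krandom)-),
-cluster kmeans- picks k observations at random as the starting
group centers, which is why your groups change from run to run;
sorting the data with -gsort- beforehand does not change that.
Alternatively, as you guessed, you can use the -start()- option:
-start(krandom(183289))- sets the seed for you, while suboptions
such as -start(segments)- or -start(firstk)- choose the k starting
centers deterministically from the current sort order. See
-help cluster kmeans- for details.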
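A minimal do-file sketch of these approaches follows. It assumes the
-auto- dataset that ships with Stata, with its variable mpg standing
in for finance_aggregate; the cluster names run1 through run6 are
hypothetical. Each pair of runs uses identical settings, so each pair
should produce identical groupings.

```stata
* Sketch: reproducible k-means runs.
* Assumption: auto's mpg stands in for the poster's finance_aggregate.
sysuse auto, clear

* Approach 1: reset the seed before each call (default random starts)
set seed 183289
cluster kmeans mpg, k(3) name(run1)
set seed 183289
cluster kmeans mpg, k(3) name(run2)

* Approach 2: let -start(krandom())- set the seed for the random starts
cluster kmeans mpg, k(3) name(run3) start(krandom(183289))
cluster kmeans mpg, k(3) name(run4) start(krandom(183289))

* Approach 3: deterministic starts; -segments- uses the means of
* k nearly equal blocks of observations in the current sort order
sort mpg
cluster kmeans mpg, k(3) name(run5) start(segments)
sort mpg
cluster kmeans mpg, k(3) name(run6) start(segments)

* Identical settings: all observations should lie on the diagonal
tabulate run1 run2
```

Of the three, -start(segments)- on sorted data needs no seed at all,
which suits a single score where low, middle, and high blocks are
natural starting groups.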
SR Millis said:
> You're going to need more than 1 variable. Cluster
> analysis is a multivariable technique. In addition, a
> sample size of only 21 is often too small for cluster
> analysis.
While cluster analysis is usually a multivariate technique, it
works with a single variable as well; with one variable, k-means
in effect splits the score into k ranges. Whether only 21
observations are enough depends on the data. After you do your
cluster analysis you might want to look at some summaries or
graphs of the resulting three groups (-name(myclus)- names the
analysis and, by default, the new variable holding each
observation's group):
. set seed 12345
. cluster kmeans myvar, k(3) name(myclus)
. bysort myclus: summarize myvar
. twoway dot myvar myclus
and possibly also
. cluster stop
(which by default reports the Calinski-Harabasz pseudo-F) or,
similarly, -anova myvar myclus- to get a feel for how distinct the
groups are.
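A self-contained sketch of that check follows, again assuming the
shipped -auto- dataset with mpg standing in for myvar; the cluster
name g3 is hypothetical. The R-squared from -anova- gives the share of
the score's variance that the three groups account for.

```stata
* Sketch: how distinct are the three k-means groups?
* Assumption: auto's mpg stands in for myvar; g3 is a hypothetical name.
sysuse auto, clear
set seed 12345
cluster kmeans mpg, k(3) name(g3)

* Size, mean, spread, and range of the score within each group
tabstat mpg, by(g3) statistics(n mean sd min max)

* Share of the total variance in mpg explained by the grouping
anova mpg g3
display "R-squared = " %5.3f e(r2)

* Calinski-Harabasz pseudo-F for this partition (the default rule)
cluster stop g3
```

An R-squared close to 1 suggests well-separated groups; a low value
suggests that any three-way split of the score is fairly arbitrary.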
Ken Higbee
StataCorp 1-800-STATAPC