Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: cluster analysis
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: cluster analysis
Date: Tue, 24 Jan 2012 13:43:06 +0000
As I understand it, you want to cluster data for two variables into two groups.
With only two variables, any clustering that makes sense will be evident on a scatter plot and will allow a scientific interpretation.
K-means sounds like overkill to me for such a problem, but tastes differ.
I know that many economists don't believe anything without a P-value attached.
A more formal approach to such data would presumably start with a discriminant analysis.
Nick
[email protected]
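A minimal sketch of the two-group linear discriminant idea, written in plain Python rather than Stata so that it is self-contained. Fisher's direction is w = W^-1 (m1 - m0), where W is the pooled within-group sums-of-squares-and-cross-products matrix of the two variables and m0, m1 are the group means. The helper name `fisher_direction` and the toy data are hypothetical, equal prior probabilities are assumed, and W must be nonsingular. In Stata itself, something like `discrim lda X1 X2 if id_X3==1, group(ca2)` or `candisc X1 X2 if id_X3==1, group(ca2)` would be the natural starting point.

```python
def fisher_direction(data, labels):
    """Two-group Fisher linear discriminant for 2-D data (hypothetical helper).

    Returns (w, threshold): an observation x is put in the second group when
    w[0]*x[0] + w[1]*x[1] > threshold (equal priors assumed).
    """
    g0, g1 = sorted(set(labels))  # exactly two groups expected

    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    groups = {g: [p for p, lab in zip(data, labels) if lab == g] for g in (g0, g1)}
    m0, m1 = mean(groups[g0]), mean(groups[g1])

    # Pooled within-group SSCP matrix [[sxx, sxy], [sxy, syy]]
    sxx = sxy = syy = 0.0
    for g, m in ((g0, m0), (g1, m1)):
        for x, y in groups[g]:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy  # assumed nonzero

    # w = W^-1 (m1 - m0), using the explicit 2x2 inverse
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w = ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)

    # Cut-off halfway between the two group means, projected onto w
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


pts = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10), (12, 10), (10, 12), (12, 12)]
grp = [0, 0, 0, 0, 1, 1, 1, 1]
w, t = fisher_direction(pts, grp)
print(w, t)  # (1.25, 1.25) 15.0
```

Using the raw SSCP matrix instead of the pooled covariance only rescales w; the classification rule is unchanged.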
Gianluca Cafiso wrote:
I have run this cluster analysis:
cluster kmeans X1 X2 if id_X3==1, k(2) name(ca2) s(prandom) keepcen
cluster list ca2
cluster query ca2
return list
sreturn list
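As the Stata manual describes it, `start(prandom)` splits the observations at random into k groups and uses the means of those groups as the initial centers, while `keepcenters` appends only the final group centers to the data. The plain-Python sketch below mimics that idea so that the initial centers can be inspected. It is not Stata's code: the balanced round-robin random partition and the names `kmeans_prandom` and `mean_point` are assumptions, and a different random-number stream means it will not reproduce Stata's groups. Within Stata, one possible workaround is to build the random grouping yourself, list its means, and pass it with `start(group(varname))`.

```python
import random
from math import dist  # Euclidean (L2) distance, Python 3.8+


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans_prandom(data, k, seed=0, max_iter=100):
    """k-means with a random-partition start (hypothetical helper).

    Returns (initial_centers, final_centers, labels).
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    labels = [0] * len(data)
    for pos, i in enumerate(order):
        labels[i] = pos % k  # random partition into k non-empty groups (needs len(data) >= k)
    centers = [mean_point([p for p, g in zip(data, labels) if g == j]) for j in range(k)]
    initial_centers = list(centers)  # the starting values the question asks to see

    for _ in range(max_iter):
        # Assign each observation to its nearest center (L2 distance)
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # no observation changed group: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, g in zip(data, labels) if g == j]
            if members:  # keep the old center if a group becomes empty
                centers[j] = mean_point(members)
    return initial_centers, centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init, final, groups = kmeans_prandom(pts, k=2, seed=1)
print("initial centers:", init)
print("final centers:  ", final)
print("groups:", groups)
```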
However, I cannot work out how to obtain the following information about the cluster analysis:
1 - the initial mean values used as group centers
(I specify how they are defined, with "prandom", but I would like to see the values too)
2 - the value of the dissimilarity measure (L2, Euclidean)
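Stata's `L2` measure is the ordinary Euclidean distance. The commands above do not seem to report a single overall dissimilarity value for the finished solution. One common summary, assumed here, is the within-group sum of squared Euclidean distances from each observation to its group mean, which is the quantity k-means tries to reduce. The hypothetical helper `l2_summary` computes the per-observation distances and that total from the data and a vector of group labels, such as the `ca2` variable that `cluster kmeans` creates.

```python
from math import dist  # Euclidean (L2) distance, Python 3.8+


def l2_summary(data, labels):
    """Per-observation L2 distance to its own group mean, plus within-group SS.

    Hypothetical helper: returns (centers, distances, within_ss).
    """
    centers = {}
    for g in sorted(set(labels)):
        members = [p for p, lab in zip(data, labels) if lab == g]
        centers[g] = tuple(sum(c) / len(members) for c in zip(*members))
    distances = [dist(p, centers[lab]) for p, lab in zip(data, labels)]
    within_ss = sum(d * d for d in distances)  # sum of squared L2 distances
    return centers, distances, within_ss


pts = [(0, 0), (2, 0), (10, 10), (10, 12)]
grp = [1, 1, 2, 2]
centers, d, wss = l2_summary(pts, grp)
print(centers)  # {1: (1.0, 0.0), 2: (10.0, 11.0)}
print(d, wss)   # [1.0, 1.0, 1.0, 1.0] 4.0
```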
Furthermore:
- Is there a way to test statistically whether my partition makes sense?
(I mean: do the data really fall into 2 groups?)
A statistician friend of mine suggested looking at Wilks' lambda.
Does anybody know whether it makes sense with Stata's cluster algorithm and, if so,
how to obtain it?
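Wilks' lambda compares the within-group and total sums-of-squares-and-cross-products (SSCP) matrices, Lambda = det(W) / det(T), where T = W + B. Values near 0 mean the group means are far apart relative to the spread inside the groups. A plain-Python sketch for two variables follows; `wilks_lambda` and the toy data are hypothetical. In Stata, `manova X1 X2 = ca2` should report Wilks' lambda for the grouping variable created by `cluster kmeans`. One caution applies whatever software is used. The usual F or chi-squared p-value assumes the groups were defined independently of X1 and X2, whereas k-means chose them precisely to separate those variables. A small Lambda is therefore partly guaranteed, and the p-value overstates the evidence for two groups.

```python
def wilks_lambda(data, labels):
    """Wilks' lambda det(W) / det(T) for 2-D data and a grouping (hypothetical helper)."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def sscp(pts, m):
        # Sums of squares and cross-products about m, as (sxx, sxy, syy)
        sxx = sum((x - m[0]) ** 2 for x, _ in pts)
        syy = sum((y - m[1]) ** 2 for _, y in pts)
        sxy = sum((x - m[0]) * (y - m[1]) for x, y in pts)
        return sxx, sxy, syy

    def det(s):
        return s[0] * s[2] - s[1] * s[1]

    total = sscp(data, mean(data))  # T: deviations from the overall mean
    within = [0.0, 0.0, 0.0]        # W: deviations from each group's own mean
    for g in sorted(set(labels)):
        members = [p for p, lab in zip(data, labels) if lab == g]
        for i, v in enumerate(sscp(members, mean(members))):
            within[i] += v
    return det(within) / det(total)


pts = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10), (12, 10), (10, 12), (12, 12)]
grp = [0, 0, 0, 0, 1, 1, 1, 1]
print(wilks_lambda(pts, grp))  # equals 1/51, about 0.0196
```

For these toy data det(W) = 64 and det(T) = 208 * 208 - 200 * 200 = 3264, so Lambda = 64 / 3264 = 1/51.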
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/