Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: New command for clustering -clustpop-
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: New command for clustering -clustpop-
Date: Thu, 21 Apr 2011 19:39:33 +0100
Although there are plenty of exceptions, most cluster analysis
implementations that I've heard of are essentially exploratory in
spirit. Any inferential calculations are contingent not only on how
repeated sampling is set up, but also on the particular cluster
analysis method chosen. What has been called the classification
crunch amounts to this: If you have well-distinguished clusters, some
simple sensible graphical method will show you what they are. If you
don't, you should lay in supplies for endless experimentation with how
cluster dissimilarity is defined, how observations or clusters should
be grouped into larger clusters, and so forth.
To revisit a well-worn joke, statistical people can be clustered into
those who take cluster analysis very seriously and those who don't.
But it could be replied that this applies to most other statistical
methods too. Also, cluster analysis with a stronger hypothesis testing
element tends to be called something else, say discriminant analysis.
Just my proverbial tuppenceworth, Nick
On Thu, Apr 21, 2011 at 7:24 PM, Airey, David C
<[email protected]> wrote:
> .
>
> What do other software packages usually do with cluster analyses?
>
>> Thanks to Kit Baum, a new command of interest to -cluster- users has
>> been uploaded to ssc.
>>
>> Users of -cluster- are no doubt aware that each run of the command
>> on the same data produces a different clustering.
>>
>> A run of cluster is, in effect, a sample (n=1) from the population of
>> possible cluster groupings.
>>
>> -clustpop- expands the sample size by running -cluster- many times to
>> estimate the population group assignments. The most frequent group
>> assignment is taken as the estimate of the population and a
>> statistical test of significance is performed to ensure the lower
>> bound of the proportion is greater than 0.5. In other words, users
>> can be confident, at a given alpha level, that this group assignment
>> occurs for the majority of cases in the population. Cases which do not
>> meet the criteria are set to missing.
>>
>> I expect that most users will prefer this method to using the
>> -cluster- command alone.
>>
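The procedure described in the quoted announcement (repeat the clustering, take each case's most frequent group, keep it only if a lower confidence bound on its proportion exceeds 0.5) can be sketched roughly as follows. This is an illustrative sketch in Python, not the -clustpop- code. The function names are hypothetical, the one-sided Wilson score bound is an assumed choice of test (the announcement does not say which one is used), and the group labels are assumed to be already aligned across runs, which in practice is a nontrivial step because cluster labels are arbitrary.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(successes, trials, alpha=0.05):
    """One-sided lower Wilson score bound for a binomial proportion.

    Assumption: the announcement does not name its test; Wilson is one
    common choice for a bound on a proportion.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = p_hat + z * z / (2 * trials)
    margin = z * sqrt(p_hat * (1 - p_hat) / trials
                      + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def consensus_assignment(runs, alpha=0.05):
    """Return one group label per case, or None where no label wins clearly.

    runs: list of clustering runs; each run is a list with one group
    label per case. Labels are assumed to be aligned across runs.
    """
    n_cases = len(runs[0])
    result = []
    for i in range(n_cases):
        labels = [run[i] for run in runs]
        # Most frequent label for this case and how often it occurred.
        label, count = Counter(labels).most_common(1)[0]
        lower = wilson_lower_bound(count, len(labels), alpha)
        # Keep the label only if the lower bound clears 0.5;
        # otherwise mark the case as missing.
        result.append(label if lower > 0.5 else None)
    return result


# Hypothetical data: 20 runs, 2 cases. Case 0 is always in group 1;
# case 1 is in group 2 in 11 runs and in group 1 in 9 runs.
runs = [[1, 2]] * 11 + [[1, 1]] * 9
print(consensus_assignment(runs))  # [1, None]
```

In this made-up example the second case is set to missing because 11 out of 20 is not enough for the lower bound to exceed 0.5. Note that such a bound describes only the variability across repeated runs of one method on one dataset, not sampling variability in the data.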