Nick
Matissa Hollister
> I'm hoping someone can help me solve this problem,
> although I'm beginning to think that it's hopeless.
> Basically I've created my own special measure of
> dissimilarity that I want to use for clustering, but
> I'm finding that there is no way to get Stata to allow
> me to use this new dissimilarity measure. Any ideas
> of ways to get around this problem would be greatly
> appreciated.
>
> Basically, I am using a procedure called Optimal
> Matching, an algorithm designed to create a measure of
> dissimilarity between two sequences of data. I am
> using it to identify people who have similar career
> patterns. I've created a do-file that accomplishes
> the most difficult and unusual part of Optimal
> Matching, which is creating the measure of
> dissimilarity between each pair of sequences. I now
> want to run a clustering procedure to identify groups
> based upon this dissimilarity measure.
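
The pairwise step described above can be sketched in a few lines. This is
only an illustration in Python, not the poster's do-file: it assumes a
single insertion/deletion cost of 1 and a flat substitution cost of 2, and
the names om_distance and distance_matrix are made up for the example.
Real Optimal Matching applications usually choose these costs with more
care.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit-style distance between two sequences (assumed costs, see above)."""
    n, m = len(a), len(b)
    # prev[j] is the cheapest cost of turning the first i-1 items of a
    # into the first j items of b; start with i = 0 (insertions only).
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * indel] + [0.0] * m  # column 0: deletions only
        for j in range(1, m + 1):
            # Either keep/substitute the current pair, delete from a,
            # or insert from b, whichever is cheapest.
            match = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub)
            cur[j] = min(match, prev[j] + indel, cur[j - 1] + indel)
        prev = cur
    return prev[m]


def distance_matrix(seqs, **costs):
    """Full symmetric matrix of pairwise distances, as a list of lists."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = om_distance(seqs[i], seqs[j], **costs)
    return d
```

Each sequence can be a string or a list of state codes, one per time
period. The matrix returned by distance_matrix is the only input that a
clustering step then needs.
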
>
> I found a post in the listserv archives (dated
> November 18, 2002) where someone wanted to do
> something similar (she wanted to create a geographic
> distance measure). From the response I gather the
> calling and running of the dissimilarity algorithms
> occurs within the built-in Stata command _cluster and
> is done within C, which is certainly beyond my
> programming abilities. I've contemplated several
> possibilities and would love help or advice on any of
> them:
>
> 1) find a different software program that will allow me
> to easily input a new dissimilarity measure into a
> cluster command (preferably not expensive)
>
> 2) a way to alter Stata's cluster command to allow for
> this new dissimilarity measure
>
> 3) a way to get around this problem, e.g.:
>
> A. use the ParseDist command within cluster.ado to
> somehow cause the built-in command to call up a
> different distance command
>
> B. ways to enter the data so that a built-in Stata
> dissimilarity measure will result in the same pairwise
> distances (difficult because the pairwise
> dissimilarities make up a multi-dimensional space; the
> whole point is that they are difficult to summarize in
> a few variables)
>
> 4) write my own clustering procedure
>
> Please! Any help would be gratefully accepted. I
> know that several other researchers have already used
> Optimal Matching with clustering, so my guess is that
> option #1 might be the most viable one, but I'm not
> sure where to look.
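
On option 4: agglomerative clustering needs nothing beyond the matrix of
pairwise dissimilarities, so a home-grown procedure may be less daunting
than it sounds. A minimal sketch in Python, assuming average linkage and a
full symmetric matrix held as a list of lists. The function name
average_linkage is hypothetical, and this is not a description of how
Stata's cluster command works internally.

```python
def average_linkage(dist, k):
    """Merge the two closest clusters until k remain.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]  # start with singletons

    def between(c1, c2):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = between(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return [sorted(c) for c in clusters]
```

With four sequences whose dissimilarities are small within the pairs
(0, 1) and (2, 3) and large across them, asking for k = 2 should recover
those two pairs. The search over all cluster pairs at every merge is slow
for large samples, so this is for seeing the logic rather than for
production use.
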