Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Brendan Halpin <brendan.halpin@ul.ie>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Observations in Sequence analysis
Date: Wed, 03 Oct 2012 23:25:38 +0100

On Wed, Oct 03 2012, Sarah Park wrote:

> However, when I run the cluster analysis based
> on the dissimilarity matrix afterwards, I have Stata categorizing some
> of the sequence with same elements (ex. 333, 333333, 333333333) into
> the same cluster, some into different clusters when they are clearly
> the same sequence.

These are not the same sequence: they have different lengths. If you are
using the optimal matching algorithm to compare sequences, the indel cost
is incurred for each unit difference in length, so 333 and 333333 are
three indel operations apart even though they contain the same element.

If you really want these sequences to be considered the same, you need
another approach. For example, you could remove consecutive duplicates,
so that they would all reduce to the sequence "3".

Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-009 x 3147
mailto:brendan.halpin@ul.ie ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress twitter:@ULSociology
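
To make the point about length concrete, here is a minimal sketch in Python (not Stata, and not the code of any particular sequence-analysis package). The function names `om_distance` and `collapse_runs` are hypothetical, and the costs (indel = 1, substitution = 2) are assumed purely for illustration; your own analysis may use different values. It shows that sequences made of the same element still get a nonzero optimal matching distance when their lengths differ, and that collapsing consecutive duplicates removes that difference.

```python
from itertools import groupby


def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching (edit) distance between two state sequences.

    The indel and substitution costs are illustrative assumptions.
    """
    # prev[j] = cost of aligning the elements of `a` seen so far
    # with the first j elements of `b`.
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel]
        for j in range(1, len(b) + 1):
            # Match costs nothing; a mismatch costs one substitution.
            diag = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub)
            # Otherwise delete from `a` or insert from `b`.
            curr.append(min(diag, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev[-1]


def collapse_runs(seq):
    """Remove consecutive duplicates, e.g. '333' -> '3', '1123' -> '123'."""
    return "".join(state for state, _ in groupby(seq))


if __name__ == "__main__":
    seqs = ["333", "333333", "333333333"]
    # Raw sequences: each unit of length difference costs one indel.
    print(om_distance(seqs[0], seqs[1]))
    print(om_distance(seqs[0], seqs[2]))
    # After collapsing runs, all three are the single-state sequence "3".
    collapsed = [collapse_runs(s) for s in seqs]
    print(collapsed, om_distance(collapsed[0], collapsed[2]))
```

Note that collapsing runs discards all duration information, so it is only appropriate if the order of states matters to you and the time spent in each state does not.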