Dear Ladies and Gentlemen,
I am looking for advice on a cluster-analysis distance measure for
categorical data that weights variables by their number of categories.
I will have Stata version 8 at my disposal.
I am an SPSS user, but as I could not find this kind of distance/proximity
measure in SPSS, I thought about using Stata. However, I could not find such
a measure among the standard distance/proximity measures described in the
program's help/search section or in the 'Stata Cluster Analysis' book
either.
Therefore I would be very grateful for any recommendations from this forum.
I am surprised not to have come across anything, because such a
category-number-weighted measure seems crucial for most survey-based
studies.
I am looking for something that solves the distortion problem that occurs
when including (and dichotomising) variables with very different numbers of
categories (such as gender: 2 vs. occupation: 10). In my opinion, matches on
occupation would be overrated when using the available matching coefficients
for binary data. Reducing all variables to 2 or 3 categories, on the other
hand, would cause an inelegant and, I hope, unnecessary loss of information.
I have found some suitable formulas in the literature, but I am now looking
for a way to apply them to my dataset (World Values Survey) using
statistical packages.
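To make the distortion concrete, here is a minimal sketch in Python (not Stata) that contrasts the simple matching coefficient computed on dummy-coded data with a per-variable matching share, in which each categorical variable counts once however many categories it has. This per-variable share is only one common way to avoid the problem and is not necessarily the formula from the literature mentioned above. The respondents, the 0-based category codes, and the optional `weights` argument (a hypothetical hook for any category-number-based weighting) are illustrative assumptions, not World Values Survey data.

```python
from itertools import chain

# Assumed toy coding: (gender, occupation); gender has 2 categories,
# occupation has 10, both coded 0..k-1. Illustrative values only.
N_CATEGORIES = (2, 10)

def one_hot(record, n_categories=N_CATEGORIES):
    """Dichotomise each categorical variable into k 0/1 dummy variables."""
    return list(chain.from_iterable(
        [1 if value == c else 0 for c in range(k)]
        for value, k in zip(record, n_categories)
    ))

def smc_dummies(a, b):
    """Simple matching coefficient on the dummies (0-0 pairs count as matches)."""
    da, db = one_hot(a), one_hot(b)
    matches = sum(x == y for x, y in zip(da, db))
    return matches / len(da)

def per_variable_matching(a, b, weights=None):
    """Share of (optionally weighted) variables on which two records agree.

    Each variable contributes once regardless of its number of categories.
    `weights` is a hypothetical hook; no particular scheme is implied.
    """
    if weights is None:
        weights = [1.0] * len(a)
    agree = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return agree / sum(weights)

p1 = (0, 3)  # gender 0, occupation 3
p2 = (1, 7)  # disagrees with p1 on both variables
print(smc_dummies(p1, p2))            # 0.6666666666666666
print(per_variable_matching(p1, p2))  # 0.0
```

Although the two respondents share nothing, the dummy-based coefficient rates them about 67% similar, because 8 of the 12 dummies are joint absences on occupation; the per-variable share reports 0.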
Sincerely yours,
Dana Liebmann