To close the loop on my own question: I'm using -matrix dissimilarity ..., gower-, which computes Gower's dissimilarity, a measure that allows for missing values, and then running -clustermat- on the resulting matrix.
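
For anyone finding this in the archives, here is a minimal sketch of that workflow. It is not my actual code: the auto dataset, the variable list, the average-linkage method, the names D, gow and grp, and the choice of 3 groups are all assumptions made for illustration. It also assumes that the dissimilarity matrix has one row per observation in memory, which -clustermat- needs for the add option, so check that before relying on it.

```stata
* Illustrative data: rep78 has missing values in the auto dataset
sysuse auto, clear

* Gower dissimilarity between observations; for each pair, a variable
* that is missing in either observation is left out of the computation
matrix dissimilarity D = price mpg rep78, gower

* Hierarchical clustering on the matrix (average linkage is an assumption);
* add attaches the cluster results to the data currently in memory
clustermat averagelinkage D, name(gow) add

* Cut the tree into 3 groups (arbitrary choice) and inspect the group sizes
cluster generate grp = groups(3), name(gow)
tabulate grp
```

Any other linkage accepted by -clustermat- (for example wardslinkage) could be substituted in the third command.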
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dan Weitzenfeld
Sent: Friday, June 12, 2009 9:53 AM
To: [email protected]
Subject: st: Clustering with missing values
Hi All,
I am doing a cluster analysis with a dataset that is sparse in a subset of variables. I'm against the standard techniques - modeling the missing values, or replacing them with the mean - for theoretical reasons.
A Google search turned up a paper about using soft constraints (essentially, using the sparse variables when they exist), and I'm wondering if there is a package/routine in Stata that implements this or a similar technique.
The paper is available at:
http://www.litech.org/~wkiri/Papers/wagstaff-missing-ifcs04.pdf
Thanks,
Dan
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/