Re: st: RE: Cluster analysis on survey data
It appears to me that a cluster analysis will not serve Jessica's
purpose: "to describe who are the kinds of people that report medical
debt and medical bankruptcy". Implied in this is "compared to people
who do not report these events". (If Jessica does not think so, I
hope that she will show an example of what a cluster analysis might
find.)
Better, I think, would be a discriminant analysis to describe the
differences between the two groups (perhaps three, if Jessica
considers medical debt without bankruptcy and medical bankruptcy to
be different). This could be done with -logit- or -probit-
(-mlogit- or -mprobit- for three groups), all survey-enabled. (Stata
has other kinds of discriminant analysis; see help for -discrim- and
-candisc-, but these take no survey features except pweights.) Such
analyses could include interactions and might show, for example,
that the odds of being older and male are greater for the
debt/bankruptcy group than for the comparison group.
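A minimal sketch of the survey-enabled approach follows. All variable
names are assumptions, not taken from Jessica's data: _psu, _ststr and
_finalwt stand for the BRFSS design variables (check the codebook for
the survey year), meddebt is a hypothetical 0/1 indicator, debtgrp a
hypothetical three-level outcome, and age and male are hypothetical
respondent characteristics.

```stata
* Sketch only: every variable name here is an assumption.
* Declare the complex design: PSU, sampling weight, strata.
svyset _psu [pweight = _finalwt], strata(_ststr)

* Build the age-by-sex interaction by hand.
generate age_male = age * male

* Two groups: design-based logistic regression, reported as odds ratios.
svy: logit meddebt age male age_male, or

* Three groups (hypothetical coding: 0 = neither, 1 = medical debt only,
* 2 = medical bankruptcy), with "neither" as the comparison group.
svy: mlogit debtgrp age male age_male, baseoutcome(0)
```

The interaction is generated by hand only to keep the sketch simple;
-svy: probit- and -svy: mprobit- take the same variable lists.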
The most flexible way of describing group differences, to my mind, is
Classification and Regression Trees (CART). The only implementation
in Stata that I know of is the user-contributed -cart- command, but
it applies only to Cox regression and does not take weights.
-Steve
On Aug 28, 2008, at 11:55 AM, Nick Cox wrote:
What is BRFSS?
On the main question, it is evident that -cluster- does not support
any kind of weights, so that is one short answer.
I am unclear on how in principle any kind of weights could inform
cluster analysis. Although there are different recipes, cluster
analysis as implemented in Stata is in essence a more or less
elaborate way of quantifying information on similarity or differences
between observations in a multivariate space.
Suppose, for example, that I am in a survey, you are too, and several
other people are as well. Cluster analysis offers methods for plotting
me, you and the others in a space. How are those differences affected
by the sampling design behind who is and who isn't in the dataset,
particularly as no parameter estimation or hypothesis testing is
involved?
Nick
Jessica M. Tullar, PhD
I am using a BRFSS dataset and therefore it has a complex sampling
design. I would like to describe who are the kinds of people that
report medical debt and medical bankruptcy and therefore thought
cluster analysis might be appropriate.
I've looked through all the manuals and even searched on survey
analysis and on cluster analysis and can't find the answer. Is there
a way to perform a cluster analysis and account for the survey weights?
A second possibility would involve creating a new representative
dataset on which to perform the standard cluster analysis. How best
would one create a new dataset using the survey weighting which would
approximate the population?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/