Another option is to use Latent Class Analysis (LCA) and estimate a
"latent score" as the composite score. LCA may be considered a variant
of factor analysis for categorical manifest variables. One problem,
however, is that Stata does not have an LCA facility. A free
downloadable program, called LEM, is available at:
http://www.kub.nl/faculteiten/fsw/organisatie/departementen/mto/software2.html

As Nick has pointed out, you have to be careful about masking the
variables: be guided by theory, and first examine the variables
separately to see the associations and effect directions, including
whether any of them are inversely associated with the outcomes.
Another point: LEM cannot handle missing values.
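
To make the separate checks above concrete, here is a rough sketch in
more recent Stata syntax. It assumes (I have not seen the codebook)
that "don't know" is coded 8 and "missing" is coded 9; the _c copies
are names made up for illustration.

```stata
* Show the numeric codes behind the value labels before recoding anything
tabulate deasyblk, nolabel

* Work on copies (hypothetical *_c names) so the originals stay untouched
foreach v of varlist deasyblk dwelfblk dintlblk drichblk {
    clonevar `v'_c = `v'
}

* ASSUMPTION: 8 = don't know, 9 = missing; map them to extended missing values
mvdecode deasyblk_c dwelfblk_c dintlblk_c drichblk_c, mv(8=.a \ 9=.b)

* Pairwise rank correlations show whether any item runs in the opposite direction
spearman deasyblk_c dwelfblk_c dintlblk_c drichblk_c, pw
```

A negative correlation between two items would suggest that one of
them needs reverse coding (for a 1-7 scale, 8 minus the item) before
any averaging or scoring.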

Recently, principal component analysis (PCA) has attracted interest as
a way to generate a composite index (e.g., creating a wealth index
from household items). Although PC/factor analysis assumes that the
manifest variables are continuous on a metric scale, the assumption
can be relaxed when the PC score is generated only as a means of
"data reduction". The categories must, however, have a natural
ordering.
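
As a sketch of the PCA route in more recent Stata syntax (assuming
that 8 and 9 are the don't know and missing codes, which I have not
verified, and with prej_pc1 as a made-up variable name):

```stata
* ASSUMPTION: 8 = don't know, 9 = missing; this overwrites them with system missing (.)
mvdecode deasyblk dwelfblk dintlblk drichblk, mv(8 9)

* Principal components of the correlation matrix, keeping the first component
pca deasyblk dwelfblk dintlblk drichblk, components(1)

* Score on the first component; missing for any observation with a missing item
predict prej_pc1, score
summarize prej_pc1
```

Look at the signs of the loadings in the pca output before reading
higher scores as "more prejudice"; the orientation of a component is
arbitrary.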

Besides the two methods mentioned above, "classification of
individuals by attributes" (if viewed as "clustering of individuals")
can potentially be done with other statistical methods. One such
method is implemented in Stata as the "cluster analysis commands."
[This paragraph is for information only. Considering the distribution
of the 4 variables mentioned, this may not be a good choice; the method
is more suitable with several manifest variables, especially when
little theory is known about the underlying common variables behind
the clustering effects.]
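
Purely for illustration, a k-means sketch with Stata's cluster
commands. The choice of 3 groups, the seed, and the names prejcl and
prejgrp are arbitrary assumptions of mine, and 8 and 9 are assumed to
be the don't know and missing codes:

```stata
* ASSUMPTION: 8 = don't know, 9 = missing
mvdecode deasyblk dwelfblk dintlblk drichblk, mv(8 9)

* k-means with k = 3 (arbitrary); krandom() fixes the seed for the random start
cluster kmeans deasyblk dwelfblk dintlblk drichblk, k(3) name(prejcl) generate(prejgrp) start(krandom(12345))

* Group sizes and item means by cluster
tabulate prejgrp
tabstat deasyblk dwelfblk dintlblk drichblk, by(prejgrp) statistics(mean)
```

With only 4 items the groups may be hard to interpret, which is the
reason for the caution above.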

Best wishes,
Saifuddin

Saifuddin Ahmed, MD, PhD
Johns Hopkins Bloomberg School of Public Health

----- Original Message -----
From: Nick Cox <[email protected]>
Date: Friday, August 23, 2002 5:53 am
Subject: st: RE: Re: creating composite measures
> Seth D. Hannah asked
>
> > > Can someone help me with creating a composite measure of prejudice
> > > from four individual variables in my data set which measure
> > > prejudice. The variables are:
> > >
> > > deasyblk: perception of blacks as easy to get along with
> > > dwelfblk: perception of blacks as likely to be on welfare
> > > dintlblk: perception of blacks as intelligent
> > > drichblk: perception of blacks as rich or poor
> > >
> > > the variables are distributed as follows:
> > >
> > > . tab deasyblk
> > >
> > >    easy to get along |
> > >             w/blacks |      Freq.     Percent        Cum.
> > > ---------------------+-----------------------------------
> > > easy to get along w/ |        915       10.26       10.26
> > >                    2 |       1052       11.80       22.06
> > >                    3 |       1379       15.47       37.53
> > >              neither |       2722       30.53       68.06
> > >                    5 |       1143       12.82       80.88
> > >                    6 |        638        7.16       88.03
> > > hard to get along w/ |        547        6.14       94.17
> > >        don't know... |        418        4.69       98.86
> > >              missing |        102        1.14      100.00
> > > ---------------------+-----------------------------------
> > >                Total |       8916      100.00
> > >
> > > . tab dwelfblk
> > >
> > >    self-supporting: |
> > >              blacks |      Freq.     Percent        Cum.
> > > --------------------+-----------------------------------
> > > prefer self-support |        754        8.46        8.46
> > >                   2 |        521        5.84       14.30
> > >                   3 |        879        9.86       24.16
> > >             neither |       2132       23.91       48.07
> > >                   5 |       1723       19.32       67.40
> > >                   6 |       1332       14.94       82.34
> > >      prefer welfare |       1046       11.73       94.07
> > >       don't know... |        425        4.77       98.83
> > >             missing |        104        1.17      100.00
> > > --------------------+-----------------------------------
> > >               Total |       8916      100.00
> > >
> > > . tab dintlblk
> > >
> > > intelligence: |
> > >        blacks |      Freq.     Percent        Cum.
> > > --------------+-----------------------------------
> > >   intelligent |        723        8.11        8.11
> > >             2 |        807        9.05       17.16
> > >             3 |       1597       17.91       35.07
> > >       neither |       3259       36.55       71.62
> > >             5 |       1255       14.08       85.70
> > >             6 |        479        5.37       91.07
> > > unintelligent |        207        2.32       93.39
> > > don't know... |        481        5.39       98.79
> > >       missing |        108        1.21      100.00
> > > --------------+-----------------------------------
> > >         Total |       8916      100.00
> > >
> > > . tab drichblk
> > >
> > >    rich-poor: |
> > >        blacks |      Freq.     Percent        Cum.
> > > --------------+-----------------------------------
> > >          rich |         59        0.66        0.66
> > >             2 |        193        2.16        2.83
> > >             3 |        499        5.60        8.42
> > >       neither |       2101       23.56       31.99
> > >             5 |       2506       28.11       60.09
> > >             6 |       2137       23.97       84.06
> > >          poor |        970       10.88       94.94
> > > don't know... |        371        4.16       99.10
> > >       missing |         80        0.90      100.00
> > > --------------+-----------------------------------
> > >         Total |       8916      100.00
> > >
> > > What I want to do is combine these four variables into one measure
> > > of prejudice, which will become a dependent variable in some of my
> > > models.
> > >
> > > The only way I could think to do it was to create a new variable
> > > prejblk with numerical values 1 through 7 that equal the sums of
> > > the respective 1 through 7's from my four variables...
> > >
> > > gen prejblk=.
> > > replace prejblk=1 if drichblk==1|dwelfblk==1|deasyblk==1|dintlblk==1
> > > replace prejblk=2 if drichblk==2|dwelfblk==2|deasyblk==2|dintlblk==2
> > > etc.
> > >
> > > somehow this doesn't seem right, please help!
>
> Bo Cutter
>
> > As a first step you may want to look at a factor analysis (Principal
> > components). This analysis will look at how and whether you can
> > reduce your 5 variables into one or more variables.
>
> Nick Winter
>
> > I would consider averaging the variables, after reversing the coding
> > for the ones that are coded with opposite "sense". (e.g., so that
> > higher scores on each indicates more tolerant attitudes)
>
> > Look at egen rmean(...)
>
> Why do you need a composite measure? It is often a good way
> of blurring important distinctions. If in fact these measures
> are highly related, then one will serve as well as any other.
> If, as seems a little more likely, they measure rather
> different things, it is not clear that any composite measure
> will add much to looking separately at your different responses.
>
> In any case, any kind of averaging (means or PCA) has to be smart
> about don't knows and missings, which I guess are coded higher
> than the other values. At first sight, the only clean way
> to deal with those is to omit any observation with any don't know
> or missing from the averaging.
>
> Also contemplate
>
> gra deasyblk dwelfblk dintlblk drichblk, matrix j(1)
>
>
> Nick
> [email protected]
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>