An alternative that would fit the description given by Kevin is:
Agresti (1996) An Introduction to Categorical Data Analysis. Hoboken
NJ: John Wiley.
Also, the reference given by Nick is the second edition, which is much
expanded from the first edition.
-- Maarten
--- Nick Cox <[email protected]> wrote:
> A reference, as requested by Steven Samuels in his question to Kevin
> Daley, is
>
> Agresti, A. 2002. Categorical data analysis. Hoboken NJ: John Wiley.
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: 04 March 2008 18:04
> To: [email protected]
> Subject: RE: st: Measure of Variability in a Nominal Variable
>
> If p_i is proportion in category i,
> then SUM p_i^2 is the probability of being in the same category.
> (The sum is over categories, not observations.)
>
> The complement 1 - SUM p_i^2 is
> then the probability of being in different categories.
>
> The reciprocal 1 / SUM p_i^2 has a nice interpretation as the
> equivalent number of equally probable categories.
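
A minimal sketch of the three quantities just described, reusing rep78
from the auto data as in the example quoted further down. The scalar
names (p_same, p_diff, n_equiv) are my own choice for illustration and
are not part of anyone's original code.

```stata
*---------- begin example -------------
sysuse auto, clear
preserve
* one row per category of rep78, with its percentage in p
contract rep78, percent(p) nomiss
gen double psq = (p/100)^2
* -summarize, meanonly- leaves SUM p_i^2 in r(sum)
summarize psq, meanonly
scalar p_same  = r(sum)        // P(two draws fall in the same category)
scalar p_diff  = 1 - p_same    // P(two draws fall in different categories)
scalar n_equiv = 1/p_same      // equivalent number of equally probable categories
display "SUM p_i^2     = " p_same
display "1 - SUM p_i^2 = " p_diff
display "1 / SUM p_i^2 = " n_equiv
restore
*--------- end example -----------------
```

The reciprocal can never exceed the number of non-missing categories,
which gives a quick sanity check on the result.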
>
> One or more of these quantities arise under many different names:
>
> Gini index (but NB that many other measures have also been
> called that)
>
> Simpson index in ecology (the same Simpson as Simpson's paradox)
>
>
> Herfindahl index in economics
>
> heterozygosity in genetics
>
> And no doubt others.
>
> Maarten gave one way to calculate it. Another is through -ineq- on
> SSC.
>
> Nick
> n.j.cox
>
> Maarten buis
>
> > --- Kevin Daley <[email protected]> wrote:
> > > I would like to use a statistic discussed by Agresti in his
> > > categorical data analysis book that gives the probability that
> > > two randomly selected independent observations in a given
> > > dataset will end up in different categories of the given
> > > variable. The statistic has a minimum value of 0 and a maximum
> > > value of J-1.
>
> --- Maarten buis <[email protected]> wrote:
> > If it is a probability then the maximum is 1. In that case you
> > could compute it as follows:
> >
> > *---------- begin example -------------
> > sysuse auto, clear
> > preserve
> > contract rep78 , percent(p) nomiss
> > gen double psq = (p/100)^2
> > sum psq, meanonly
> > di 1-r(sum)
> > restore
> > *--------- end example -----------------
> > (For more on how to use examples I sent to the Statalist, see
> > http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> In the case above the two draws are draws with replacement, in which
> case the maximum is 1-1/_N. The maximum variability is obtained when
> each observation is in its own category, so there are _N categories
> each with a probability of 1/_N. The probability of drawing one
> particular category twice is (1/_N)^2, and there are _N such
> categories, so the probability of drawing the same category twice is
> _N*(1/_N)^2 = 1/_N. The probability of not drawing the same category
> twice is 1-1/_N.
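
A rough sketch of the argument above, under assumptions of my own: a
toy dataset of 10 observations (the variable name id is hypothetical),
each in its own category, should give 1-1/_N. The second part is a
variant that is not in the thread: if the two observations are instead
drawn without replacement, the probability that they differ is
1 - SUM n_i(n_i-1) / (N(N-1)), where n_i is the count in category i and
N is the total, and that version does reach 1 when every observation is
in its own category.

```stata
*---------- begin example -------------
* Part 1: every observation in its own category (with replacement)
clear
set obs 10
gen id = _n
contract id, percent(p)
gen double psq = (p/100)^2
summarize psq, meanonly
display 1-r(sum)
* after -contract- there is one row per category, so _N is 10 here
display 1-1/_N

* Part 2: without-replacement variant on rep78 (my own addition)
sysuse auto, clear
preserve
contract rep78, freq(n) nomiss
summarize n, meanonly
local N = r(sum)               // number of non-missing observations
gen double pair = n*(n-1)/(`N'*(`N'-1))
summarize pair, meanonly
display 1-r(sum)
restore
*--------- end example -----------------
```

The two displays in Part 1 should agree. With many observations the
with- and without-replacement versions differ only slightly, so the
distinction matters mainly in small samples.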
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------