> --- Kevin Daley <[email protected]> wrote:
> > I would like to use a statistic discussed by Agresti in his
> > categorical data analysis book that gives the probability that two
> > randomly selected independent observations in a given dataset will
> > end up in different categories of the given variable. The
> > statistic has a minimum value of 0 and a maximum value of J-1.
--- Maarten buis <[email protected]> wrote:
> If it is a probability, then the maximum is 1. In that case you could
> compute it as follows:
>
> *---------- begin example -------------
> sysuse auto, clear
> preserve
> contract rep78 , percent(p) nomiss
> gen double psq = (p/100)^2
> sum psq, meanonly
> di 1-r(sum)
> restore
> *--------- end example -----------------
> (For more on how to use examples I sent to the Statalist, see
> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
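The example above computes one minus the sum of the squared category proportions. Written out as a formula, it might look like this. The notation is assumed here and does not come from the thread: $p_j$ is the proportion of non-missing observations in category $j$, and $J$ is the number of categories.

```latex
V = 1 - \sum_{j=1}^{J} p_j^2, \qquad p_j = \frac{\texttt{p}_j}{100}
```

Here $\texttt{p}_j$ is the percentage that `contract ..., percent(p)` stores for category $j$. That is why the code divides by 100 before squaring.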
In the case above, the two draws are made with replacement, so the
maximum is 1-1/_N. The maximum variability is obtained when each
observation is in its own category, so there are _N categories, each
with a probability of 1/_N. The probability of drawing one particular
category twice is (1/_N)^2, and there are _N such categories, so the
probability of drawing the same category twice is _N*(1/_N)^2 = 1/_N.
The probability of not drawing the same category twice is therefore
1-1/_N.
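The argument can be written as a short derivation. This sketch assumes the following notation, which is not from the original post: $p_j$ is the proportion in non-empty category $j$, $J$ is the number of such categories, and $N$ stands for Stata's `_N`. The draws are independent and made with replacement.

```latex
\begin{aligned}
\Pr(\text{same category})       &= \sum_{j=1}^{J} p_j^2
  \;\ge\; \frac{1}{J}\Big(\sum_{j=1}^{J} p_j\Big)^2 = \frac{1}{J}, \\
\Pr(\text{different categories}) &= 1 - \sum_{j=1}^{J} p_j^2
  \;\le\; 1 - \frac{1}{J} \;\le\; 1 - \frac{1}{N}.
\end{aligned}
```

The first inequality is the Cauchy-Schwarz inequality. It holds with equality when every $p_j = 1/J$. The last step uses $J \le N$, because there cannot be more non-empty categories than observations. Setting $J = N$ and $p_j = 1/N$ gives $N \cdot (1/N)^2 = 1/N$, as in the argument above. For a fixed number of categories $J$, the same bound gives a maximum of $1 - 1/J = (J-1)/J$.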
-- Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------