Statalist



RE: st: RE: Hypergeometric Distribution


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Hypergeometric Distribution
Date   Thu, 23 Aug 2007 10:25:03 +0100

Consider 

comb(K, k) * comb(N - K, n - k) / comb(N, n)

When I look at that, my main worry would be that 
the numerator could get rather large before it
is scaled down by the denominator. Hence I
would try 

exp(ln(comb(K, k)) + ln(comb(N - K, n - k)) - ln(comb(N, n))) 

as a check. I know that 0s will become missings, but 
it's my understanding that in such cases the resulting
probabilities should all be 0 in any case. 
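
A minimal sketch of that check, not a tested implementation. The 
scalar names follow the formula above, and the numeric values are 
made up for illustration. The -lnfactorial()- line is a substitution 
for ln(comb()), included only to show how the individual comb() terms 
can be kept off the raw scale as well. The final step assumes that a 
missing result comes from ln(0) and not from invalid arguments. 

```stata
* Illustrative values only (not from the thread)
scalar N = 50
scalar K = 5
scalar n = 10
scalar k = 1

* Direct form: the numerator is formed before it is scaled down
scalar p_direct = comb(K, k) * comb(N - K, n - k) / comb(N, n)

* Log-scale form: add and subtract logs, exponentiate once at the end
scalar p_log = exp(ln(comb(K, k)) + ln(comb(N - K, n - k)) - ln(comb(N, n)))

* Variant: ln(comb(a, b)) = lnfactorial(a) - lnfactorial(b) - lnfactorial(a - b)
scalar lnp =       lnfactorial(K)     - lnfactorial(k)     - lnfactorial(K - k)
scalar lnp = lnp + lnfactorial(N - K) - lnfactorial(n - k) - lnfactorial(N - K - n + k)
scalar lnp = lnp - lnfactorial(N)     + lnfactorial(n)     + lnfactorial(N - n)
scalar p_lnf = exp(lnp)

* ln(0) is missing in Stata, so map a missing result back to probability 0
* (assumption: N, K, n, k are valid, so missing does not signal bad input)
if missing(p_log) scalar p_log = 0

display p_direct
display p_log
display p_lnf
```

With arguments this small the three results should agree to 
rounding error; the log forms are meant for cases where the direct 
numerator would get too large. 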

It may be that any StataCorp function, say 

hyperg(N, n, K, k), 

would just be this underneath. But perhaps not. 

Nick 
[email protected] 

Marcello Pagano
 
> The hypergeometric plays a central role in sampling when sampling 
> from a finite population.  The binomial provides an approximation 
> for large samples, but why rely on approximations today when they 
> are not necessary? and how good is the approximation, anyway?  
> Possibly the reliance on the approximation provided by the binomial 
> has lulled us into a complacency that contributed to the "evidence 
> since 1999"?
> 
> I did research a little with -comb( )- and that works pretty well, 
> but I did a very limited study.  A Stata function with all its usual 
> associated robustness and accuracy would be nice, in my opinion.
 
       
Nick Cox 

> >>>>> Roger's posting includes what I presume is an allusion to 
> >>>>> an -egen- function _ghyper.ado that I wrote in 1999. 
> >>>>>
> >>>>> I withdrew this program as redundant some years ago, 
> >>>>> given that you can use something like 
> >>>>>
> >>>>> comb(K, k) * comb(N - K, n - k) / comb(N, n)
> >>>>>
> >>>>> wherever you want. In context N, K, n, k may be 
> >>>>> variables, scalars or placeholders for numeric
> >>>>> constants, or any mixture thereof. 
> >>>>>
> >>>>> This might need a wrapper to yield zeros where 
> >>>>> appropriate, or it might need care whenever 
> >>>>> individual terms get very large, but otherwise
> >>>>> does it raise any problems? 

Marcello Pagano
           
> >>>>>> Does anyone have or know of Stata code to calculate the 
> >>>>>> Hypergeometric Distribution accurately?
> >>>>>>
> >>>>>> See Journal of Discrete Algorithms, Volume 5, Issue 2 
> >>>>>> (June 2007), pages 341-347, for an article by Berkopec, 
> >>>>>> "HyperQuick algorithm for discrete hypergeometric distribution" 
> > 
<http://portal.acm.org/citation.cfm?id=1240586&coll=GUIDE&dl=GUIDE&CFID=
   
