Dear Statalisters and Stata experts,
I would like to know your expert opinion about a faster approach to
generate binomial random numbers using Stata. I wish to know if the
approach below is valid and if it makes some sense.
For example, suppose each draw is a single Bernoulli trial (n = 1), with
25000 observations and p = 0.25, the probability of the event. -rndbin- is
quite straightforward:
rndbin 25000 0.25 1
qui count if xb==1
However, I have to run -rndbin- a very large number of times (say, 10^15),
count the number of events (xb==1) on each run, and then collect those counts
in a new variable. That approach takes a lot of time, and it remains
time-consuming even with Mata functions.
Taking statistical aspects of the binomial distribution into account, may
I approximate that calculation using the following approach?
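* assumes the dataset already holds one observation per simulated study
generate p = 0.25
generate observations = 25000
* sd_of_p_hat = standard deviation of p_hat
generate sd_of_p_hat = sqrt(p*(1-p)/observations)
* z = standard normal draw
generate z = invnorm(uniform())
* perturb p by its sampling error, then convert to a count of events
replace p = z*sd_of_p_hat + p
generate number_of_events = round(p*observations)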
The latter approach is much faster (2-3 seconds for 100000 studies,
whereas -rndbin- is likely to take some hours, at least on my PC) and it
is likely to be unbiased for p's between 0.3 and 0.7, the range I have to
work with in Human Genetics.
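To make the comparison concrete, here is a minimal sketch of how the approximation could be checked against exact binomial draws. It is only an illustration under assumptions: it assumes Stata 10.1 or later, where rbinomial() and rnormal() are available, and the names events_exact and events_approx, the seed, and the choice of 100000 studies are mine, not part of -rndbin-.

```stata
clear
* arbitrary seed, for reproducibility only
set seed 12345
* one observation per simulated study
set obs 100000
local p = 0.25
local n = 25000

* exact binomial counts (assumption: rbinomial() is available)
generate long events_exact = rbinomial(`n', `p')

* normal approximation: perturb p by its sampling error, then scale to a count
generate double z = rnormal()
generate long events_approx = round((`p' + z*sqrt(`p'*(1-`p')/`n'))*`n')

* in theory both have mean n*p = 6250 and sd sqrt(n*p*(1-p)), about 68.5
summarize events_exact events_approx
```

If the two rows of -summarize- agree closely, that would support using the approximation for this n and p. A smaller n or a p nearer 0 or 1 would need a separate check, since the normal approximation is weaker there.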
I would again be grateful for any help and comments.
Best regards,
Tiago