Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re:Interesting results from a simulation
From
Steven Samuels <[email protected]>
To
[email protected]
Subject
Re: st: Re:Interesting results from a simulation
Date
Sun, 20 Mar 2011 18:22:03 -0400
This is a consequence of the combinatorics of random sampling with replacement when N is large.
p = single-draw probability = 1/N
P{draw a specified element k times in N draws} = comb(N,k) * p^k * (1-p)^(N-k)
P0 = P{draw 0 times} = (1-1/N)^N ~ exp(-1) ~ .3679 for N large
P1 = P{draw 1 time} = comb(N,1) * (1/N) * (1-1/N)^(N-1) = N * (1/N) * (1-1/N)^(N-1) = P0 * N/(N-1) ~ .3679
P2 = P{draw 2 times} = comb(N,2) * (1/N)^2 * (1-1/N)^(N-2) = (N*(N-1)/2) * (1/N)^2 * (1-1/N)^(N-2) = (1/2) * P0 * N/(N-1) ~ .3679/2
etc. In general Pk ~ exp(-1)/k!, the Poisson distribution with mean 1.
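The formulas above can be checked numerically. What follows is a minimal Python sketch, assuming N = 30000 as in the original post; the function names binom_prob and poisson1_prob are illustrative only and are not part of any package. It prints the exact binomial probability next to the large-N limit exp(-1)/k! for k = 0..5.

```python
from math import comb, exp, factorial

def binom_prob(N, k):
    """P{a specified element is drawn exactly k times in N draws
    with replacement from a population of size N}."""
    p = 1.0 / N  # single-draw probability
    return comb(N, k) * p**k * (1.0 - p)**(N - k)

def poisson1_prob(k):
    """Large-N limit of binom_prob: Poisson probability with mean 1."""
    return exp(-1.0) / factorial(k)

N = 30000  # population size and number of draws, as in the original post
for k in range(6):
    print(k, round(binom_prob(N, k), 6), round(poisson1_prob(k), 6))
```

The ratio binom_prob(N, 1) / binom_prob(N, 0) equals N/(N-1), which is why the two probabilities are nearly identical for large N.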
Steve
[email protected]
On Mar 20, 2011, at 4:18 PM, Victor Zammit wrote:
Dear Statalisters,
I have simulated drawing one observation at a time, at random and with replacement, 30,000 times from a finite population of 30,000 observations. The same process was repeated 200 times. I then counted how many times each observation was drawn and found that, after 30,000 draws, any given observation has a probability of ~36.756% of being drawn 0 times (c0), ~36.83% of being drawn once (c1), and half as much, i.e. ~18.4%, of being drawn twice (c2). The probability of being drawn 3 times (c3) is about 1/3 of 18.4%, i.e. ~6.11%, and the pattern basically continues, as the summary at the bottom shows. In the listing that follows, ck is the number of observations drawn exactly k times in a repetition and n is the repetition number.
   c0     c1    c2    c3   c4   c5  c6  c7  c8  c9  c10    n
11018  11054  5549  1807  450  107  13   2   0   0    0    1
11133  10980  5403  1884  471  102  21   6   0   0    0    2
11018  11054  5549  1807  450  107  13   2   0   0    0    3
  ...
10886  11273  5482  1797  453   97  11   0   0   1    0  134
11026  11013  5554  1847  472   78  10   0   0   0    0  135
11019  11030  5574  1805  465   96  10   1   0   0    0  136
  ...
11037  10981  5610  1823  443   86  13   6   1   0    0  199
11011  11032  5583  1827  439   85  21   2   0   0    0  200
PS: If anyone is interested, I can provide the complete dataset. Please let me know.
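For readers who want to reproduce one repetition of the experiment described above, here is a rough Python sketch. It is not the code used for the original simulation (which was not posted); the function name simulate_once and the seed are arbitrary assumptions. It draws N times with replacement from a population of size N and tabulates how many population members were drawn exactly k times.

```python
import random
from collections import Counter

def simulate_once(N, rng):
    """Draw N times with replacement from a population of size N.
    Returns a list c where c[k] is the number of population members
    drawn exactly k times."""
    # How many times each drawn member appeared
    draws = Counter(rng.randrange(N) for _ in range(N))
    # How many members appeared exactly k times, for k >= 1
    times = Counter(draws.values())
    # Members that never appeared are not in `draws`
    times[0] = N - len(draws)
    max_k = max(times)
    return [times.get(k, 0) for k in range(max_k + 1)]

rng = random.Random(2011)  # arbitrary seed, chosen only for reproducibility
counts = simulate_once(30000, rng)
print(counts)
```

Wrapping the call to simulate_once in a loop of 200 repetitions would give a dataset of the same shape as the one listed above.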
. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          c0 |       200    11026.87    61.53633      10886      11184
          c1 |       200    11050.32    82.50102      10839      11273
          c2 |       200    5520.295    69.39095       5366       5672
          c3 |       200    1834.235    40.62328       1761       1930
          c4 |       200     457.595    15.43835        415        502
          c5 |       200      91.255    10.32536         72        117
          c6 |       200       16.56    4.544188          7         25
          c7 |       200       2.675    2.039651          0          7
          c8 |       200        .195    .3971949          0          1
          c9 |       200         .01    .0997484          0          1
         c10 |       200           0           0          0          0
Dividing the variables c0-c10 above by 30000 gives the respective probabilities; summ then gives:
. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          c0 |       200    .3675622    .0020512   .3628667      .3728
          c1 |       200    .3683438      .00275      .3613   .3757667
          c2 |       200    .1840098     .002313   .1788667   .1890667
          c3 |       200    .0611412    .0013541      .0587   .0643333
          c4 |       200    .0152532    .0005146   .0138333   .0167333
          c5 |       200    .0030418    .0003442      .0024      .0039
          c6 |       200     .000552    .0001515   .0002333   .0008333
          c7 |       200    .0000892     .000068          0   .0002333
          c8 |       200    6.50e-06    .0000132          0   .0000333
          c9 |       200    3.33e-07    3.32e-06          0   .0000333
         c10 |       200           0           0          0          0
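As a rough check on the summary above, the mean counts can be set against what a Poisson distribution with mean 1 would predict, namely 30000 * exp(-1)/k! observations drawn exactly k times. The Python sketch below does this; the observed values are copied from the first summ output, the Poisson form is the large-N approximation rather than the exact binomial, and the variable names are illustrative.

```python
from math import exp, factorial

N = 30000  # population size and number of draws per repetition

# Mean counts c0..c10 over the 200 repetitions, copied from the summ output
observed_mean = [11026.87, 11050.32, 5520.295, 1834.235, 457.595,
                 91.255, 16.56, 2.675, 0.195, 0.01, 0.0]

# Expected counts under the Poisson(1) approximation: N * exp(-1) / k!
expected = [N * exp(-1.0) / factorial(k) for k in range(len(observed_mean))]

for k, (obs, exp_k) in enumerate(zip(observed_mean, expected)):
    print(f"c{k:<2d} observed {obs:10.3f}   expected {exp_k:10.3f}")
```

The expected counts for c0 and c1 are identical under this approximation because 0! = 1! = 1, and each later count is the previous one divided by k.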
Note: the largest number of times that a given observation was drawn in one set of 30,000 draws is 9, which in my experiment happened in the 134th loop.
I find the probability pattern quite surprising. Can anyone provide any intuition for it? Why is the probability of an observation not being drawn at all equal to that of its being drawn exactly once (~.368) when the number of draws equals the size of the population?
Victor Zammit.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/