Re: st: Re:Interesting results from a simulation

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Re:Interesting results from a simulation
Date	Sun, 20 Mar 2011 18:22:03 -0400

This is a consequence of the combinatorics of random sampling with replacement when the number of draws equals the population size N and N is large.

p = single-draw probability = 1/N
P{draw a specified element k times} = comb(N,k) * p^k * (1-p)^(N-k)

P0 = P{draw 0 times} = (1-1/N)^N ~ exp(-1) for N large ~ .3679
P1 = P{draw 1 time} = comb(N,1) * (1/N) * (1-1/N)^(N-1) = N * (1/N) * (1-1/N)^(N-1) = P0 * N/(N-1) ~ .3679
P2 = P{draw 2 times} = comb(N,2) * (1/N)^2 * (1-1/N)^(N-2) = (N*(N-1)/2) * (1/N)^2 * (1-1/N)^(N-2) = (1/2) * P0 * N/(N-1) ~ .3679/2
etc.
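
The formulas above can be checked numerically. The following is a minimal sketch in Python (chosen here only for illustration; the thread itself uses Stata), and the function names binom_pmf and poisson1_pmf are hypothetical. It evaluates the binomial probability for N = 30000 and prints it next to exp(-1)/k!, the Poisson limit with mean 1 that the pattern P0, P1, P2, ... approaches for large N.

```python
from math import comb, exp, factorial

def binom_pmf(N, k):
    """P{a specified element is drawn exactly k times in N draws}, with p = 1/N."""
    p = 1.0 / N
    return comb(N, k) * p**k * (1.0 - p) ** (N - k)

def poisson1_pmf(k):
    """Large-N limit of the probability above: Poisson with mean 1."""
    return exp(-1.0) / factorial(k)

N = 30000  # population size and number of draws, as in the original post
for k in range(6):
    # exact binomial value next to its Poisson(1) approximation
    print(k, round(binom_pmf(N, k), 6), round(poisson1_pmf(k), 6))
```

The ratio binom_pmf(N, 1) / binom_pmf(N, 0) equals N/(N-1), which is why P0 and P1 are nearly identical when N is large.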

Steve
[email protected]

On Mar 20, 2011, at 4:18 PM, Victor Zammit wrote:

Dear Statalisters,

I have simulated drawing one observation at a time at random, with replacement, 30,000 times from a finite population of 30,000 observations. The same process was repeated 200 times. I then counted how often each observation was drawn and found that, after 30,000 draws, any given observation has a probability of ~36.756% of being drawn 0 times (c0), ~36.83% of being drawn once (c1), and half as much, ~18.4%, of being drawn twice (c2). The probability of being drawn 3 times (c3) is about 1/3 of 18.4%, i.e. ~6.11%, and the pattern basically continues, as the summary at the bottom shows.



   c0     c1    c2    c3   c4   c5  c6  c7  c8  c9  c10    n
11018  11054  5549  1807  450  107  13   2   0   0    0    1
11133  10980  5403  1884  471  102  21   6   0   0    0    2
11018  11054  5549  1807  450  107  13   2   0   0    0    3
  ...
10886  11273  5482  1797  453   97  11   0   0   1    0  134
11026  11013  5554  1847  472   78  10   0   0   0    0  135
11019  11030  5574  1805  465   96  10   1   0   0    0  136
  ...
11037  10981  5610  1823  443   86  13   6   1   0    0  199
11011  11032  5583  1827  439   85  21   2   0   0    0  200
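
One row of a table like this can be reproduced with a short simulation. The sketch below is an illustration in Python, not the code used for the original experiment (which is not shown in the thread); the function name count_of_counts and the seed are arbitrary assumptions. It draws N times with replacement from N units and tabulates how many units were drawn 0, 1, 2, ... times.

```python
import random
from collections import Counter

def count_of_counts(N, rng, max_k=10):
    """Draw N times with replacement from N units; return [c0, ..., c_max_k]."""
    draws = Counter(rng.randrange(N) for _ in range(N))  # unit -> times drawn
    freq = Counter(draws.values())                       # times drawn -> number of units
    freq[0] = N - len(draws)                             # units that were never drawn
    return [freq.get(k, 0) for k in range(max_k + 1)]

rng = random.Random(2011)  # arbitrary seed, only for reproducibility
N = 30000
row = count_of_counts(N, rng)
print(row)                                # counts c0..c10 for one replication
print([round(c / N, 4) for c in row])     # the same counts as proportions
```

Calling count_of_counts 200 times with the same rng object would give one row per replication, analogous to the 200 loops described in the post.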

PS: If anyone is interested, I can provide the complete dataset. Please let me know.

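. summ

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          c0 |       200    11026.87    61.53633      10886      11184
          c1 |       200    11050.32    82.50102      10839      11273
          c2 |       200    5520.295    69.39095       5366       5672
          c3 |       200    1834.235    40.62328       1761       1930
          c4 |       200     457.595    15.43835        415        502
-------------+--------------------------------------------------------
          c5 |       200      91.255    10.32536         72        117
          c6 |       200       16.56    4.544188          7         25
          c7 |       200       2.675    2.039651          0          7
          c8 |       200        .195    .3971949          0          1
          c9 |       200         .01    .0997484          0          1
-------------+--------------------------------------------------------
         c10 |       200           0           0          0          0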

Dividing the variables c0-c10 above by 30000 to get the respective probabilities and then running summ again gives:

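. summ

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          c0 |       200    .3675622    .0020512   .3628667      .3728
          c1 |       200    .3683438      .00275      .3613   .3757667
          c2 |       200    .1840098     .002313   .1788667   .1890667
          c3 |       200    .0611412    .0013541      .0587   .0643333
          c4 |       200    .0152532    .0005146   .0138333   .0167333
-------------+--------------------------------------------------------
          c5 |       200    .0030418    .0003442      .0024      .0039
          c6 |       200     .000552    .0001515   .0002333   .0008333
          c7 |       200    .0000892     .000068          0   .0002333
          c8 |       200    6.50e-06    .0000132          0   .0000333
          c9 |       200    3.33e-07    3.32e-06          0   .0000333
-------------+--------------------------------------------------------
         c10 |       200           0           0          0          0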
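
The mean proportions in this output can be set side by side with exp(-1)/k!, the Poisson probabilities with mean 1 that the binomial probabilities approach when the number of draws equals a large population size. The following is a small Python sketch for that comparison; the observed values are copied from the Mean column above, and everything else is illustrative.

```python
from math import exp, factorial

# Mean proportions for c0..c10, copied from the Mean column of the summ output.
observed = [.3675622, .3683438, .1840098, .0611412, .0152532,
            .0030418, .000552, .0000892, 6.50e-06, 3.33e-07, 0.0]

for k, obs in enumerate(observed):
    limit = exp(-1.0) / factorial(k)  # Poisson(1) probability of exactly k draws
    print(f"c{k}: observed {obs:.7f}   exp(-1)/k! {limit:.7f}")
```

Each observed mean lies within .001 of the corresponding limit, which is consistent with the halving from c1 to c2 and the further division by 3 from c2 to c3 noted in the post.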

Note: the largest number of times that a given observation was drawn in one set of 30,000 draws was 9, which in my experiment happened in the 134th loop.

I find the probability pattern quite surprising. Can anyone provide any intuition on this? Why is the probability of an observation not being drawn at all equal to that of being drawn exactly once (~.368) when the number of trials equals the size of the population?

Victor Zammit.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

