Re: st: Generating Random Number

From   Joseph Coveney <[email protected]>
To   Statalist <[email protected]>
Subject   Re: st: Generating Random Number
Date   Wed, 24 Jan 2007 11:58:50 +0900

I wrote:

local quit = 1
while (`quit') {
   generate double randu`quit' = uniform()
   sort randu`quit', stable
   capture assert randu`quit' > randu`quit'[_n-1] in 2/l
   if _rc local ++quit
   else continue, break
}

Now I remember:  the sorting on random numbers needed to be hierarchical in
order to ensure that the iterations would eventually end, especially with
the large dataset.  What I ended up with was something more akin to:

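set memory 100M
set obs `=2e6'
set seed `=date("2007-01-24", "ymd")'
generate long surrogate_id = _n
* duplicates flags the observations that still need a (new) random key
generate byte duplicates = 1
local pass 1
while (`pass') {
   * draw a new key only for observations still involved in a tie
   generate double randu`pass' = uniform() if duplicates
   * sort hierarchically on all keys drawn so far
   sort randu*, stable
   replace duplicates = 0
   * flag the later member of each pair of adjacent equal keys
   replace duplicates = (randu`pass' == randu`pass'[_n-1]) ///
     if !mi(randu`pass') & _n > 1
   capture assert duplicates == 0
   if _rc {
       * also flag the first member of each tie, then go around again
       replace duplicates = 1 if (duplicates[_n + 1] == 1)
       local pass = `pass' + 1
   }
   else continue, break
}
drop randu* duplicates
display in smcl as text "Number of passes: " as result `pass'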

This example (two million rows) takes two passes even with double-precision
random-number variables.
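
A birthday-problem estimate suggests why more than one pass is needed.  What
follows is only a rough sketch.  It assumes that uniform() returns one of 2^32
equally likely values (an assumption about the generator's resolution, not
something established above) and that the draws are independent; the scalar
names are purely illustrative.

```stata
* Rough birthday-problem estimate of ties among n uniform() draws.
* Assumption: uniform() takes one of 2^32 equally likely values.
scalar n_obs = 2e6
scalar n_values = 2^32
scalar n_pairs = n_obs * (n_obs - 1) / 2
* expected number of pairs of observations sharing the same draw
scalar expected_ties = n_pairs / n_values
* Poisson approximation to the chance that a pass has no ties at all
scalar p_no_ties = exp(-expected_ties)
display as text "Expected tied pairs per pass: " as result %9.1f expected_ties
display as text "P(no ties in a single pass):  " as result %9.3g p_no_ties
```

Under that assumption the expected number of tied pairs is in the hundreds for
two million observations, so a loop that redraws every key on each pass would
almost never see a tie-free pass.  Redrawing only within ties leaves very few
observations to separate on the second pass.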

All this effort to explicitly rerandomize duplicate random numbers arose
when it seemed that "randomized" in Stata's documentation for -sort ,
stable- meant more "haphazard" and less "in a reproducible pseudorandom
sequence." (See the example below typed from the keyboard.)  It might be
that -sort-'s randomization runs off a different seed.  In any event, an
observation like the one below threw me, and I resorted to hierarchical
randomization in order to assure myself unambiguous reproducibility of the

Joseph Coveney

. clear

. set more off

. set seed 1234567890

. set obs 20
obs was 0, now 20

. generate byte id = _n

. generate double randu = uniform()

. replace randu = randu[1] in 2
(1 real change made)

. sort randu

. list id if inrange(id, 1, 2)

    +----+
    | id |
    |----|
 5. |  2 |
 6. |  1 |
    +----+

. sort id

. set seed 1234567890

. replace randu = uniform()
(1 real change made)

. replace randu = randu[1] in 2
(1 real change made)

. sort randu

. list id if inrange(id, 1, 2)

    +----+
    | id |
    |----|
 5. |  1 |
 6. |  2 |
    +----+
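
In the session above, the tied observations (id 1 and 2) come out in a
different order on two runs with the same seed.  A minimal sketch of how a
second-level random key might make that order reproducible is given here.  The
names randu1, randu2, position and the locals first1 and first2 are
illustrative and not part of the original session, and the sketch assumes the
data start in id order on each run.

```stata
* Sketch: the scenario from the log, repeated twice with the same seed.
* randu2 is a hypothetical second-level key, not part of the original log.
forvalues run = 1/2 {
    clear
    set seed 1234567890
    set obs 20
    generate byte id = _n
    generate double randu1 = uniform()
    replace randu1 = randu1[1] in 2        // force a tie between id 1 and id 2
    generate double randu2 = uniform()     // second-level random key
    sort randu1 randu2, stable             // ties on randu1 are resolved by randu2
    generate byte position = _n
    summarize position if id == 1, meanonly
    local first`run' = r(mean)             // where id 1 ended up on this run
}
* The position of id 1 should be the same in both runs
assert `first1' == `first2'
display as text "Position of id 1 in both runs: " as result `first1'
```

Adding -stable- alone would also fix the order of the ties, but it would simply
preserve their prior order rather than randomize it; the second key keeps the
tie-breaking pseudorandom while tying it to the seed.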


