Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "William Gould, StataCorp LP" <wgould@stata.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Default Seed of Stata 12 |
Date | Wed, 24 Oct 2012 11:13:38 -0500 |
Rasool Bux <rasool.bux@aku.edu> asked,

> Can anybody tell me the default system values i.e. seed etc.
> of Stata 12.1

The random-number seed is set to 123456789 each time Stata is launched.

As Maarten Buis <maartenlbuis@gmail.com> noted, the value changes
during the Stata session as you use the random-number generators.


More information
----------------

I wrote this response mainly so I could say, "123456789", but
Maarten also wrote,

> The default can change during a Stata session.
>
> You can see the current value of the seed by typing di c(seed).
> See -help creturn- for this and other system values.
> Also see -help set seed- for an explanation what that weird string
> returned by -c(seed)- actually is.

and now I feel obligated to provide more details than you will find in
the manuals.  So for those who are curious:

The random-number generator has something called a state.  When you
-set seed-, you are specifying the state.  Each time you ask for a
random number, say by using the -runiform()- function, the state is
recursively updated -- new_state = f(current_state) -- and then a
random number is produced based on the value of new_state.  The code
works like this:

    random_number:
        new_state     = f(current_state)
        random_number = g(new_state)
        current_state = new_state
        return(random_number)

Now here's what's interesting:  The state has more bits than the
random number.  In the case of the KISS random number generator, the
random numbers produced are 32 bit values, and the state is a 128 bit
value!  Having more bits for the state than the random number is a
general property of random-number generators and not just a property
of KISS.

When you set the seed, say by typing

    . set seed 123456789

you are setting the value of current_state.  A number like 123456789
is a 32-bit value.  Somehow, that 32-bit value is converted to a
128-bit value and, no matter how we do it, obviously state can take on
only one of 2^32 values.
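
For readers who would like to poke at the idea outside of Stata, here
is a minimal Python sketch of the scheme just described.  It is a toy,
not Stata's code: f() and g() are patterned on the KISS recipe as it is
commonly published (four 32-bit words of state), the seed expansion h()
is made up purely for illustration, and the names ToyRNG, next_u32 and
state_hex are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep every word inside 32 bits


def f(state):
    """Advance the 128-bit state (four 32-bit words) by one step."""
    z, w, jsr, jcong = state
    # two multiply-with-carry pieces
    z = (36969 * (z & 0xFFFF) + (z >> 16)) & MASK32
    w = (18000 * (w & 0xFFFF) + (w >> 16)) & MASK32
    # a 32-bit xorshift piece
    jsr ^= (jsr << 13) & MASK32
    jsr ^= jsr >> 17
    jsr ^= (jsr << 5) & MASK32
    # a linear congruential piece
    jcong = (69069 * jcong + 1234567) & MASK32
    return (z, w, jsr, jcong)


def g(state):
    """Collapse the 128-bit state into one 32-bit random number."""
    z, w, jsr, jcong = state
    mwc = ((z << 16) + w) & MASK32
    return ((mwc ^ jcong) + jsr) & MASK32


def h(seed32):
    """Hypothetical seed expansion: 32-bit seed -> 128-bit state.

    This is NOT Stata's h(); it only shows that 2^32 possible seeds
    can reach at most 2^32 of the 2^128 possible state values.
    """
    x = seed32 & MASK32
    words = []
    for _ in range(4):
        x = (1664525 * x + 1013904223) & MASK32
        words.append(x | 1)  # force every word to be nonzero
    return tuple(words)


class ToyRNG:
    def __init__(self, seed32=123456789):
        self.state = h(seed32)

    def next_u32(self):
        # new_state = f(current_state); random_number = g(new_state)
        self.state = f(self.state)
        return g(self.state)

    def state_hex(self):
        # 4 words x 8 hex digits = 32 hex digits = 128 bits
        return "".join(f"{word:08x}" for word in self.state)


if __name__ == "__main__":
    rng = ToyRNG(123456789)
    print("state :", rng.state_hex())
    print("number:", rng.next_u32())
    print("state :", rng.state_hex())
```

Each call to next_u32() returns 32 bits but moves a 128-bit state, so
saving rng.state and assigning it back later resumes the sequence at
exactly that point, which a 32-bit seed alone cannot generally do.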

The setting of the seed works like this:

    set_seed_32_bit_value:
        current_state = h(32_bit_value)
        burn in current_state by repeating 100 times {
            produce random number (and throw it away)
        }

Maarten mentioned -c(seed)- and a second syntax of -set seed- which
allows you to specify the full state.  Let me explain.

First off, -c(seed)- is a misleading name because it is not the seed,
it is the state, which is related to the seed.  -c(seed)- after
setting the seed to the 32-bit value 123456789 looks like this,

    . set seed 123456789

    . display c(seed)
    X075bcd151f123bb5159a55e50022865700043e55

The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
way of writing the full 128-bit value.  X075b...55 is the result of
running set_seed_32_bit_value on the 32-bit number 123456789.

Remember that the state is updated each time a random number is
generated.  Let's look at the state value after generating a random
number:

    . * we have already set seed 123456789

    . display runiform()
    .13698408

    . display c(seed)
    X5b15215854f24767556efaba82801d9b0004330a

Think of the random-number generator as producing an infinitely long
sequence of states:

    -------------------------------------------------------------------------
    state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...

    where,
        state0 = X075bcd151f123bb5159a55e50022865700043e55,
        state1 = X5b15215854f24767556efaba82801d9b0004330a,
        and so on,
    and where the i-th pseudo random number is given by g(state{i}).
    -------------------------------------------------------------------------

The sequence may be infinitely long, but it repeats.  The period is
approximately 2^124 in the case of KISS.

The easy-to-type 32-bit seed provides 2^32 entry points into this
sequence

    ---------------------------------------------------------------------
    state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
      |                            |                      |
    123456789                  ????????                 ??????
    ---------------------------------------------------------------------

I put ??????
in the above because I didn't bother to work out the 32-bit numeric
values corresponding to the particular states.  What's important is
that the function state = h(32_bit_seed) is designed to space the
entry points approximately equally.

Also important to understand is that, because the sequence is
infinitely long, my numbering of the states is arbitrary.  I could
have picked any one of the 2^124+1 states and labeled it 0.  What's
important is that the 32-bit seed provides an entry point into this
sequence.

In the last experiment we tried,

    . set seed 123456789

    . display runiform()
    .13698408

    . display c(seed)
    X5b15215854f24767556efaba82801d9b0004330a

There is no 32-bit seed that you could set that corresponds to that
state.  And that is why the value of -c(seed)- looks so strange:  It
provides every possible entry point into the sequence, whereas
-set seed #- provides merely a subset.

Do I have to say it?  If this kind of thing interests you, consider
a career at StataCorp.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/