Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Default Seed of Stata 12
From: Henrik Støvring <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Default Seed of Stata 12
Date: Thu, 25 Oct 2012 09:20:08 +0000
Thanks for this clear, fascinating, and very detailed presentation of the
inner workings of Stata! I am not about to apply for a position at
Stata, as you suggest interested readers consider doing, but I
must admit that every time you write one of these valuable pieces on the
inner mechanics of Stata, I am a little more tempted to
reconsider. :-)
Best,
Henrik
On 10/24/2012 06:13 PM, William Gould, StataCorp LP wrote:
> Rasool Bux <[email protected]> asked,
>
>> Can anybody tell me the default system values i.e. seed etc.
>> of Stata 12.1
> The random-number seed is set to 123456789 each time Stata is launched.
> As Maarten Buis <[email protected]> noted, the value changes during
> the Stata session as you use the random-number generators.
>
>
> More information
> ----------------
>
> I wrote this response mainly so I could say, "123456789", but
> Maarten also wrote,
>
>> The default can change during a Stata session.
>>
>> You can see the current value of the seed by typing di c(seed).
>> See -help creturn- for this and other system values.
>> Also see -help set seed- for an explanation of what that weird string
>> returned by -c(seed)- actually is.
> and now I feel obligated to provide more details than you will find in
> the manuals. So for those who are curious:
>
> The random-number generator has something called a state. When you
> -set seed-, you are specifying the state. Each time you ask for
> a random number, say by using the -runiform()- function, the
> state is recursively updated -- new_state = f(current_state) -- and
> then a random number is produced based on the value of new_state.
> The code works like this:
>
>     random_number:
>         new_state = f(current_state)
>         random_number = g(new_state)
>         current_state = new_state
>         return(random_number)
>
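The state-update loop above can be sketched in Python. This is a minimal illustration that uses stand-in choices for f and g, which are not Stata's. Here f is one step of a 64-bit linear congruential generator with the multiplier and increment Knuth uses for MMIX, and g keeps the high 32 bits of the new state. The names `f`, `g`, and `Generator` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def f(state: int) -> int:
    """Hypothetical state update: one step of a 64-bit LCG (Knuth's MMIX constants)."""
    return (6364136223846793005 * state + 1442695040888963407) & MASK64

def g(state: int) -> int:
    """Hypothetical output map: the high 32 bits of the 64-bit state."""
    return state >> 32

class Generator:
    def __init__(self, state: int) -> None:
        self.current_state = state & MASK64

    def random_number(self) -> int:
        new_state = f(self.current_state)   # new_state = f(current_state)
        number = g(new_state)               # random_number = g(new_state)
        self.current_state = new_state      # current_state = new_state
        return number

gen = Generator(123456789)
print([gen.random_number() for _ in range(3)])
```

Two generators started from the same state produce the same numbers. Reproducibility through a seed depends on nothing more than that.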
> Now here's what's interesting: The state has more bits than the
> random number. In the case of the KISS random-number generator, the
> random numbers produced are 32-bit values, and the state is a 128-bit
> value! Having more bits in the state than in the random number is a
> general property of random-number generators, not just a property
> of KISS.
>
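To make the bit counts concrete, here is a Python sketch of George Marsaglia's 1999 KISS generator. Its state is four 32-bit words (128 bits), and each output is a single 32-bit word. Stata may use a different KISS variant or different constants, so treat this as an illustration of the structure rather than as Stata's code. The starting words below are arbitrary nonzero values.

```python
MASK32 = 0xFFFFFFFF

def kiss_step(state):
    """f: update all four 32-bit components of the 128-bit state."""
    z, w, jsr, jcong = state
    z = 36969 * (z & 0xFFFF) + (z >> 16)        # multiply-with-carry, part 1
    w = 18000 * (w & 0xFFFF) + (w >> 16)        # multiply-with-carry, part 2
    jsr ^= (jsr << 17) & MASK32                 # 3-shift xorshift register
    jsr ^= jsr >> 13
    jsr ^= (jsr << 5) & MASK32
    jcong = (69069 * jcong + 1234567) & MASK32  # linear congruential part
    return (z, w, jsr, jcong)

def kiss_output(state):
    """g: combine the 128-bit state into one 32-bit random number."""
    z, w, jsr, jcong = state
    mwc = ((z << 16) + w) & MASK32
    return ((mwc ^ jcong) + jsr) & MASK32

state = (362436069, 521288629, 123456789, 380116160)   # arbitrary nonzero words
for _ in range(3):
    state = kiss_step(state)
    print(f"{kiss_output(state):08x}")

STATE_BITS = 32 * len(state)   # 128
OUTPUT_BITS = 32
```

There are 2^128 possible states but only 2^32 possible outputs, so many different states produce the same number. The extra state bits are what allow the sequence to run so long before repeating.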
> When you set the seed, say by typing
>
> . set seed 123456789
>
> you are setting the value of current_state. A number like 123456789
> is a 32-bit value. Somehow that 32-bit value is converted to
> a 128-bit value, and no matter how the conversion is done, a state
> set this way can take on only one of 2^32 values.
>
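The pigeonhole argument can be checked on a scaled-down version. If the 32-bit seed is replaced by an 8-bit one, any expansion function can reach at most 2^8 = 256 of the 2^128 possible states. The expansion function `h` below (BLAKE2b truncated to 16 bytes) is a hypothetical stand-in, chosen only because it is deterministic. The thread does not describe Stata's h.

```python
import hashlib

def h(seed: int, seed_bytes: int = 1) -> int:
    """Hypothetical expansion of a small seed into a 128-bit state."""
    digest = hashlib.blake2b(seed.to_bytes(seed_bytes, "big"), digest_size=16).digest()
    return int.from_bytes(digest, "big")

SEED_BITS = 8
reachable = {h(seed) for seed in range(2 ** SEED_BITS)}
print(len(reachable), "reachable states out of", 2 ** 128)
```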
> Setting the seed works like this:
>
>     set_seed_32_bit_value:
>         current_state = h(32_bit_value)
>         burn in current_state by repeating 100 times {
>             produce random number (and throw it away)
>         }
>
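Here is a Python sketch of set_seed_32_bit_value. To keep the code short, it uses Marsaglia's xorshift128 generator (128 bits of state in four 32-bit words) in place of KISS. It also uses a hypothetical expansion `h` that fills the four words with successive steps of the LCG x -> 69069x + 1 mod 2^32. Only the overall shape follows the pseudocode above: expand the seed, then discard 100 draws. The names `h`, `Xorshift128`, `next32`, and `set_seed` are illustrative.

```python
MASK32 = 0xFFFFFFFF

def h(seed32: int) -> list:
    """Hypothetical expansion of a 32-bit seed into four 32-bit words."""
    words, x = [], seed32 & MASK32
    for _ in range(4):
        x = (69069 * x + 1) & MASK32
        words.append(x)
    return words   # never all zero: a zero word is always followed by 1

class Xorshift128:
    def __init__(self) -> None:
        self.state = [0, 0, 0, 1]   # any nonzero 128-bit state

    def next32(self) -> int:
        """One xorshift128 step; the new last word is the output."""
        x, y, z, w = self.state
        t = (x ^ (x << 11)) & MASK32
        w_new = w ^ (w >> 19) ^ t ^ (t >> 8)
        self.state = [y, z, w, w_new]
        return w_new

    def set_seed(self, seed32: int, burn_in: int = 100) -> None:
        self.state = h(seed32)
        for _ in range(burn_in):
            self.next32()           # produce a random number and throw it away

rng = Xorshift128()
rng.set_seed(123456789)
print(rng.next32())
```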
> Maarten mentioned -c(seed)- and a second syntax of -set seed- that allows
> you to specify the full state. Let me explain.
>
> First off, -c(seed)- is a misleading name because it is not the seed;
> it is the state, which is related to the seed. After the seed is set
> to the 32-bit value 123456789, -c(seed)- looks like this:
>
> . set seed 123456789
>
> . display c(seed)
> X075bcd151f123bb5159a55e50022865700043e55
>
> The strange-looking X075bcd151f123bb5159a55e50022865700043e55 is one
> way of writing the full 128-bit value. X075b...55 is the result
> of running set_seed_32_bit_value on the 32-bit number 123456789.
>
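Below is one way to write a 128-bit state as text: a leading X followed by four 32-bit words, each as eight hex digits. This format is hypothetical. Stata's own string, shown above, has 40 hex digits rather than 32, and the thread does not explain its layout. Incidentally, the first eight hex digits of the string displayed right after -set seed 123456789-, 075bcd15, are 123456789 written in hexadecimal.

```python
def state_to_string(words):
    """Encode four 32-bit words as 'X' followed by 32 lowercase hex digits."""
    if len(words) != 4 or not all(0 <= w < 2 ** 32 for w in words):
        raise ValueError("expected four 32-bit words")
    return "X" + "".join(f"{w:08x}" for w in words)

def string_to_state(code):
    """Decode a string produced by state_to_string back into four words."""
    if len(code) != 33 or code[0] != "X":
        raise ValueError("expected 'X' followed by 32 hex digits")
    return [int(code[i:i + 8], 16) for i in range(1, 33, 8)]

words = [0x12345678, 0x9ABCDEF0, 0x0000002A, 0xFFFFFFFF]
code = state_to_string(words)
print(code)                      # X123456789abcdef00000002affffffff
print(string_to_state(code) == words)   # True
print(int("075bcd15", 16))       # 123456789
```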
> Remember that the state is updated each time a random number is
> generated. Let's look at the state value after generating a random
> number:
>
> . * we have already set seed 123456789
>
> . display runiform()
> .13698408
>
> . display c(seed)
> X5b15215854f24767556efaba82801d9b0004330a
>
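Python's standard `random` module offers a convenient analogue of this experiment. Its `getstate()` method returns the generator's full internal state (a Mersenne Twister state, far larger than any single output), much as -c(seed)- does. Its `setstate()` method restores that state, much as the full-state syntax of -set seed- does. The analogy is ours: the two systems use different generators and different state formats.

```python
import random

rng = random.Random(123456789)        # like -set seed 123456789-
saved = rng.getstate()                # like -display c(seed)-
first = [rng.random() for _ in range(3)]
moved = rng.getstate()                # the state has advanced

rng.setstate(saved)                   # put the saved state back
again = [rng.random() for _ in range(3)]

print(saved != moved)                 # True
print(first == again)                 # True
```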
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
>
> -------------------------------------------------------------------------
> state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
> where,
>
> state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
> state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
> and so on,
>
> and where the i-th pseudo random number is given by g(state{i}).
> -------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats. The period is
> approximately 2^124 in the case of KISS.
>
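You can observe the repetition directly on a generator small enough to run through its whole cycle. The sketch below uses a toy 16-bit LCG with multiplier 25173 and increment 13849. By the Hull-Dobell conditions (odd increment, multiplier minus 1 divisible by 4), its period is the full 2^16 = 65536. The period of KISS is far too long to enumerate this way. The names `step` and `period` are illustrative.

```python
M, A, C = 1 << 16, 25173, 13849

def step(state: int) -> int:
    """One step of the toy 16-bit LCG: state -> (A*state + C) mod M."""
    return (A * state + C) % M

def period(f, state0: int) -> int:
    """Count steps until f returns to state0 (assumes f is a bijection)."""
    state, steps = f(state0), 1
    while state != state0:
        state, steps = f(state), steps + 1
    return steps

print(period(step, 0))   # 65536
```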
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence:
>
> ---------------------------------------------------------------------
> state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>     |                           |                     |
> 123456789                   ????????               ??????
> ---------------------------------------------------------------------
>
> I put ?????? in the above because I didn't bother to work out
> the 32-bit numeric values corresponding to the particular states.
> The key point is that the function state = h(32_bit_seed) is
> designed to space the entry points approximately equally.
> Also important to understand is that, because the sequence cycles
> endlessly and has no natural starting point, my numbering of the
> states is arbitrary. I could have picked any one of the 2^124+1
> states and labeled it 0.
>
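Evenly spaced entry points can be sketched on a toy full-period 16-bit LCG. Below, the hypothetical seeding function `h` maps a 4-bit seed s to the state reached 4096*s steps after state 0. The 16 entry points therefore split the 65536-state cycle into equal stretches. The jump uses the standard technique of composing the affine map x -> Ax + C with itself by repeated squaring, so it costs O(log k) operations instead of k steps. This is only one way to space entry points; the thread does not say how Stata's h does it.

```python
M, A, C = 1 << 16, 25173, 13849     # toy full-period LCG, period 2**16
SEED_BITS = 4
SPACING = M >> SEED_BITS            # 4096 steps between entry points

def step(x: int) -> int:
    return (A * x + C) % M

def jump(x: int, k: int) -> int:
    """Advance x by k LCG steps by repeatedly squaring the map x -> a*x + c."""
    acc_a, acc_c = 1, 0             # identity map
    cur_a, cur_c = A, C             # map for 1, 2, 4, ... steps
    while k:
        if k & 1:
            acc_a, acc_c = cur_a * acc_a % M, (cur_a * acc_c + cur_c) % M
        cur_a, cur_c = cur_a * cur_a % M, (cur_a * cur_c + cur_c) % M
        k >>= 1
    return (acc_a * x + acc_c) % M

def h(seed: int) -> int:
    """Hypothetical seeding: entry point number `seed`, counted from state 0."""
    return jump(0, seed * SPACING)

entry_points = [h(s) for s in range(2 ** SEED_BITS)]
print(entry_points)
```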
> What's important is that the 32-bit seed provides an entry point
> into this sequence. In the last experiment we tried,
>
> . set seed 123456789
>
> . display runiform()
> .13698408
>
> . display c(seed)
> X5b15215854f24767556efaba82801d9b0004330a
>
> There is no 32-bit seed you could set that corresponds to that
> state.
>
> And that is why the value of -c(seed)- looks so strange: It provides
> every possible entry point into the sequence, whereas -set seed #-
> provides merely a subset.
>
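The subset claim can be counted on a toy cycle. The sketch below uses a full-period 16-bit LCG and a hypothetical seeding rule under which a 4-bit seed s enters the cycle at position 4096*s. A full-state restore (the analogue of -c(seed)-) can land on any of the 65536 states, whereas a 4-bit seed can land on only 16 of them. The state after one draw from a seeded start is not itself an entry point.

```python
M, A, C = 1 << 16, 25173, 13849     # toy full-period LCG
SEEDS = 16                           # a 4-bit seed
SPACING = M // SEEDS

# Walk the whole cycle once, starting from state 0.
sequence = []
x = 0
for _ in range(M):
    sequence.append(x)
    x = (A * x + C) % M

all_states = set(sequence)                                    # reachable by full-state restore
entry_points = {sequence[s * SPACING] for s in range(SEEDS)}  # reachable by a seed

after_one_draw = sequence[1]   # state after one draw from seed 0
print(len(all_states), len(entry_points), after_one_draw in entry_points)   # 65536 16 False
```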
> Do I have to say it? If this kind of thing interests you, consider a
> career at StataCorp.
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
--
*Henrik Støvring, PhD*
Associate professor
[email protected]
Phone +45 8716 7991
Fax +45 8716 7305
Web: au.dk/en/stovring@biostat <http://au.dk/en/stovring@biostat>
Department of Public Health
Biostatistics
University of Aarhus
Bartholins Allé 2, Bldg 1261, 217
DK-8000 Aarhus C
Denmark
Department of Public Health, Aarhus University