Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Allan Reese (Cefas)" <allan.reese@cefas.co.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: Random start to random number sequence
Date: Fri, 20 Aug 2010 11:12:39 +0100
A few replies to Bill Gould's comments, to clarify my position.

BG: This next will surprise you until you think about it, but the best way to use a pseudo-random-number generator is to set the seed only once, the day you get it, and to just let it continue on its merry way until you've used it up!

AR: Agree; see below.

BG: Because you set the seed only once, we do not need to discuss randomness. Randomness is a property of sequences of numbers.

AR: Disagree, but this is philosophy, not Stata. Randomness is a property of the generating mechanism, which in practice we never know. A recent letter in Nature ("Random numbers certified by Bell's theorem", Pironio et al., 15 April 2010) was way over my head, but it pointed out the problem of knowing whether "random" numbers were simply being fed to you by an intelligence that was choosing them. Pironio and colleagues used two quantum-entangled atoms separated by approximately one metre.

On a finite computer, a pseudo-RNG generates a sequence of bit patterns, interpreted as binary numbers, such that "short" sequences of the numbers *pass a series of tests of apparent randomness*. Eventually the sequence must revisit an earlier state, after which it cycles. When the cycle length is in the billions, "short" easily covers sequences of a few million numbers. Knuth (The Art of Computer Programming) shows that, in general, if you have a good PRNG then any attempt to make it "more" random will introduce non-randomness. Bill made the same point, and the archived messages that prompted me to write include some that suggest resetting the seed within a simulation loop. NO! It is better to just let the PRNG continue along its cycle.

BG: We will still recommend you set the seed randomly, however, because we will want randomness in numbers generated across researchers. Most pseudo-random-number generator designers would prefer it if you used their generators in this way.
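To make the point about resetting the seed inside a simulation loop concrete, here is a minimal sketch. It is Python, not Stata, and uses Python's built-in random module (a Mersenne Twister, not Stata's generator) purely as a stand-in. The function replicate_means is hypothetical, and reseeding with the same value on every replicate is the crudest form of the pattern being criticised.

```python
import random


def replicate_means(n_reps, n_draws, seed, reseed_each_rep):
    """Mean of n_draws uniform draws for each of n_reps replicates.

    reseed_each_rep=False seeds once and lets the generator run on.
    reseed_each_rep=True resets the seed at the top of every replicate
    (here with the same value each time), the pattern warned against.
    """
    rng = random.Random(seed)  # set the seed once, before the loop
    means = []
    for _ in range(n_reps):
        if reseed_each_rep:
            rng.seed(seed)  # restarts the stream at the same point
        draws = [rng.random() for _ in range(n_draws)]
        means.append(sum(draws) / n_draws)
    return means


once = replicate_means(5, 1000, seed=12345, reseed_each_rep=False)
loop = replicate_means(5, 1000, seed=12345, reseed_each_rep=True)

print(len(set(once)))  # 5: each replicate uses a fresh stretch of the cycle
print(len(set(loop)))  # 1: every "replicate" is the same sample
```

Seeding once, each replicate simply consumes the next stretch of the same cycle. Reseeding with a different value per replicate would avoid identical samples, but it replaces one long, well-tested stream with many short ones started from related seeds, which is the kind of tinkering the Knuth point above warns against.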
AR: Disagree slightly, as the point is just to use a different seed each session *unless you wish to reproduce a particular subset of the PRN cycle*. As [D] generate states, "Without loss of pseudorandomness, the seed may be set to small numbers." So a good solution would be to save the state "code" at the end of each session and use that as the seed for the next session, or to save an incrementing integer, "mysessionnumber", to be the seed at the start of the next session. I like the way Stata generates PRNs, but the warning that the seed is reset to 123456789 at the start of every Stata session is easily overlooked. If you are running simulations to generate Monte Carlo results, this probably does not matter. On the day in question, I wanted to generate a random sequence for randomizing the allocation of treatments, and here it might cause a problem. I might, for example, always end up allocating the control treatment, or the extreme treatment, to the same physical location (think of numbered plots in a field).

BG: So what was wrong with Allan's original suggestion? Allen based the seed on the time of day. Let's say Allan gets to the office around the same time every day.

AR: BG has not experienced the traffic improvements taking place in Weymouth in preparation for the Olympic sailing in 2012! Travel time is not predictable. Nor am I.

BG: Let's assume Allan runs simulations around the same time on days he runs them. Perhaps he starts them right after lunch, or just before going home. Alan is now drawing seeds in close proximity to each other.

AR: But as [D] generate notes, n and n+1 as seeds give very different starting points in the PRNG cycle.

BG: He is trusting H() to jumble that for him.

AR: Wrong, as I set the seed only once in a session.

BG: Moreover, he is drawing from such a reduced set that over a period of time, Allan is likely to choose the same seed!

AR: I wrote, "... you can use the system clock which changes every second.
This will not make the subsequent sequence any more (or less) random, but will make each session unique." I almost wrote "almost certainly unique" but thought that was pedantry. Let's assume I run Stata most days at work and often exit and restart: say, 500 sessions a year, and I've been using Stata for 25 years. Let's also assume there are times of day when I'm unlikely to be running Stata. That suggests about 12 x 60 x 60 (= 43,200) possible clock times and 500 x 25 (= 12,500) occasions on which I might have set the seed. As with the birthday problem, you may be surprised at the number of repeats, but they don't matter: in practice, I'm doing simulation or generating random-number tables in only a small proportion of sessions. Using the date plus time, and dropping the trailing 000 (which arises because the millisecond system clock is rounded to the whole second), is, however, clearer, because the combined value does not recur from one day to the next.

Yours, using a non-random selection from Bill's spellings,
Allan
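The session-counting argument above is easy to check numerically. A minimal Python sketch (not Stata code) follows, assuming, as the text does, that each session's seed is equally likely to be any of the 43,200 clock times. expected_distinct and expected_colliding_pairs are the standard birthday-problem formulas; seconds_since_1960 is a hypothetical helper that mimics the date-plus-time seed, relying on Stata's convention that clock values count milliseconds from 01jan1960 00:00:00. The 2**31 - 1 ceiling checked at the end is an assumption about the largest seed accepted, not something stated in this thread.

```python
import math
from datetime import datetime, timedelta


def expected_distinct(n_draws, n_values):
    """Expected number of distinct values seen in n_draws uniform draws,
    with replacement, from n_values equally likely values."""
    return n_values * (1.0 - (1.0 - 1.0 / n_values) ** n_draws)


def expected_colliding_pairs(n_draws, n_values):
    """Expected number of pairs of draws that share a value
    (the usual birthday-problem count)."""
    return math.comb(n_draws, 2) / n_values


def seconds_since_1960(ts):
    """Whole seconds between 01jan1960 00:00:00 and the naive datetime ts.

    Hypothetical helper: Stata clock values count milliseconds from
    01jan1960, so this is that count with the trailing 000 dropped.
    """
    return (ts - datetime(1960, 1, 1)) // timedelta(seconds=1)


n_values = 12 * 60 * 60  # clock times in a 12-hour working day: 43200
n_draws = 500 * 25       # 500 sessions a year for 25 years: 12500

distinct = expected_distinct(n_draws, n_values)
print(round(distinct))            # expected distinct seeds, roughly 10,850
print(round(n_draws - distinct))  # sessions reusing a seed, roughly 1,650
print(round(expected_colliding_pairs(n_draws, n_values)))  # 1808 pairs

# Date plus time does not recur; check it fits the assumed 2**31 - 1 limit.
seed = seconds_since_1960(datetime.now())
print(seed, seed <= 2**31 - 1)
```

So over 25 years of time-of-day seeds, something like one session in eight would repeat an earlier seed, which is the "surprising" number of repeats; seeding from the full date and time removes them.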