Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Pseudorandom Number from Supplied Table of Probabilities (Equivalent of SAS Rand("Table") Function)
From: Phil Clayton <[email protected]>
To: [email protected]
Date: Mon, 6 Aug 2012 23:09:21 +1000
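Maybe something like this will help. It draws one uniform random number per observation and cuts it at the cumulative probabilities 0, 0.2, 0.7 and 1. This gives a pre-defined set of 3 categories with probabilities of 20%, 50% and 30%, respectively. The technique extends directly to more categories: add one cut point per extra category.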
You can follow this with a -recode- or -merge- that converts the category variable into the corresponding number from your pre-defined set.
Phil
. clear
. set obs 10000
obs was 0, now 10000
. set seed 1
. gen random=runiform()
. egen category=cut(random), at(0 0.2 0.7 1) icodes
. tab category
   category |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,986       19.86       19.86
          1 |      5,001       50.01       69.87
          2 |      3,013       30.13      100.00
------------+-----------------------------------
      Total |     10,000      100.00
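The same cut-point idea extends to any number of categories without egen cut(). Below is a minimal sketch, assuming a hypothetical set {10, 20, 30, 40} with probabilities .1, .2, .3 and .4 (placeholders, not from the thread). It compares each observation's uniform draw against the running cumulative probability and writes the chosen value directly, so no separate -recode- step is needed. The final line of the loop logic guards against the cumulative sum falling just short of 1 through rounding.

```stata
* Sketch: cut-point technique for any number of categories.
* The values and probabilities below are hypothetical placeholders.
clear
set obs 10000
set seed 1
local values 10 20 30 40
local probs  .1 .2 .3 .4
local k : word count `probs'
generate double u = runiform()
generate double draw = .
local cum = 0
forvalues i = 1/`k' {
    local p : word `i' of `probs'
    local v : word `i' of `values'
    local cum = `cum' + `p'
    * unassigned draws below the running cumulative probability get value i
    replace draw = `v' if missing(draw) & u < `cum'
}
* guard: if rounding leaves the final cumulative sum just below 1,
* assign any leftover draws to the last value
local vlast : word `k' of `values'
replace draw = `vlast' if missing(draw)
tabulate draw
```

If you keep the egen cut() version instead, the -recode- step could look like `recode category (0 = 10) (1 = 20) (2 = 30), generate(value)`, again with hypothetical target values.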
On 06/08/2012, at 10:51 PM, Mustafa Hirji wrote:
> Greetings,
>
> I would like to select one number from a pre-defined set, where the probability of selecting each number is known. If my set had only two numbers, this would be a Bernoulli event, equivalent to using
>
> rbinomial(1, p)
>
> where p is the probability of one of the events (and, obviously, (1-p) being the probability of the other).
>
> I am writing code where my set will be larger than just two items, however.
>
> SAS has a function
>
> rand("table", p1, p2, …, pn)
>
> that does this as is explained on this page (click "Tabled Distribution" from the contents list):
>
> http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001466748.htm
>
> 1. Is there anything similar in Stata? I cannot find anything.
>
> 2. If not, how do you recommend I implement this? I have two thoughts:
>
> (a) manufacture a dataset with 100,000 observations (or any large number), populate a single variable from my set in proportions corresponding to the probabilities, shuffle it, and then randomly sample one observation. There's a slight loss of precision here, and it's inelegant in that I'd need to save my current dataset to create this one, and I have a few hundred cases where I need to run this.
>
> (b) I could loop through each element in my set and run the Bernoulli trial
>
> rbinomial(1, p)
>
> for each element until I get a nonzero result. p would be the probability of that element divided by the total probability of all remaining elements, including that element, so that the ordering does not bias the result. If no trial returned a nonzero result, I'd select the final element of my set. I'm not entirely sure that this approach is statistically correct, however.
>
> Any advice you can provide would be much appreciated!
>
> Mustafa Hirji
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
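On approach (b): it does give the intended probabilities. Write S_i = p_i + p_(i+1) + ... + p_n for the mass of the elements not yet ruled out, so S_1 = 1. Trial i fails with probability 1 - p_i/S_i = S_(i+1)/S_i, so element k is reached with probability (S_2/S_1)(S_3/S_2)...(S_k/S_(k-1)) = S_k, and it is then chosen with probability p_k/S_k, giving p_k overall (the last element is reached with probability S_n = p_n). A minimal sketch of (b) as a single draw follows. The program name tabledraw and its options are hypothetical, not an official Stata command. Each trial uses runiform() < p, which is a Bernoulli(p) draw equivalent to rbinomial(1, p) but avoids any concern about rbinomial()'s domain limits when p is exactly 0 or 1. Unlike approach (a), it leaves the data in memory untouched.

```stata
* Hypothetical helper, not an official Stata command: one draw from a
* tabled distribution using approach (b), sequential conditional trials.
capture program drop tabledraw
program define tabledraw, rclass
    syntax , Values(numlist) Probs(numlist >=0 <=1)
    local k : word count `values'
    if `k' != `: word count `probs'' {
        display as error "values() and probs() must have the same length"
        exit 198
    }
    * total probability mass; should be 1 up to rounding
    local total = 0
    foreach p of local probs {
        local total = `total' + `p'
    }
    if abs(`total' - 1) > 1e-6 {
        display as error "probs() must sum to 1"
        exit 198
    }
    * remaining = mass of the elements not yet ruled out (S_i)
    local remaining = `total'
    forvalues i = 1/`=`k' - 1' {
        local p : word `i' of `probs'
        * Bernoulli trial with success probability p_i / S_i
        if runiform() < `p' / `remaining' {
            return scalar draw = `: word `i' of `values''
            exit
        }
        local remaining = `remaining' - `p'
    }
    * every earlier trial failed: select the final element
    return scalar draw = `: word `k' of `values''
end

* usage: one draw with probabilities .2, .5, .3 (hypothetical values)
set seed 2012
tabledraw, values(10 20 30) probs(.2 .5 .3)
display r(draw)
```

For a few hundred cases, the program can be called inside a loop and r(draw) stored after each call.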