Jann Ben <[email protected]> asks about generating random variables
from a given distribution:
> I would like to generate random variables with the density
>
> f(x) = 1/2 * (2x^2 + 4x + 1) * exp(-2x), x>=0
>
> Can anyone help me with this?
Fair warning -- the following reply is rather long, consisting of Mata code, a
couple of do-files and an ado-file. I couldn't help myself.
I used Stata to draw the function, just to get a feel for its shape; then I
integrated f(z) from 0 to x to get F(x), the CDF of f(x):
F(x) = 1 - 0.5 * (x + 1) * (x + 2) * exp(-2*x)
Now all I need to do is invert F(x) to get the quantile function of f(x); once
I have the quantile function of F(x)--invF(x) say--then I can feed it random
uniforms to get randomly generated quantiles from the "f(x)" distribution.
Unfortunately, if there is a closed form for the inverse function of F(x), I'd
be hard pressed to produce it. Looks like we'll have to solve for invF(x)
numerically. This is a great problem for Mata to solve.
So I decided to write my own Newton-Raphson root finder (note that I could
have used bisection or something else, but I have a particular fondness for
this algorithm).
A quick google search will uncover several sites that explain the basic
Newton-Raphson algorithm (for those who don't have a Numerical Analysis book
handy). The basic idea is to use a first order Taylor expansion of F(x) (the
function we want to invert) to develop an update equation which we iterate
until convergence (or we get tired of waiting). Start with an initial guess
x0, the new guess x1 is then computed using
x1 = x0 - F(x0)/F'(x0) = x0 - F(x0)/f(x0)
In our case the root we really want to find is for G(x) = F(x) - y; where y is
a (randomly generated) value in the range of F(x). So our update equation is
x1 = x0 - ( F(x0) - y )/f(x0)
Now that we have all the mathematical statistics out of the way, let's write
some code.
Here is the do-file for my Mata function -mynrinverter()-. The comments
document what I did here. Notice that two of the arguments to my Mata
function are pointers to the two functions F() and f(); I could have hard
coded F() and f(), but decided to go for generality--it really didn't cost me
much, but will pay dividends when I want to use -mynrinverter()- on a similar
problem at a later date.
***** BEGIN: mynrinverter.do
* mynrinverter.do
mata :
mata clear
mata set matastrict on
real scalar mynrinverter(
real scalar y, // y = f(x)
pointer(real scalar function) f, // f() -- the function
pointer(real scalar function) df, // f'() -- f()'s derivative
real scalar maxiter, // maximum iterations allowed
real scalar epsilon // convergence criterion
)
{
real scalar x0 // the current guess
real scalar x1 // the updated guess
real scalar delta // = [ f(x0) - y ]/f'(x0)
real scalar iter // iteration counter
// input missing => output missing
if (missing(y)) return(.)
// set the default value for the maximum iterations
if (missing(maxiter) | maxiter < 1) {
maxiter = 100
}
// set the default value for the convergence criterion
if (missing(epsilon) | epsilon <= 0) {
epsilon = 1e-9
}
// we will use the input value as our initial guess; this is probably
// not a good initial guess for all problems, but it will be find for
// the problem at hand
x0 = y
x1 = y
// set the step value to the initial guess to make sure we get inside
// the loop
delta = x0
iter = 0
while (abs(delta/x0) > epsilon) {
// update the current guess
x0 = x1
// get the new step value
delta = ((*f)(x0) - y)/(*df)(x0)
if (missing(delta)) return(.)
// get the new guess
x1 = x0 - delta
// step the iteration counter and display some information
// about our progress
++iter
printf("iter = %4.0f x1 = %9.6f f(x1) = %9.6f delta = %9.6f\n",
iter, x1, (*f)(x1), delta)
if (iter >= maxiter) return(.)
}
// return the value we converged to
return(x1)
}
mata mosave mynrinverter, replace
end
***** END: mynrinverter.do
I can now run the above do-file to create an mo-file. If I type,
. do mynrinverter
Stata will create mynrinvert.mo, then I can use -myrninverter()- in any of my
Mata sessions so long as mynrinvert.mo is somewhere in my adopath--just like
ado-files.
I then wrote the following do-file to test things out:
***** BEGIN: test.do
clear
// build mynrinverter.mo
run mynrinverter
ls *.mo
mata:
// F'(x) -- the density function Ben originally asked about
real scalar dfx(real scalar x)
{
if (x < 0) return(.)
return( ( (x + 1)^2 - 0.5 ) * exp(-2*x) )
}
mata mosave dfx, replace
// F(x) -- the CDF of Ben's density function
real scalar fx(real scalar x)
{
if (x < 0) return(.)
return( 1 - 0.5 * (x + 1) * (x + 2) * exp(-2*x) )
}
mata mosave fx, replace
// Problem: find x, such that Pr(X < x) = 0.3
x = mynrinverter(.3, &fx(), &dfx(), ., .)
// make sure I get y = 0.3 back from the 'x' -mynrinverter()- found
y = fx(x)
x, y
// save the answer so I can graph it later
st_numscalar("answer", x)
end // mata:
// graph f(), F(), and visually confirm that I got a reasonable answer from
// -mynrinverter()-
tw function dfx = ( (x + 1)^2 - 0.5 ) * exp(-2*x), ///
range(0 5) ///
|| function fx = 1 - 0.5 * (x + 1) * (x + 2) * exp(-2*x), ///
range(0 5) ///
yline(.3) xline(`=scalar(answer)')
graph export test.eps, replace
exit
***** END: test.do
Now that I have a way to get the value of invF(u) for some random uniform
value, all I need is a way to generate a Stata dataset. Now that
-mynrinverter()- is available in the form of an mo-file, we can use the Mata
function it contains in an ado-file. Here is an example:
***** BEGIN: mynrquantiles.ado
program mynrquantiles
version 9.1
syntax varname(numeric) [if] [in], ///
gen(name) ///
f(name) /// Mata function name for f()
df(name) /// Mata function name for f'()
[ ///
maxiter(real -1) ///
epsilon(real -1) ///
]
confirm new var `gen', exact
marksample touse
quietly gen `gen' = .
quietly mata : mynrquantiles(&`f'(), &`df'(),`maxiter',`epsilon')
end
mata :
mata set matastrict on
// NOTE: We could even put the code for -mynrinverter()- here, but that would
// be a shame when we could put it in a library so that other's can use it
// too. The following Mata function is intimately related to this ado-file so
// it really does belong here.
void mynrquantiles(
pointer(real scalar function) f, // f() -- the function
pointer(real scalar function) df, // f'() -- f()'s derivative
real scalar maxiter, // maximum iterations allowed
real scalar epsilon // convergence criterion
)
{
real matrix data, nobs, i
string scalar x, y, touse
// x is the output variable
x = st_local("gen")
// y is the input variable
y = st_local("varlist")
// identify relevant obs.
touse = st_local("touse")
// use a view so we can update the Stata dataset with the answers
st_view(data, ., (x, y), touse)
// loop through the data to get the quantiles
nobs = rows(data)
for(i=1; i<=nobs; i++) {
data[i,1] = mynrinverter(data[i,2],f,df,maxiter,epsilon)
}
}
end
exit
***** END: mynrquantiles.ado
Here is small do-file to test that -mynrquantiles- works properly. It
basically generates some data using -mynrquantiles- and graphs it to visually
checks that the resulting quantiles are valid.
***** BEGIN: test2.do
clear
// generate a grid of 30 evenly spaced probabilities from 0 to 1; stop short
// of 1, since the anser is x = infinity
set obs 30
gen y = (_n-1)/_N
// use -mynrquantiles- to generate to x values associated with the above y
// values
mynrquantiles y, gen(x) f(fx) df(dfx)
// graph the results, the scatter plot should fall on the plot of F(x)
tw function fx = 1 - 0.5 * (x + 1) * (x + 2) * exp(-2*x), ///
range(0 5) ///
|| scatter y x
graph export test2.eps, replace
exit
***** END: test2.do
Cheers,
--Jeff Pitblado
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/