Benoit Dulong
> The following simulation program will be the first Stata program
> presented to my director at University of Montreal.
> It works, but it is slow, and it feels bizarre...
> (I have to drop _all two times)
>
> Question-1
> In general, is that how a simulation should be done with Stata
> (postfile, post, postclose) ?
There are lots of ways you could simulate. You need not
use -postfile-. Personally, I always try to do stuff in place
whenever possible without masses of file I/0.
> Question-2
> To generate a random number n1, I use two syntax lines:
> --------------------
> rndpoi 1 `lambda1'
> local n1 = xp
> --------------------
> Is it possible to do this in only one syntax line ?
> (pass directly the random number to local n1)
Not with -rndpoi-. But then -rndpoi- enforces its way
of seeing things upon you. It does not offer the flexibility
you need here. Avoid it and use -genpoisson-.
> Question-3
> It it possible not to drop _all two times ?
> (use subroutine ?)
Yes, you can avoid -drop-.
> ------------------------------------------------------------
> ----------
> MY ALGORITHM
> ------------------------------------------------------------
> ----------
>
> [1]
> generate random number n1
> from a Poisson distribution with parameter lambda1
>
> generate random number n2
> from a Poisson distribution with parameter lambda2
>
> n = n1+n1
>
> [2]
> generate random vector x (n component)
> from uniform distribution (generate x=uniform())
>
> generate random vector y (n component)
> from uniform distribution (generate y=uniform())
>
> [3]
> for each (x[i],y[i]), i=1...n,
> calculate the distance to the nearest neighbor
> using -- nearest -- from N.J COX (NJC 1.1.0 10 January 2003)
> the syntax
> nearest x y, id(id) dist(h)
> gives a file with n rows and 4 columns (x y id h)
>
> [4]
> from the file obtained in [3]
> calculate the mean (mh) and var (vh) of the n values of h
>
> [5]
> repeat [1] to [4] 1000 times
>
> ------------------------------------------------------------
> ----------
> MY PROGRAM
> ------------------------------------------------------------
> ----------
>
> program define myprog1
> version 7.0
> args lambda1 lambda2
>
> postfile mysim1 n1 n2 n mh vh using "myresults1", replace
>
> forvalues i=1(1)1000{
>
> drop _all
>
> display "i= " `i'
>
> rndpoi 1 `lambda1'
> local n1 = xp
>
> rndpoi 1 `lambda2'
> local n2 = xp
>
> local n = `n1' + `n2'
>
> drop _all
> set obs `n'
>
> gen x = uniform()
> gen y = uniform()
> nearest x y, id(id) dist(h)
>
> summarize h
> local mh = r(mean)
> local vh = r(Var)
>
> post mysim1 (`n1') (`n2') (`n') (`mh') (`vh')
>
> }
>
> postclose mysim1
>
> end
>
Various thoughts:
1. You are using -post-. Not essential.
2. You are invoking a program -rndpoi- to
generate 2000 randoms from a Poisson 2000
times. You can call -genpoisson- twice.
3. 2000 -drop _all- can be avoided.
4. -display- serves no purpose beyond showing
progress of program. Once it's working, you
can cut that.
5. Statements like
local n1 = xp
hinge on the accident that you have just
one observation in memory. Dangerous style.
6. Invoking -nearest- 1000 times imposes
an overhead of interpretation. Also you
are forcing it to generate identifiers
which you never use. Read the code into
your program and cut all surplus.
Here the result of a bit of hacking around.
program define myprog2
version 7.0
args lambda1 lambda2
clear
set obs 1000
genpoisson n1, mu(`lambda1')
genpoisson n2, mu(`lambda2')
gen n = n1 + n2
qui {
gen which = _n
expand n
gen x = uniform()
gen y = uniform()
gen dist = .
gen mh = .
gen vh = .
sort which
gen long obs = _n
forval i = 1/1000 {
noi su obs if which == `i', meanonly
local imin = r(min)
local imax = r(max)
forval j = `imin'/`imax' {
scalar mind = .
forval k = `imin'/`imax' {
if (`j' != `k') {
scalar d = /*
*/ (x[`j'] - x[`k'])^2 + (y[`j'] - y[`k'])^2
scalar mind = /*
*/ min(scalar(d), mind)
}
}
replace dist = sqrt(mind) in `j'
}
summarize dist in `imin'/`imax', detail
replace mh = r(mean) in `imin'
replace vh = r(Var) in `imin'
}
bysort which (mh) : keep if _n == 1
drop obs x y dist
}
end
Comments on that:
get all the Poissons at once; less fishing time
what if n is 0? I think it goes OK
get your randoms on unit square at once
do things -in- rather than -if-
wherever possible: by Blasnik's First Law, that's
a lot faster
as Michael points out in his email which
I've just seen, the -nearest- algorithm
is very crude.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/