The Stata listserver

RE: st: Monte-Carlo simulation of regression models


From   "FEIVESON, ALAN H. (AL) (JSC-SK) (NASA)" <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: Monte-Carlo simulation of regression models
Date   Tue, 20 Jan 2004 13:52:20 -0600

Just a reminder - if you generate a fresh "batch" of predictors for each run
of the simulation, you will end up with estimates of unconditional variances
- not the conditional ones that are given by standard formulas in ordinary
linear regression, for example.

If you want conditional variances you have to stick to one realization of
the predictors for all the Monte Carlo runs.

A major problem with simulation using the unconditional approach is how to
model the joint distribution of the predictors. For the conditional approach,
the problem is how to decide what values of the predictors to use.
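
The distinction can be sketched with a toy linear model. This is only an
illustration, not part of the original exchange: the model y = 1 + 2*x + e,
the sample size of 50, the seed, and the program names simcond and simuncond
are all assumptions, and the syntax is that of more recent Stata releases
(-rnormal()- and the colon form of -simulate-), not the Stata 8 syntax quoted
below.

```stata
clear all
set seed 12345

* Conditional approach: x is drawn once (below) and reused in every replication
program define simcond
    capture drop y
    generate y = 1 + 2*x + rnormal()
    regress y x
end

* Unconditional approach: a fresh batch of x is drawn in every replication
program define simuncond
    drop _all
    set obs 50
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()
    regress y x
end

* One fixed realization of the predictor for the conditional runs
set obs 50
generate x = rnormal()
simulate _b _se, reps(1000) nodots: simcond
summarize _b_x _se_x

* simuncond rebuilds the data itself, so nothing needs to be reloaded
simulate _b _se, reps(1000) nodots: simuncond
summarize _b_x _se_x
```

In the first run the spread of _b_x across replications estimates the variance
conditional on the one realized x. In the second it estimates the variance
averaged over the distribution assumed for x, and _se_x should vary more from
replication to replication because the spread of x changes each time.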

Al Feiveson



-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of
[email protected]
Sent: Tuesday, January 20, 2004 12:23 PM
To: [email protected]
Subject: Re: st: Monte-Carlo simulation of regression models


Jenkins S P <[email protected]> asks about programming with -simulate-:

> I wish to run some Monte-Carlo simulations of several discrete time
> competing risk survival analysis regression models, and have some
> questions about how best to handle this using -simulate-.  My intended
> programs look something like the following in outline:
> 
> ~~~~~~~~~~~
> create data set containing 'true' explanatory variables (X1-Xk).
> 
> -use- this data set
> 
> program define mysim
>    ... generate depvars YA, YB, and additional 'duration' vbles (D1-Dt),
>        from the X1-Xk and uniform()
> 
>    ... -ml-based regression #1 of YA, YB, on X1-Xk and D1-Dt (2-eqn model)
> 
>    ... -ml-based regression #2
> 
>    ... -ml-based regression #3
> 
>        [for each model, the main results of interest
>        are, for each of the 2 equations, the estimated
>        coefficients on the X1-Xk, and their associated standard errors]
> 
>    ... return scalar ???
>    ... return matrix ???
> 
> end
> 
> simulate mysim ....
> ~~~~~~~~~
> 
> Questions:
> 
> 1. What is the most efficient way of returning as saved results all
> the estimated coefficients and standard errors? If I define program
> mysim as rclass, does this mean that I have to have a -return scalar
> <result>- line for each and every one of the coeffs and SEs, or is there a
> better way?  [For example, I see that there is a -return matrix- command,
> but I am not sure if it can be used in this context, as the matrices
> presumably wouldn't accumulate in the simulation data set.]
> 
> 2. Complication: sometimes the -ml-based regressions may fall over because
> of collinearities among the D1-Dt variables. Does one simply trap these
> errors in a standard way?  [I think there has been a Statalist posting on
> this but I couldn't find it in the archive.] And should one change
> the number of reps if this happens?
> 
> 3. What is the -nowarn- option in the example in [R] simulate, p.73?
> It doesn't appear to be in the -simulate- syntax diagram on p.69
> (though there is a -nocheck- option cited there)

I would write a separate -mysim- for each regression.  I would develop
-mysim1-, say, for the first regression, and once I was happy with it I would
then go back and modify it to create -mysim2- and -mysim3- (changing only the
lines for the regression).  Based on the above description, it seems that
-mysim1- (and its siblings) would look something like the following:

	program mysim1
		// generate depvars YA, YB, and additional 'duration' vbles
		// (D1-Dt), from the X1-Xk and uniform()

		// -ml-based regression #1 of YA, YB, on X1-Xk and D1-Dt
		// (2-eqn model)
	end

We could then use the estimation results from the regression in -mysim1-
directly in the call to -simulate-:

	. // gen original explanatory variables
	. save xvars

	. use xvars
	. set seed 1234
	. simulate "mysim1" _b _se, reps(???)
	. gen iter = _n
	. save mysim1

	. use xvars
	. set seed 1234
	. simulate "mysim2" _b _se, reps(???)
	. gen iter = _n
	. save mysim2

	. use xvars
	. set seed 1234
	. simulate "mysim3" _b _se, reps(???)
	. gen iter = _n
	. save mysim3

We could later merge mysim1.dta, mysim2.dta, and mysim3.dta on the -iter-
variable to basically get the same dataset as indicated in the above problem
discussion.
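
A minimal sketch of that merge follows. Because each file would contain
variables with the same names, the sketch first prefixes them with m1, m2, or
m3. The toy files demo1.dta to demo3.dta and the variables _b_x1 and _se_x1 are
hypothetical stand-ins for the real results, and -merge 1:1- assumes a more
recent Stata than the one used in this thread.

```stata
* Hypothetical stand-ins for the three saved result files; in practice use
* the mysim1.dta, mysim2.dta, and mysim3.dta files created above instead
set seed 1234
forvalues m = 1/3 {
    clear
    set obs 5
    generate _b_x1 = rnormal()
    generate _se_x1 = runiform()
    generate iter = _n
    save demo`m', replace
}

* Prefix every variable except -iter- so names stay unique after merging
forvalues m = 1/3 {
    use demo`m', clear
    foreach v of varlist _all {
        if "`v'" != "iter" {
            rename `v' m`m'`v'
        }
    }
    save demo`m', replace
}

* Combine the three sets of results, one row per iteration
use demo1, clear
merge 1:1 iter using demo2, nogenerate
merge 1:1 iter using demo3, nogenerate
describe
```

The merged dataset then holds variables such as m1_b_x1, m2_b_x1, and m3_b_x1
side by side for each value of -iter-.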

1.  The most efficient way is to have your simulation program end with an
estimation command, which posts the coefficients to -e(b)- and their
variance-covariance matrix to -e(V)-, and then supply the _b and _se extended
expressions to -simulate-.

2.  -simulate- will automatically handle problems with dropped covariates or
failed regressions by posting missing values.  If this problem is occurring,
you will have to increase the number of replications to ensure you get the
sample size you want.

3.  Mentioning -nowarn- in [R] simulate was a typo; it is not really an option
of -simulate-, but part of the parse engine that -simulate- uses.  Thank you
for pointing this out; I will put in a note to have it removed in the future.
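
If you would rather trap failures explicitly, so that you can count them and
decide how many replications to add, something along these lines may work.
It is a sketch under assumptions, not the approach described in points 1 and
2: the program name mysim_safe, the small logit model, and the returned
scalars b, se, and failed are all hypothetical, and the syntax is that of more
recent Stata releases.

```stata
capture program drop mysim_safe
program define mysim_safe, rclass
    drop _all
    set obs 20
    generate x = rnormal()
    generate y = runiform() < invlogit(2 + x)

    * -capture- suppresses the error so the replication can still report back
    capture logit y x
    if _rc == 0 {
        return scalar b = _b[x]
        return scalar se = _se[x]
        return scalar failed = 0
    }
    else {
        return scalar b = .
        return scalar se = .
        return scalar failed = 1
    }
end

set seed 1234
simulate b=r(b) se=r(se) failed=r(failed), reps(500) nodots: mysim_safe

* Number of replications in which the regression could not be fit
count if failed
```

The count of failed replications indicates how far -reps()- would need to be
raised to end up with the intended number of usable results.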

--Jeff
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/