st: Improving code speed

From	Luis <[email protected]>
To	[email protected]
Subject	st: Improving code speed
Date	Wed, 22 May 2013 19:25:02 +0200

Dear Statalist users,

I am running into a "loop efficiency problem" in that I have to
construct a variable using many iterations and I am not sure whether I
am being as efficient as possible. Given the number of observations
that I have and with my current code, I have to wait days for my code
to finish running! Here's my problem:

I have a total of 50000 observations and need to construct a variable
Y that will be computed using different subsamples of these
observations. In particular,
Y=Y1 when the subsample contains only the first observation,
Y=Y2 when the subsample contains observations 1 and 2,
Y=Y3 when the subsample contains observations 1, 2 and 3,
and so on up to Y=Y50000.

The idea is therefore to loop over the sample, define the subsample
that contains observations 1 through k, and construct the variable
Y`k'=Yk if id==k and Y`k'=0 if id!=k. Summing the variables Y`k'
across iterations then gives the final variable Y.

To further complicate things, the variable Y needs to be the average
of 100 simulations that depend on draws taken from a normal
distribution. Hence I need to do a loop within the initial loop in
order to do the 100 simulations.

My code therefore looks like this:

_____________________________________________________________________________________

gen Y=0

local reps=100 // define the number of simulations

gen epsilon=rnormal() // generate the random variable for the simulations

forvalues k=1(1)50000{

	gen subs=(id<=`k')   // define the subsample to be used
	gen Y`k'=0           // generate the intermediate Y`k'

	forvalues i=1(1)`reps'{

		gen x`i'=z // generate simulated variable
		replace x`i'=z + epsilon[`i'] if id==`k' // add the random part (draw i)

		gen t=(x`i')^2
		bysort subs: egen tsum=total(x`i')

		gen Y_`i'=t/tsum if id==`k' // construct Y for simulation i
		replace Y_`i'=0 if id!=`k'

		replace Y`k'=Y`k' + Y_`i'
		replace Y`k'=0 if id!=`k'

		drop Y_`i' t tsum x`i'
	}

	replace Y`k'=Y`k'/`reps'      // average Y over the 100 simulations
	replace Y= Y + Y`k'
	drop Y`k' subs
}

____________________________________________________________________________________
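
For reference, read literally, the loop above gives observation k the value
Y_k = (1/100) * sum over i of (z_k + e_i)^2 / (z_1 + ... + z_k + e_i),
where e_i is the i-th draw. Under that reading the outer loop is not needed,
because the subsample total is just a running sum of z. The following is a
minimal sketch of that idea, not a tested replacement: it assumes id runs
1..N with no gaps or duplicates, that the data are sorted by id, and that
draw i is read from observation i of epsilon. The toy data at the top
(1000 observations, z = 1 + runiform()) are made up purely so that the
block runs on its own.

```stata
* Hypothetical toy data so the sketch is self-contained (not the real data)
clear
set obs 1000
set seed 12345
gen id = _n
gen z = 1 + runiform()
gen epsilon = rnormal()

sort id
local reps = 100

* Running total of z over observations 1..k (sum() in generate is cumulative)
gen double zcum = sum(z)

gen double Y = 0
forvalues i = 1/`reps' {
    * For observation k: x = z_k + e_i, t = x^2, tsum = (z_1+...+z_k) + e_i
    quietly replace Y = Y + (z + epsilon[`i'])^2 / (zcum + epsilon[`i'])
}
* Average over the simulations
quietly replace Y = Y / `reps'
drop zcum
```

This makes 100 passes over the data instead of 50000 x 100, and creates no
temporary variables inside the loop. Whether it matches the intended Y
depends on the assumptions stated above.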


The code runs fine, but it takes a lot of time since it has to
construct 100 variables for each of the 50000 iterations. I have tried
many different possibilities and I can't think of another way of
constructing Y.

Any tip or suggestion that would help improve the efficiency of my
code would be greatly appreciated!!!

Many thanks in advance!
Luis

