Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Luis <stataluis@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Improving code speed |
Date | Wed, 22 May 2013 19:25:02 +0200 |
Dear Statalist users,

I am running into a loop efficiency problem: I have to construct a variable using many iterations, and I am not sure my code is as efficient as it could be. With my current code and number of observations, it takes days to finish running!

Here is my problem. I have a total of 50000 observations and need to construct a variable Y that is computed using different subsamples of these observations. In particular, Y=Y1 when the subsample contains only the first observation, Y=Y2 when the subsample contains observations 1 and 2, Y=Y3 when it contains observations 1, 2 and 3, and so on up to Y=Y50000. The idea is therefore to loop over the sample, define the subsample containing observations 1 through k, and construct the variable Y`k'=Yk if id==k and Y`k'=0 if id!=k. I then add Y`k' to Y at the end of each iteration to end up with the final variable Y.

To further complicate things, Y needs to be the average of 100 simulations that depend on draws from a normal distribution. Hence I need a second loop inside the first one to run the 100 simulations.
My code therefore looks like this:
_____________________________________________________________________________________
gen Y=0
local reps=100                          // define the number of simulations
gen epsilon=rnormal()                   // generate the random var for the simulations
forvalues k=1(1)50000{
    gen subs=(id<=`k')                  // Define the subsample to be used
    gen Y`k'=0                          // gen the intermediate Y`k'
    forvalues i=1(1)`reps'{
        gen x`i'=z                      // generate simulated variable
        replace x`i'=z + epsilon[`i'] if id==`k'   // Add the random part
        gen t=(x`i')^2
        bysort subs: egen tsum=sum(x`i')
        gen Y_`i'=t/tsum if id ==`k'    // Construct Y for simulation i
        replace Y_`i'=0 if id!=`k'
        replace Y`k'=Y`k' + Y_`i'
        replace Y`k'=0 if id!=`k'
        drop Y_`i' t tsum x`i'
    }
    replace Y`k'=Y`k'/`reps'            // average Y from the 100 simulations
    replace Y= Y + Y`k'
    drop Y`k' subs
}
____________________________________________________________________________________

The code runs fine, but it takes a lot of time, since it has to construct 100 variables for each of the 50000 iterations. I have tried many different possibilities and I can't think of another way of constructing Y. Any tip or suggestion that would help improve the efficiency of my code would be greatly appreciated!

Many thanks in advance!
Luis

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
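One possible direction, offered as a sketch rather than a tested solution: in the loop above, the only quantities that matter for observation k are its own z and the sum of z over observations 1 through k. For simulation i, the value assigned to observation k therefore appears to reduce to (z_k + e_i)^2 / (S_k + e_i), where S_k is the running sum of z and e_i is the draw for simulation i. The code below assumes that the data are sorted by id with id running 1, 2, ..., N, that z has no missing values, and that the same draw e_i is reused for every k. The seed value and the names S and e are illustrative and do not come from the original post.

```stata
* Sketch only; the assumptions are stated in the paragraph above.
set seed 12345                  // hypothetical seed, for reproducibility
local reps = 100                // number of simulations, as in the post

sort id
gen double S = sum(z)           // running sum of z over observations 1..k
gen double Y = 0

tempname e                      // temporary scalar name, avoids clashing with variables
forvalues i = 1/`reps' {
    scalar `e' = rnormal()      // one normal draw per simulation
    * contribution of simulation i for every observation k at once
    quietly replace Y = Y + (z + `e')^2 / (S + `e')
}
quietly replace Y = Y / `reps'  // average over the simulations
```

If the assumptions hold, this does the work in 100 replace statements over the full dataset instead of creating and dropping variables 50000 x 100 times. It would be worth comparing the result with the original loop on a small subsample (say the first 50 observations, with the same draws) before relying on it.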