Re: st: Improving code speed
From: George Vega Yon <[email protected]>
To: [email protected]
Subject: Re: st: Improving code speed
Date: Wed, 22 May 2013 14:02:15 -0400
Dear Luis,
Without knowing much about what you are trying to calculate, from a
quick look I see two things that could help you:
(1) Try using Mata instead (loops are much faster in Mata than in
Stata). A Mata loop looks like this:
local nreps = 100
mata:
nreps = strtoreal(st_local("nreps")) // copy the Stata local into Mata
for(i=1;i<=nreps;i++) {
... mata code...
}
end
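To make the skeleton above a bit more concrete, here is a minimal sketch of moving data between Stata and Mata around such a loop. The variable names z and zbar and the computation itself (averaging z plus a fresh normal draw over the repetitions) are placeholders chosen for illustration, not Luis's actual calculation.

```stata
clear all
set seed 12345
set obs 1000
gen z = rnormal()
gen double zbar = .          // placeholder output variable (hypothetical name)
local nreps = 100

mata:
nreps = strtoreal(st_local("nreps"))   // copy the Stata local into Mata
z     = st_data(., "z")                // read z as a column vector
acc   = J(rows(z), 1, 0)               // accumulator, one row per observation
for (i = 1; i <= nreps; i++) {
    // each pass works on the whole column at once, so there is no loop over observations
    acc = acc + z + rnormal(rows(z), 1, 0, 1)
}
st_store(., "zbar", acc / nreps)       // write the average back to Stata
end

summarize z zbar
```

The point of the pattern is that the only explicit loop runs over repetitions, and no temporary variables are created and dropped along the way.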
(2) You can try using my module "parallel", which can speed up your code
without much effort. Here is an example of how it works (you'll need
two do-files, one for the initial configuration and another for the
loop itself):
_____________________________________________________________________________________
clear all
vers 11
// Setup
set obs 1000
gen id = _n
gen z = rnormal()
gen Y=0
global reps=10 // define the number of simulations
gen epsilon=rnormal() // generate the random var for the simulations
// Parallel setup (if you have a quad-core computer)
// ssc install parallel, all
parallel setclusters 4
// Serial fashion
preserve
timer on 1
do mydofile
timer off 1
restore
// Parallel fashion
timer on 2
parallel do mydofile.do
timer off 2
// How fast??
timer list
_____________________________________________________________________________________
_________________________mydofile.do___________________________________________________
forvalues k=`=id[1]'(1)`=id[_N]'{
gen subs=(id<=`k') // Define the subsample to be used
gen Y`k'=0 // gen the intermediate Y`k'
forvalues i=1(1)$reps{
gen x`i'=z // generate simulated variable
replace x`i'=z + rnormal() if id==`k' // Add the random part
gen t=(x`i')^2
bysort subs: egen tsum=total(x`i')
gen Y_`i'=t/tsum if id ==`k' // Construct Y for simulation i
replace Y_`i'=0 if id!=`k'
replace Y`k'=Y`k' + Y_`i'
replace Y`k'=0 if id!=`k'
drop Y_`i' t tsum x`i'
}
replace Y`k'=Y`k'/$reps // average Y over the $reps simulations
replace Y= Y + Y`k'
drop Y`k' subs
}
_____________________________________________________________________________________
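A further possibility, sketched under an assumption that should be checked against the intended calculation: reading the loop in mydofile.do, for the observation with id==k the value of tsum is the sum of z over id<=k plus that observation's own draw. If that reading is right, the outer loop over k can be replaced by a running sum computed once, leaving only the loop over simulations. The names S and e below are introduced here for illustration.

```stata
clear all
set seed 12345
set obs 1000
gen id = _n
gen z = rnormal()
local reps = 10

sort id
gen double S = sum(z)        // running sum: S[k] = z[1] + ... + z[k]
gen double Y = 0
forvalues i = 1/`reps' {
    gen double e = rnormal()                 // one draw per observation for simulation i
    replace Y = Y + (z + e)^2 / (S + e)      // contribution of simulation i
    drop e
}
replace Y = Y / `reps'       // average over the simulations
summarize Y
```

This does one pass over the data per simulation instead of one pass per observation per simulation, so the cost grows with the number of simulations rather than with the number of observations times the number of simulations.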
Hope it helps,
Cheers!
George Vega Yon
7 647 2552
http://cl.linkedin.com/in/georgevegayon
2013/5/22 Luis <[email protected]>:
> Dear statalist users,
>
> I am running into a "loop efficiency problem" in that I have to
> construct a variable using many iterations and I am not sure whether I
> am being as efficient as possible. Given the number of observations
> that I have and with my current code, I have to wait days for my code
> to finish running! Here's my problem:
>
> I have a total of 50000 observations and need to construct a variable
> Y that will be computed using different subsamples of these
> observations. In particular,
> Y=Y1 when the subsample contains only the first observation,
> Y=Y2 when the subsample contains observations 1 and 2,
> Y=Y3 when the subsample contains observations 1, 2 and 3 etc until
> Y=Y50000.
>
> The idea is therefore to loop over the sample and define the subsample
> which contains observations 1 until k and construct the variable
> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
> after each loop to end up with the final variable Y.
>
> To further complicate things, the variable Y needs to be the average
> of 100 simulations that depend on draws taken from a normal
> distribution. Hence I need to do a loop within the initial loop in
> order to do the 100 simulations.
>
> My code therefore looks like this:
>
> _____________________________________________________________________________________
>
> gen Y=0
>
> local reps=100 // define the number of simulations
>
> gen epsilon=rnormal() // generate the random var for the simulations
>
> forvalues k=1(1)50000{
>
> gen subs=(id<=`k') // Define the subsample to be used
> gen Y`k'=0 // gen the intermediate Y`k'
>
> forvalues i=1(1)`reps'{
>
> gen x`i'=z // generate simulated variable
> replace x`i'=z + epsilon[`i',1] if id==`k' // Add the random part
>
> gen t=(x`i')^2
> bysort subs: egen tsum=sum(x`i')
>
> gen Y_`i'=t/tsum if id ==`k' // Construct Y for simulation i
> replace Y_`i'=0 if id!=`k'
>
> replace Y`k'=Y`k' + Y_`i'
> replace Y`k'=0 if id!=`k'
>
> drop Y_`i' t tsum x`i'
> }
>
> replace Y`k'=Y`k'/`reps' // average Y from the 100 simulations
> replace Y= Y + Y`k'
> drop Y`k' subs
> }
>
> ____________________________________________________________________________________
>
>
> The code runs fine, but it takes a lot of time since it has to
> construct 100 variables for each of the 50000 iterations. I have tried
> many different possibilities and I can't think of another way of
> constructing Y.
>
> Any tip or suggestion that would help improve the efficiency of my
> code would be greatly appreciated!!!
>
> Many thanks in advance!
> Luis
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/