Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Improving code speed
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Improving code speed
Date
Wed, 22 May 2013 19:13:39 +0100
Please use your full real name. See Statalist FAQ for that request and why.
Some speed-ups are likely to be possible here, but I first I note
several puzzles with this code.
You don't say so, but presumably -id- is _n in a variable.
-z- is unexplained.
More problematically,
(a) You -generate- a variable -epsilon- but you refer to a matrix -epsilon-.
(b) The two lines below won't work as once -x`i'- exists the second
command will fail.
gen x`i'=z \\ generate simulated variable
gen x`i'=z + epsilon[`i',1] if id==`k' \\ Add the random part
Why do you say that it works? Did you copy some buggy version by accident?
Better to post self-contained code that works.
Speed-ups, apart from using Mata. (See George Vega Yon's post.)
1. Too much copying from one variable to another. I could be wrong,
but some variables appear to be mostly zero, and you are just copying
constants. Think in terms of scalars instead.
2. Use -summarize, meanonly- to get sums. -egen- is very slow at this.
3. Use -in 1/`k'- or -in `k'- wherever possible. Whenever there is a
choice between -if- and -in- for the same problem, -in- is faster.
Some example code:
gen x`i'=z
replace x`i'= x`i' + epsilon[`i'] in `k'
su x`i' in 1/`k', meanonly
replace Y`k' = (x`i')^2/r(sum) in `k'
Nick
[email protected]
On 22 May 2013 18:25, Luis <[email protected]> wrote:
> Dear statalist users,
>
> I am running into a "loop efficiency problem" in that I have to
> construct a variable using many iterations and I am not sure whether I
> am being as efficient as possible. Given the number of observations
> that I have and with my current code, I have to wait days for my code
> to finish running! Here's my problem:
>
> I have a total of 50000 observations and need to construct a variable
> Y that will be computed using different subsamples of these
> observations. In particular,
> Y=Y1 when the subsample contains only the first observation,
> Y=Y2 when the subsample contains observations 1 and 2,
> Y=Y3 when the subsample contains observations 1, 2 and 3 etc until
> Y=Y50000.
>
> The idea is therefore to loop over the sample and define the subsample
> which contains observations 1 until k and construct the variable
> Y`k'=Yk if id==k and Y`k'=0 if id!=k. Then sum the variables Y`k'
> after each loop to end up with the final variable Y.
>
> To further complicate things, the variable Y needs to be the average
> of 100 simulations that depend on draws taken from a normal
> distribution. Hence I need to do a loop within the initial loop in
> order to do the 100 simulations.
>
> My code therefore looks like this:
>
> _____________________________________________________________________________________
>
> gen Y=0
>
> local reps=100 \\ define the number of simulations
>
> gen epsilon=rnormal() \\ generate the random var for the simulations
>
> forvalues k=1(1)50000{
>
> gen subs=(id<=`k') \\ Define the subsample to be used
> gen Y`k'=0 \\ gen the intermediate Y`k'
>
> forvalues i=1(1)`reps'{
>
> gen x`i'=z \\ generate simulated variable
> gen x`i'=z + epsilon[`i',1] if id==`k' \\ Add the random part
>
> gen t=(x`i')^2
> bysort subs: egen tsum=sum(x`i')
>
> gen Y_`i'=t/tsum if id ==`k' \\ Construct Y for simulation i
> replace Y_`i'=0 if id!=`k'
>
> replace Y`k'=Y`k' + Y_`i'
> replace Y`k'=0 if id!=`k'
>
> drop Y_`i' t tsum x`i'
> }
>
> replace Y`k'=Y`k'/`reps' // average Y from the 100 simulations
> replace Y= Y + Y`k'
> drop Y`k' subs
> }
>
> ____________________________________________________________________________________
>
>
> The code runs fine, but I takes a lot of time since it has to
> construct 100 variables for each of the 50000 iterations. I have tried
> many different possibilities and I can't think of another way of
> constructing Y.
>
> Any tip or suggestion that would help improve the efficiency of my
> code would be greatly appreciated!!!
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/