Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: performance benchmarking stata (and mata) programs and improving program performance


From   Michael Barker <[email protected]>
To   [email protected]
Subject   Re: st: performance benchmarking stata (and mata) programs and improving program performance
Date   Tue, 23 Jul 2013 09:04:27 -0400

This post demonstrates use of the timer() functions in mata, and
discusses some ways to speed up large vector and matrix operations:

http://www.stata.com/statalist/archive/2006-11/msg00808.html


Also, you may want to pull your data into mata, rather than using
views. From the st_view help page:

help mf_st_view

When not to use views

    Do not use views as a substitute for scalars.  If you are going to
loop through the data an observation at a time, and if
    every usage you will make of X is in scalar calculations, use
_st_data().  There is nothing faster for that problem.


Mike



On Tue, Jul 23, 2013 at 12:00 AM, Sergiy Radyakin
<[email protected]> wrote:
> On Mon, Jul 22, 2013 at 6:40 PM, Brent McSharry (ADHB)
> <[email protected]> wrote:
>> I am trying to write a command to plot a risk adjusted exponentially weighted moving average chart. Calculation of the variance is computationally expensive by the formula I have.
>>
>> When doing run lengths of my testing data set of 6800+ records, Stata slows down. Because I would like to put the command up on SSC, I would like it to be more performant.
>>
>> The formula requires [sigma 1->N] calculations (so for 6800 records ~23 million). I rewrote the code in Mata, as I thought a compiled language must be faster. I also used a number of C 'performance optimisation' tricks (assuming a mata vector represents a memory serialized array). But it seems the new code is slightly slower.
>>
>> Questions are
>> 1. Is there a way to get system time in ticks or milliseconds, so that performance of different routines can be benchmarked (creturn is only in secs)?
>
> there is a parallel discussion of it here:
> http://hsphsun3.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1307/date/article-800.html
>
>> 2. Why is the mata code taking so long?
>> 3. How do I optimize further (eg setting matastrict on or off, handling setbreakintr manually etc, colvector vs rowvector vs vector)?
>
> Perhaps let's discuss the algorithm? What is the array you are
> generating. The size of the array is nobs. There is no point computing
> alpha^n after a reasonably reachable n - it is just going to be zero.
> So perhaps your expensive procedure can be not populating an array of
> 6000, but an array of 600? For Stata anything smaller than 0.6^729 is
> zero anyways. Have a look at your array and check whether the loop
> must be so long.
>
>
>> 4. Is it possible to tell how much memory mata is using? There should not be more than a handful of single dimension arrays with a length of ~6000, but it would be nice to check the problem is not related to memory allocation and having to move RAM into scratch space.
>
> yes: mata mata des   reports a sizes of mata structures in bytes.
>
> Best, Sergiy
>>
>> Thank you very much for your thoughts/wisdom. The relevant code is below and I am NOT a stata programmer, so very appreciative of style tips as well as performance.
>>
>> Brent McSharry
>> Intensivist
>> Starship children's hospital
>> -----------------------------
>> //Original stata
>> count if `touse'
>> local count `r(N)'
>> gsort -`touse' `timevar'
>> noi di as txt "calculating std errors " as res "`=`count'*(`count'+1)/2'" as txt " calculations started at " as res "`c(current_time)'"
>> gen `ewma_o'=`startest' in 1
>> replace `ewma_o'= `smooth'*`observed' + (`oneMinusSm'*`ewma_o'[_n-1]) in 2/`count'
>> gen `ewma_p'=`startest' in 1
>> replace `ewma_p'= `smooth'*`predicted' + (`oneMinusSm'*`ewma_p'[_n-1]) in 2/`count'
>> gen `var_p' = `predicted' *(1-`predicted') in 1/`count'
>> gen `workingsum' = .
>> gen `ewma_se' = .forvalues i = 1(1)`count' {
>>         replace `workingsum' = (`oneMinusSm'^(2*(`i'-_n)) * `var_p') in 1/`i'
>>         sum `workingsum', meanonly
>>         replace `ewma_se' = sqrt(`r(sum)')* `smooth' in `i'
>> }
>> noi di as txt "di calculations finished at " as res "`c(current_time)'"
>> local z=invnormal(1-(`alpha1'/2))
>> gen `cl' = `ewma_se'  * `z'
>> gen `cl1_ub' = `ewma_p' + `cl'
>> gen `cl1_lb' = `ewma_p' - `cl'
>> local z=invnormal(1-(`alpha2'/2))
>> replace `cl' = sqrt(`ewma_se') * `smooth' * `z'
>> gen `cl2_ub' = `ewma_p' + `cl'
>> gen `cl2_lb' = `ewma_p' - `cl'
>> -----------------------------
>> //mata performing same function in loop
>> mata:
>> mata set matastrict on
>> mata set matadebug off
>> void getRaewma(string scalar expectVarname,
>>         string scalar obsVarname,
>>         real scalar obsCount,
>>         real scalar smooth,
>>         real scalar startVal,
>>         string scalar emwaObsName,
>>         string scalar emwaExpectName,
>>         string scalar emwaSeName)
>> {
>>         real colvector ewmaExp, ewmaObs, ewmaSe
>>         real vector expectVec, obsVec, expectVar, smoothPow, colIndices
>>         real scalar oneMinusSm, j, k, len, currExp, currObs, expectJ, cumSeCalc
>>         oneMinusSm = 1 - smooth
>>         st_view(expectVec, ., expectVarname)
>>         st_view(obsVec, ., obsVarname)
>>         smoothPow = getSmoothPow(oneMinusSm, obsCount-1)
>>         len = rows(expectVec)
>>         ewmaExp = J(len,1,.)
>>         ewmaObs = J(len,1,.)
>>         ewmaSe = J(len,1,.)
>>         expectVar = J(obsCount,1,.)
>>         currExp = currObs = startVal
>>         for (j=1;j<=obsCount;j++)
>>         {
>>                 currObs = ewmaObs[j,1] = obsVec[j]*smooth + currObs *oneMinusSm
>>                 expectJ = expectVec[j]
>>                 currExp = ewmaExp[j,1] = expectJ*smooth + currExp *oneMinusSm
>>                 cumSeCalc = expectVar[j] = expectJ*(1-expectJ)
>>                 for (k=1;k<j;k++)
>>                 {
>>                         cumSeCalc = cumSeCalc + smoothPow[j-k] * expectVar[k]
>>                 }
>>                 ewmaSe[j,1] = smooth*sqrt(cumSeCalc)
>>         }
>>         (void) setbreakintr(0)
>>         colIndices = st_addvar("double", (emwaExpectName, emwaObsName, emwaSeName), 1)
>>         st_store(.,colIndices[1],ewmaExp)
>>         st_store(.,colIndices[2],ewmaObs)
>>         st_store(.,colIndices[3],ewmaSe)
>>         (void) setbreakintr(1)
>> }
>> //build lookup for power calculations (which tend to be computationally expensive)
>> real vector function getSmoothPow(real scalar oneMinusSm, real scalar length)
>> {
>>         real scalar i, oneMinusSm2, currPow
>>         real vector returnVar
>>         oneMinusSm2 = oneMinusSm * oneMinusSm
>>         returnVar = J(1,length,.)
>>         currPow = 1
>>         for (i=1;i<=length;i++)
>>         {
>>                 returnVar[i] = currPow = oneMinusSm2 * currPow
>>         }
>>         return(returnVar)
>> }
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index