Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Mata - generalised cross products of the form X'SX
From
Gordon Hughes <[email protected]>
To
[email protected]
Subject
Re: st: Mata - generalised cross products of the form X'SX
Date
Wed, 09 Feb 2011 10:40:10 +0000
To save space I haven't reflected all of Bill Gould's response to my
original posting, but I would like to pick up on the question of
optimisation with respect to different ways of doing the same
thing. I understand that hardware, the nature of the problem, etc
matter, but others may be interested in what I have found. As
background I am running Stata/MP for 2 cores under Windows XP (32
bit) and Macintosh OS-X 10.6.6 (64 bit). The machines are
respectively Intel Quad Core (XP) and Intel 2 Core (Mac) with 4 GB of
RAM in each case. I run into memory limitations under XP and time
limitations on the Mac. My research problem is spatial panel
econometrics involving quite large N and T, but my test runs hit the
constraints at sample sizes much smaller than I need for real
work. For spatial panel data, the calculations are structured as the
sum over T of what are, in effect, GLS estimations over panel units
so nrows = ncols in many cases because of the repeated application of
spatial weights.
I have implemented most of the optimisations by hand including
calculating X'B'BX as T'*T where T=B*X. Trying to go further:
1. I replaced --T'*T -- by --cross(T,T)--. There is no significant
reduction in execution time. For a majority of cases execution takes
longer with cross(), but this is not always the case. My experience
is that the overhead for small problems is greater than the increase
in speed for large problems, so that the gain does not justify the
potential penalty.
2. Going further, suppose S is positive definite. Then, I found
that applying the Cholesky factorisation S=LL' and using the variant
X'SX=T'*T where T=L'*X pays off if you have to do the factorisation
for other purposes (e.g. prior inversion) but is any time gains are
very marginal otherwise.
3. Suppose S is square but not guaranteed to be positive definite
(unfortunately my case), then the LU factorisation S=PLU means that
X'SX=(X'*PL)*(U*X). My experience in this case is that there is a
very definite time penalty from using this formulation rather than
just using X'*S*X.
Unfortunately without the ability to monitor memory use I don't know
whether --cross()-- or some of the other formulations are
significantly more efficient in memory use. For anyone wondering
what this is all about, the largest of my test problems involves up
to 4,500 observations for 225 panel units and 20 time periods. My
real data is up to 500 panel units and 50 or more time periods. The
largest problem takes 4 hours to run on my Mac and the time required
for a run goes up exponentially with the number of time periods -
going from 8 to 20 increases execution time by more 50 times. The
program spends nearly 80% of execution time in one function, so that
there are huge gains from optimising this function, but I suspect
that I have got to the point where I need to go back to using much
larger distributed computing power.
Gordon Hughes
[email protected]
Date: Tue, 08 Feb 2011 09:37:14 -0600
From: [email protected] (William Gould, StataCorp LP)
Subject: Re: st: Mata - generalised cross products of the form X'SX
Gordon Hughes <[email protected]> writes,
> B. Roughly, under what conditions might this sequence of steps
> reduce execution time and/or memory use on the assumption
> that the dimension of S is large relative to cols(X)?
I'm sorry, I don't have an answer, except to say that the answer will
depend on whether you are using Stata/MP because MP is parallelizes
matrix multiplication, and thus performs it faster than does
Stata/SE. In cases were rows>>cols or cols>>rows, multiplication
time approaches theoretical limits, which is to say, execution time
becomes proportional to 1/cores.
> In a related context I have to calculate X'B'BX where B is square
> but not symmetric and found that use of --cross()-- does not save
> much (if any) time, though it is probably more efficient on memory.
Sometimes the Mata optimizer is not as smart as we wish that it were.
To calculate X'B'B'X as quickly as possible, code
T = B'X
R = T'T
That will execute more quickly than R = X'B'B'X. When you code
X'B'B'X, Mata does not notice that X'B'B'x = (B'X)'(B'X). Calculation
of X'B'B'X results in Mata proforming three transposes and four matrix
multiplications.
....
In addition, use parenthesis where necessary to force the order of the
operations. Concerning order of operations, Mata does not know what
you know, namely that rows >> cols. Mata makes all of its
optimization decisions at compile time, and therefore can can choose
inefficient order of operations. In this case, for memory
effinciency reasons, you do not want to calculate B'B to get to
X'B'B'X, you want to calculate X'B and B'X. That right there will
make a differece in execution time, even without exploiting X'B
= (B'X)', which can be used additionally to save an entire multiplication.
- -- Bill
[email protected]
*
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/