st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)

From	László Sándor <[email protected]>
To	[email protected]
Subject	st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date	Mon, 2 Apr 2012 15:07:00 -0400

Hi all,

I recently asked several questions on this list about compiling Mata
code. I still could not generate the compile-time locals with loops,
so I typed them out by hand and compiled. I have now done my test
runs, and the results are surprising. Let me ask you why:

My basic problem was to do a fast "collapse" to make binned scatter
plots. -collapse- was unacceptably slow, probably because of the
necessary preserve-restore cycles, or because -collapse- is coded
for generality rather than speed.

I already had a version that parsed a log of -tabulate, summarize-.
Yes, it is as much of a hack as it sounds. I was not expecting
this to be fast, if only because of the file I/O and the parsing.

Now I have built a Mata function that "collapses" into new variables
while leaving the data otherwise intact. For this I used Ben Jann's
-mf_mm_collapse-, and compiled all the necessary functions myself in
the ado file.
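
To make the idea concrete, here is a minimal sketch of writing group
means into a new variable without preserve/restore. It is not the
-mf_mm_collapse- code: the function name groupmean(), the new variable
mean_price, and the use of the auto dataset are all hypothetical,
chosen only for illustration, and it assumes the grouping and outcome
variables are numeric and the data are sorted by the grouping variable.

```stata
clear all
sysuse auto, clear
sort rep78                      // panelsetup() expects data sorted by group

mata:
// Hypothetical helper (illustration only): store the group mean of yvar
// in a new variable, leaving the rest of the dataset untouched.
void groupmean(string scalar byvar, string scalar yvar, string scalar newvar)
{
    real matrix    X, info
    real colvector out, y
    real scalar    i, n

    X    = st_data(., (byvar, yvar))     // copy the two variables into Mata
    info = panelsetup(X, 1)              // first and last row of each group
    out  = J(rows(X), 1, .)

    for (i = 1; i <= rows(info); i++) {
        n = info[i, 2] - info[i, 1] + 1
        y = X[|info[i, 1], 2 \ info[i, 2], 2|]
        out[|info[i, 1] \ info[i, 2]|] = J(n, 1, mean(y))
    }

    // add the new variable and fill it in one pass
    st_store(., st_addvar("double", newvar), out)
}

groupmean("rep78", "price", "mean_price")
end
```

The loop is single-threaded Mata, so a sketch like this would not by
itself use the extra cores of Stata/MP.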

And the test run with 100 million observations told me it was slower
than the hack. Before I give up and claim the hack unbeatable, I have
one suspicion. I had the test run on Stata 12 MP on a cluster, with 12
cores. Perhaps -tabulate- used all of them, and my code did not.

Are there guidelines on how to speed up Mata in this situation (if it
is not MP-aware to begin with)?

Or, tentatively, can I ask for some guidance about the magic of
-tabulate, summarize-? Is that magic accessible/reproducible without
just logging its output?

Thanks,

Laszlo