Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
From
László Sándor <[email protected]>
To
[email protected]
Subject
Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date
Mon, 2 Apr 2012 17:23:44 -0400
Nick,
Thanks, I did follow up on your post. Sadly, I could not easily get
-by- working or, to be precise, use the variables that it generated.
Below is an attempt. If I may take the liberty of asking you to parse
it, I would be grateful for comments on getting it working; the
indexing must be off. It tries to average two (x_r and y_r) or three
(y2_r as well) variables. It generates values that are too large for
some bins (i.e., from U[0,1] variables some averages become larger
than 20).
I would be happy if someone from StataCorp followed up too! :)
Thanks,
László
tempvar wsum tag ones
g byte `ones' = 1
if ("`y2_var'"!="") local y2 y2
else local y2 ""
if ("`weight1'"!="") g `wsum' = sum(`weight1') if `touse'
else g `wsum' = sum(`ones') if `touse'
sort `x_q'
by `x_q': g byte `tag' = _N if `touse'
foreach v in x y `y2' {
    if "`weight1'"!="" {
        by `x_q': g ``v'_mean' = sum(``v'_r'*`weight1') if `touse'
        by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
    }
    else {
        by `x_q': g ``v'_mean' = sum(``v'_r') if `touse'
        by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
    }
}
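
For comparison, here is a minimal sketch of within-bin weighted means
computed with -by:-. Two things in the attempt above look like likely
causes of the oversized averages: `wsum' is a running sum over the
whole dataset (created before the sort and without -by:-), so the
within-bin numerators are divided by an unrelated denominator; and
`tag' is set to _N, which is nonzero for every observation, so it does
not single out one observation per bin. The sketch uses simulated data,
and the names x, y, w, bin, wsum and last are hypothetical stand-ins
for the macros in the ado file, not the author's actual variables. It
also assumes no missing values in the variables being averaged.

```stata
* Sketch only: simulated data, hypothetical variable names.
clear
set seed 2012
set obs 10000
generate double x = runiform()
generate double y = runiform()
generate double w = 1 + runiform()     // hypothetical weights
generate byte bin = ceil(10 * x)       // ten bins on x

sort bin
* Denominator: running sum of weights WITHIN each bin
by bin: generate double wsum = sum(w)
* Flag exactly one observation per bin: the last one,
* where the running sums cover the whole bin
by bin: generate byte last = (_n == _N)

foreach v in x y {
    * Running weighted mean within the bin
    by bin: generate double `v'_mean = sum(`v' * w) / wsum
    * Keep the result only where it is the complete bin mean
    replace `v'_mean = . if !last
}

* Each mean is a weighted average of U(0,1) draws, so it lies in [0,1]
summarize x_mean y_mean
```

For the unweighted case, setting w to 1 gives plain bin means. To
restrict to a subsample, one option is to sort and group on the sample
marker as well (for example, by `touse' `x_q':), so that the last
observation of each group is always inside the sample.
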
On Mon, Apr 2, 2012 at 3:36 PM, Nick Cox <[email protected]> wrote:
>
> We are back to the questions you asked a week ago. Mostly this is for
> StataCorp. Otherwise please see again my answers at
>
> http://www.stata.com/statalist/archive/2012-03/msg01144.html
>
> I've had dramatic speed-ups with Mata -- my record is reducing
> execution time from 5 days to 2 minutes, but that was partly because
> my original code was so dumb -- but I've not tried anything like the
> stuff you were using.
>
> -tabulate, summarize- is compiled C code. I think the nearest you can
> get is by using -by:- as explained in the post just quoted.
>
> Nick
>
> 2012/4/2 László Sándor <[email protected]>:
> > Hi all,
> >
> > I had several questions recently on this list about compiling Mata
> > code. I still could not deal with generating the compile time locals
> > with loops, but I typed them out and compiled. Now I had my test runs
> > but they are surprising. Let me ask you why:
> >
> > My basic problem was to do a fast "collapse" to make binned scatter
> > plots. Collapse was unacceptably slow, probably because of the
> > necessary preserve-restore cycles, or inefficient coding of collapse
> > (for its general purpose).
> >
> > I already had a version that parsed a log of -tabulate, summarize-.
> > Yes, it is as much of a hack as it sounds like. I was not expecting
> > this to be fast, at least because of the file I/O and the parsing.
> >
> > Now I built a Mata function that "collapses" into new variables while
> > leaving the data otherwise intact. For this I used Ben Jann's
> > -mf_mm_collapse-, and compiled all the necessary functions myself in
> > the ado file.
> >
> > And the test run with 100 million observations told me it was slower
> > than the hack. Before I give up and claim the hack unbeatable, I have
> > one suspicion. I had the test run on Stata 12 MP on a cluster, with 12
> > cores. Perhaps -tabulate- used all of them, and my code did not.
> >
> > Are there guidelines on how to speed up Mata in this situation (if it
> > is not MP-aware to begin with)?
> >
> > Or, tentatively, can I ask for some guidance about the magic of
> > -tabulate, summarize-? Is that magic accessible/reproducible without
> > just logging its output?
> >
> > Thanks,
> >
> > Laszlo
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/