Re: st: where is StataCorp C code located? all in a single executable as compiled binary?

From   László Sándor
Subject   Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
Date   Tue, 20 Aug 2013 16:19:18 -0400

So, I reran the test on 8 cores, with Stata/MP 13, with 32 GB RAM.

I made the following changes:
1. I maxed out the number of observations (see -h limits- and -h maxlong-).
2. I made ten byte variables, each taking 20 integer values. This takes up
25 GB out of the 32, close to the StataCorp recommendation of leaving
50% extra. I did not check whether virtual memory is touched, though;
maybe I can scale the dataset down a bit.
3. I am using 20 bins now, in case -tabulate, sum- and loops of
-sum if, meanonly- scale differently.
4. I take only oneway tabs, as that's what I need; testing twoway was a mistake.
5. I also try -bys bins:- "looping".
+1. As I mentioned, I corrected Eric's code, which did not loop over all
values that were "tabbed over". Now the two are comparable.
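
A minimal, scaled-down sketch of such a comparison follows. It is not
the code that produced the timings in this post: the number of
observations, the seed, the variable names (bins, x1-x10) and the timer
numbers are all assumptions chosen for illustration.

```stata
* Scaled-down sketch; N is an assumption, far below the maximum used in the post
clear all
set more off
local N 1000000
set obs `N'
set seed 12345

* Grouping variable with 20 integer values (1..20), stored as byte
generate byte bins = 1 + floor(20 * runiform())

* Ten byte variables, each taking 20 integer values
forvalues j = 1/10 {
    generate byte x`j' = 1 + floor(20 * runiform())
}

timer clear

* 1. Oneway -tabulate, summarize()-, showing means only
timer on 1
tabulate bins, summarize(x1) nofreq noobs nolabel nostandard
timer off 1

* 2. -collapse, fast- (preserve/restore kept outside the timed region)
preserve
timer on 2
collapse (mean) x1, by(bins) fast
timer off 2
restore

* 3. Loop of -summarize if, meanonly- over all 20 bins
timer on 3
forvalues b = 1/20 {
    summarize x1 if bins == `b', meanonly
}
timer off 3

* 4. -bysort- "looping"; the timing includes the sort
timer on 4
bysort bins: summarize x1, meanonly
timer off 4

timer list
```

At this size everything finishes quickly, so the ranking may differ
from the one reported for the full-size data.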

In this setup,
-- -tabulate, sum nof noobs nol nost- completes in only 1516.36
seconds, or ~25 minutes.
-- the simple frequency tab takes only 583.51 s, but again, this is
not in the run.
-- -collapse, fast- took 4025.64 seconds, much slower than -tab, sum-,
which is very strange. (I am pretty sure I have exclusive use of this
compute node; no other process is running or scheduled alongside mine.)
-- the if-loops took 3967 s, shockingly comparable to -collapse, fast-,
but still much slower than (now oneway) -tab, sum-.
-- -bys bins: sum, meanonly- took 3205 s.

So in this test -tab, sum- was clearly the fastest option on big data
for oneway tabs with a moderate number of bins. Others are welcome to
run other tests.

So I will stick to parsing the log of -tab, sum-.
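
One way such log parsing could look is sketched below. The file name
tabsum.log, the log name tabsumlog and the variables bins and x1 are
hypothetical, and the sketch assumes a plain-text log in which each
table row has the form "value | mean".

```stata
* Sketch only: file, log and variable names are hypothetical
log using tabsum.log, text replace name(tabsumlog)
tabulate bins, summarize(x1) nofreq noobs nolabel nostandard
log close tabsumlog

tempname fh
file open `fh' using tabsum.log, read text
file read `fh' line
while r(eof) == 0 {
    * Drop the column separator so a table row becomes "value mean"
    local clean : subinstr local line "|" " ", all
    local n : word count `clean'
    if `n' == 2 {
        local b : word 1 of `clean'
        local m : word 2 of `clean'
        * Keep only rows in which both fields are numeric;
        * headers, rules and the Total row are skipped
        if real("`b'") < . & real("`m'") < . {
            display "bin `b': mean `m'"
        }
    }
    file read `fh' line
}
file close `fh'
```

The -display- line is only a placeholder; the parsed values could just
as well be posted to a dataset or stored in a matrix.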

Thanks for all your thoughts,


