Making Stata faster

Highlights

sort is much faster.
collapse is much, much faster.
MKL-powered Mata functions and operators are faster.
Mixed models are faster.
import delimited is now parallelized in Stata/MP.

I just got my Stata 17 and I see huge improvements. I am using [an] AMD Ryzen 7 4800H with 40GB RAM. On my Stata 16.1 MP8, my data with 44.7 million obs used to take 30 seconds to sort; now on Stata 17 MP8, it is taking 16.1 seconds to sort. I also ran [a] few other commands where I manipulate the data. Stata 16.1 used to take 8 minutes to complete that task; now Stata 17 takes 3 minutes to complete similar tasks. I am not sure how Intel will perform with the new "Intel Math Kernel Library (MKL)" update. However, I am very happy with the new update.

— Ahmed Khan
Ph.D. scholar at University of Waikato School of Accounting, Finance, and Economics

Stata values accuracy and speed. There is often a tradeoff between the two, but Stata strives to give users the best of both worlds. We are continuously optimizing and improving our routines to utilize modern computing power and algorithms so that Stata runs even faster.

In Stata 17, we updated the algorithms behind sort and collapse to make these commands faster. Much faster. Because many other Stata commands use sort, those commands are faster too. sort is between 1.5 and 6 times faster, as shown in Table 1 below. For example, with 10 million observations and 20 variables, the sort time in Stata/SE dropped from about 20 seconds in Stata 16 to just over 3 seconds in Stata 17!

**Table 1: Stata 17 versus Stata 16 sort timings in seconds for 20 variables, by number of observations and edition**

| Observations | Edition | Stata 17 mean (s) | Stata 16 mean (s) | Speedup |
|---|---|---|---|---|
| 10,000 | SE | 0.08 | 0.35 | 4.42 |
| 10,000 | MP4 | 0.07 | 0.14 | 2.02 |
| 10,000 | MP8 | 0.06 | 0.10 | 1.79 |
| 100,000 | SE | 0.14 | 0.54 | 3.75 |
| 100,000 | MP4 | 0.10 | 0.23 | 2.36 |
| 100,000 | MP8 | 0.08 | 0.16 | 1.97 |
| 1,000,000 | SE | 0.25 | 0.77 | 3.14 |
| 1,000,000 | MP4 | 0.16 | 0.44 | 2.83 |
| 1,000,000 | MP8 | 0.14 | 0.32 | 2.54 |
| 10,000,000 | SE | 3.34 | 19.76 | 5.92 |
| 10,000,000 | MP4 | 2.06 | 6.90 | 3.35 |
| 10,000,000 | MP8 | 1.89 | 5.50 | 2.91 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
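The Speedup column is the Stata 16 time divided by the Stata 17 time. As a quick sanity check of the "between 1.5 and 6 times faster" claim, the Python sketch below (our own illustration; the dictionary is transcribed from Table 1 and the names are hypothetical) recomputes the ratios from the rounded means. We assume the published speedups were computed from unrounded timings, so the recomputed ratios match them only approximately.

```python
# Speedup as defined in Table 1: Stata 16 time divided by Stata 17 time.
# Inputs are the rounded means shown in the table, so the ratios computed here
# differ slightly from the published speedups (assumed to come from unrounded data).

# (observations, edition) -> (Stata 17 seconds, Stata 16 seconds)
TABLE_1 = {
    (10_000, "SE"): (0.08, 0.35),
    (10_000, "MP4"): (0.07, 0.14),
    (10_000, "MP8"): (0.06, 0.10),
    (100_000, "SE"): (0.14, 0.54),
    (100_000, "MP4"): (0.10, 0.23),
    (100_000, "MP8"): (0.08, 0.16),
    (1_000_000, "SE"): (0.25, 0.77),
    (1_000_000, "MP4"): (0.16, 0.44),
    (1_000_000, "MP8"): (0.14, 0.32),
    (10_000_000, "SE"): (3.34, 19.76),
    (10_000_000, "MP4"): (2.06, 6.90),
    (10_000_000, "MP8"): (1.89, 5.50),
}


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    for (obs, edition), (t17, t16) in TABLE_1.items():
        print(f"{obs:>12,} {edition:<4} {speedup(t16, t17):5.2f}x")
```

With rounded inputs the smallest ratio is about 1.67 (MP8, 10,000 observations) instead of the published 1.79, and the largest is about 5.92 (SE, 10,000,000 observations); every ratio still falls inside the 1.5 to 6 range quoted above.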

The collapse command creates a dataset of summary statistics and is one of the most commonly used data management commands. Its runtime necessarily grows with the size of the data. In Stata 17, depending on dataset size, collapse is between 6 and 13 times faster when computing a simple mean and roughly 40 to 70 times faster when computing statistics such as medians and standard deviations. Table 2 shows the results for collapsing a dataset with 10,000,000 observations and varying numbers of collapsed variables, computing medians and standard deviations.

**Table 2: Stata 17 versus Stata 16 collapse timings in seconds for 10,000,000 observations, by number of variables and edition**

| Variables | Edition | Stata 17 (s) | Stata 16 (s) | Speedup |
|---|---|---|---|---|
| 10 | SE | 0.34 | 13.97 | 40.97 |
| 10 | MP4 | 0.23 | 16.39 | 71.30 |
| 10 | MP8 | 0.21 | 13.42 | 64.17 |
| 100 | SE | 0.31 | 13.87 | 45.18 |
| 100 | MP4 | 0.22 | 16.07 | 72.86 |
| 100 | MP8 | 0.20 | 13.41 | 68.44 |
| 1,000 | SE | 0.34 | 13.99 | 40.73 |
| 1,000 | MP4 | 0.23 | 16.35 | 71.79 |
| 1,000 | MP8 | 0.21 | 13.39 | 63.27 |
| 10,000 | SE | 0.34 | 13.93 | 41.09 |
| 10,000 | MP4 | 0.23 | 16.15 | 70.61 |
| 10,000 | MP8 | 0.21 | 13.37 | 64.59 |
| 100,000 | SE | 0.32 | 13.98 | 44.03 |
| 100,000 | MP4 | 0.22 | 16.22 | 72.43 |
| 100,000 | MP8 | 0.19 | 13.39 | 68.85 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
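To make concrete what collapse produces (not how Stata 17 computes it, which this page does not describe), here is a small Python analogue of a command such as `collapse (mean) xmean=x (median) xmed=x (sd) xsd=x, by(g)`, where `x` and `g` are hypothetical variable names. Each group of observations is reduced to one row of summary statistics. Note that a median needs the group's values in order, whereas a mean or standard deviation can be accumulated in a single pass.

```python
# Python analogue of collapsing one variable to per-group mean, median, and
# standard deviation. Illustrates the output of collapse only; it is not
# Stata's algorithm. Variable and function names are hypothetical.
from collections import defaultdict
from statistics import mean, median, stdev


def collapse(rows, by, var):
    """Return {group: (mean, median, sd)} for the records in rows.

    rows is an iterable of dicts; by names the grouping key and var the
    variable being summarized.
    """
    groups = defaultdict(list)
    for record in rows:
        groups[record[by]].append(record[var])

    result = {}
    for key in sorted(groups):  # one output row per group, ordered by key
        values = groups[key]
        # stdev() is the sample standard deviation (n - 1 denominator) and
        # needs at least two values; single-observation groups get None,
        # standing in for a missing value.
        sd = stdev(values) if len(values) > 1 else None
        result[key] = (mean(values), median(values), sd)
    return result


if __name__ == "__main__":
    data = [
        {"g": 1, "x": 2.0},
        {"g": 1, "x": 4.0},
        {"g": 1, "x": 9.0},
        {"g": 2, "x": 5.0},
    ]
    for group, stats in collapse(data, "g", "x").items():
        print(group, stats)
```

Here group 1 collapses to a mean of 5.0, a median of 4.0, and a standard deviation of sqrt(13), while the single-observation group 2 gets no standard deviation.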

For Stata 17, we also made estimation faster. The Linear Algebra Package (LAPACK) underlying many of Mata's functions and operators is now powered by the Intel Math Kernel Library (MKL). How much faster is it? In Stata/SE, multiplying two 5,000-by-5,000 real matrices takes about 13 seconds with MKL in Stata 17, compared with about 70 seconds in Stata 16.

Timing of multiplication of two real matrices in seconds:

| Edition | Size | MKL (s) | non-MKL (s) |
|---|---|---|---|
| MP8 | 5,000 by 5,000 | 2.55 | 10.26 |
| MP8 | 10,000 by 10,000 | 17.28 | 85.60 |
| MP4 | 5,000 by 5,000 | 3.62 | 15.95 |
| MP4 | 10,000 by 10,000 | 28.22 | 127.24 |
| SE | 5,000 by 5,000 | 13.64 | 70.61 |
| SE | 10,000 by 10,000 | 108.33 | 566.99 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
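In every row pair above, the 10,000-by-10,000 timing is roughly eight times the 5,000-by-5,000 timing. A back-of-the-envelope check explains why: the classical dense matrix product does n³ multiplications and n²(n − 1) additions, so doubling n multiplies the work by about 8. The operation count is textbook arithmetic rather than something the timings state, and we assume here that both code paths use the classical algorithm. The Python sketch below (names are hypothetical) compares that prediction with the table and also derives the MKL speedup and achieved throughput at n = 5,000.

```python
# Dense n-by-n matrix multiplication (classical algorithm) needs n**3
# multiplications and n**2 * (n - 1) additions, about 2 * n**3 operations,
# so doubling n multiplies the work by roughly 8.

# edition -> {n: (MKL seconds, non-MKL seconds)}, transcribed from the table
MATMUL_SECONDS = {
    "MP8": {5_000: (2.55, 10.26), 10_000: (17.28, 85.60)},
    "MP4": {5_000: (3.62, 15.95), 10_000: (28.22, 127.24)},
    "SE": {5_000: (13.64, 70.61), 10_000: (108.33, 566.99)},
}
MKL, NON_MKL = 0, 1  # column indices into the timing tuples


def matmul_flops(n: int) -> int:
    """Exact operation count for a classical n-by-n matrix product."""
    return n ** 3 + n ** 2 * (n - 1)


def growth(edition: str, column: int) -> float:
    """Time for 10,000-by-10,000 divided by time for 5,000-by-5,000."""
    times = MATMUL_SECONDS[edition]
    return times[10_000][column] / times[5_000][column]


def gflops(n: int, seconds: float) -> float:
    """Achieved throughput in billions of floating-point operations per second."""
    return matmul_flops(n) / seconds / 1e9


if __name__ == "__main__":
    print("predicted growth:", matmul_flops(10_000) / matmul_flops(5_000))
    for edition, times in MATMUL_SECONDS.items():
        mkl, plain = times[5_000]
        print(f"{edition:<4} growth MKL {growth(edition, MKL):.2f}, "
              f"non-MKL {growth(edition, NON_MKL):.2f}; "
              f"5,000 MKL speedup {plain / mkl:.2f}x, "
              f"{gflops(5_000, mkl):.0f} GFLOP/s")
```

The observed growth factors range from about 6.8 to 8.3, close to the predicted factor of 8, and the Stata/SE MKL speedup at 5,000 by 5,000 works out to about 5.2.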

Timing of cholesky() in seconds:

| Edition | Size | MKL (s) | non-MKL (s) |
|---|---|---|---|
| MP8 | 5,000 by 5,000 | 0.42 | 16.69 |
| MP8 | 10,000 by 10,000 | 2.91 | 133.60 |
| MP4 | 5,000 by 5,000 | 0.69 | 16.69 |
| MP4 | 10,000 by 10,000 | 5.03 | 133.70 |
| SE | 5,000 by 5,000 | 2.41 | 18.62 |
| SE | 10,000 by 10,000 | 16.66 | 133.63 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
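For readers unfamiliar with the operation being timed: Mata's cholesky() returns the lower-triangular Cholesky factor L of a symmetric positive-definite matrix A, so that A = LL'. The pure-Python sketch below uses the textbook Cholesky-Banachiewicz recurrence to show what is computed; it is our own illustration and makes no claim about the algorithm MKL or Stata uses internally.

```python
# Textbook Cholesky-Banachiewicz factorization: for a symmetric
# positive-definite matrix a, build lower-triangular L with L @ L.T == a.
# Illustration only; not the algorithm used by MKL or Stata.
import math


def cholesky(a):
    """Return the lower-triangular Cholesky factor of a (list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


if __name__ == "__main__":
    A = [[4.0, 2.0], [2.0, 3.0]]
    for row in cholesky(A):
        print(row)
```

The three nested loops do about n³/3 floating-point operations, so, as with multiplication, doubling the matrix size should raise the work about eightfold; the table's 10,000-by-10,000 timings are roughly 7 to 8 times the 5,000-by-5,000 ones.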

Many estimation commands rely on LAPACK for their computations, so they are automatically faster too.

The import delimited command for importing data from CSV and other delimited text files is now parallelized in Stata/MP. It imports large datasets up to four times faster in Stata 17.
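This page does not say how Stata/MP parallelizes import delimited. One common approach to parallel parsing of delimited text, shown here only as a hypothetical Python sketch, is to split the input into blocks of whole lines, parse the blocks concurrently, and concatenate the results in their original order. Two caveats: Python threads illustrate the structure but, because of the global interpreter lock, will not actually speed up CSV parsing, and the naive line split breaks quoted fields that contain newlines, which a real importer must handle.

```python
# Hypothetical sketch of chunk-parallel parsing of delimited text:
# split into blocks of whole lines, parse each block independently,
# then concatenate the parsed rows in their original order.
# Not Stata's implementation; function names are illustrative.
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_lines(text: str, n_chunks: int) -> list[str]:
    """Split text into at most n_chunks pieces on line boundaries."""
    lines = text.splitlines(keepends=True)
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def parse_chunk(chunk: str) -> list[list[str]]:
    """Parse one block of complete lines with the standard csv module."""
    return list(csv.reader(io.StringIO(chunk)))


def parallel_read(text: str, workers: int = 4) -> list[list[str]]:
    """Return the header row followed by all data rows, in file order."""
    header, _, body = text.partition("\n")
    chunks = split_lines(body, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, preserving row order.
        parsed = list(pool.map(parse_chunk, chunks))
    rows = [row for block in parsed for row in block]
    return [next(csv.reader([header]))] + rows


if __name__ == "__main__":
    sample = "id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n"
    for row in parallel_read(sample, workers=2):
        print(row)
```

Keeping the merge step order-preserving matters: the imported observations must appear in the same order as the lines of the file, regardless of which block finishes parsing first.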

Last but not least, the mixed command for fitting multilevel mixed-effects models is faster. In our timings, models with 10,000 panels, 10 time periods, and 5 random slopes run 2 to 3 times faster in Stata 17 than in Stata 16. We saw similar speedups for other numbers of panels, time periods, and random slopes.

We continuously look for ways to make Stata faster. We actively investigate, code, and test new algorithms in data management and estimation routines, and we will keep you informed of the latest developments.

This page announced the new features in Stata 17. Please see our Stata 19 page for the new features in Stata 19.