Fred Wolfe asked about Stata 8's speed in
two contexts:
1. -use- and -merge-
2. -graph-
In this posting, I want to address (1).
Is Stata 8 slower using datasets?
---------------------------------
Stata 8 is just as fast at saving and using datasets as Stata 7, provided the
datasets are in Stata 8 format. Fred, however, observed that Stata 8 appears
to be 10 times slower, because his datasets are still in Stata 7 format. Using
Stata 8, Fred needs to -use- each old Stata 7 dataset and then -save- it again:
. use <whatever>
. save, replace
That converts each dataset into Stata 8 format, and thereafter the speed Fred
expects will return.
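With many old datasets, the -use-/-save- pair can be run over a whole
directory. The do-file below is a minimal sketch, assuming that all of the
datasets to convert are .dta files in the current working directory. It lists
them with the `: dir` extended macro function and resaves each one in place.

```stata
* Resave every .dta file in the current directory in Stata 8 format.
* Assumption: all datasets to convert live in the current directory.
* Caution: -save, replace- overwrites each original file.
local files : dir . files "*.dta"
foreach f of local files {
    * loading an old-format file converts it in memory
    use "`f'", clear
    * writing it back stores the Stata 8 format on disk
    save "`f'", replace
}
```

Because the originals are overwritten, it is prudent to keep a copy of the
Stata 7 files if anyone still needs to open them with an older Stata.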
Why resaving datasets speeds up -use-
-------------------------------------
Stata 8 allows 26 new missing-value codes (.a through .z), and as a result
Stata stores missing values differently internally. When you -use- (or -merge-
or -append-) an old-format dataset, Stata not only loads the dataset but also
converts it.
As an example, I used Stata 7 to create a 130MB dataset containing 200,000
observations on 200 variables. In Stata 7, the timings were:
    time to -save-                0.95 seconds
    time to -use-                 1.32 seconds
Then I fired up Stata 8 and used this Stata-7 format dataset:
    time to -use-                 8.92 seconds
Still in Stata 8, I resaved the data and tried -use- again:
    time to -save, replace-       0.95 seconds
    time to -use-                 1.25 seconds
In this example, the conversion time is substantial: 8.92 - 1.25 = 7.67
seconds, so loading the Stata 7-format file took roughly 7 times as long. The
same overhead will appear in -merge- and -append- if I leave the dataset in
Stata 7 format.
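The posting does not say how these timings were taken. One way to make a
similar comparison on your own data is to turn on -set rmsg-, which makes
Stata report the elapsed time after each command. This is a minimal sketch;
bigdata.dta is a hypothetical Stata 7-format dataset.

```stata
* Compare load times before and after resaving in Stata 8 format.
* bigdata.dta is a hypothetical Stata 7-format file.
set rmsg on
* old format: load plus on-the-fly conversion
use bigdata, clear
* rewrite the file in Stata 8 format
save bigdata, replace
* new format: load only, no conversion
use bigdata, clear
set rmsg off
```

The difference between the two -use- timings is the conversion overhead.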
With smaller datasets, the conversion time is hardly noticeable.
It is convenient that Stata can work with old datasets without your having to
convert them to the new format, but understand that Stata converts your
datasets on the fly each and every time you use them. With large datasets, I
recommend converting them once, by resaving them in Stata 8 format.
-- Bill
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/