
Re: st: Speed issues with Stata 8

From	[email protected] (William Gould, Stata)
To	[email protected]
Subject	Re: st: Speed issues with Stata 8
Date	Thu, 16 Jan 2003 09:12:16 -0600

Fred Wolfe <[email protected]> asked about Stata 8's speed in 
two contexts:

    1.  -use- and -merge- 

    2.  -graph- 

In this posting, I want to address (1).


Is Stata 8 slower using datasets?
---------------------------------

Stata 8 is just as fast at saving and using datasets as Stata 7.  Fred, however,
observed that Stata 8 appears to be 10 times slower!  That is because his
datasets are still in Stata 7 format.  Using Stata 8, Fred needs to -use- each
of his old Stata 7 datasets and then -save- it again:

        . use <whatever>
        . save, replace

That will convert Fred's datasets into Stata 8 format, and thereafter, the
quickness Fred expects will return.


Why resaving datasets speeds up -use-
-------------------------------------

Stata 8 allows 26 new missing-value codes (.a through .z), with the result
that, internally, Stata stores missing values differently.  When you -use- (or
-merge- or -append-) an old-format dataset, Stata not only loads the dataset
but also converts it.
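
As a small illustration of the new codes, here is a sketch.  The variable
-income- and the survey codes -9 and -8 are made up for the example.  The
extended missing values sort above every nonmissing number:

        clear
        set obs 4
        generate income = 100*_n
        replace income = -9 in 3
        replace income = -8 in 4
        * recode the made-up survey codes to extended missing values
        replace income = .a if income == -9
        replace income = .b if income == -8
        * . < .a < ... < .z, so this counts every kind of missing
        count if income >= .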

As an example, I created a 130MB dataset containing 200,000 observations 
on 200 variables using Stata 7.  Using Stata 7, 

        time to -save-              0.95 seconds
        time to -use-               1.32 seconds

Then I fired up Stata 8 and used this Stata-7 format dataset:

        time to -use-               8.92 seconds

Still in Stata 8, I resaved the data and tried -use- again:

        time to -save, replace-     0.95 seconds
        time to -use-               1.25 seconds

In this example, the time to convert is substantial, being 8.92 - 1.25 = 7.67
seconds.  That same overhead will appear in -merge- and -append- if I 
leave the dataset in Stata-7 format.
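
If you want to check whether one of your own datasets is paying this
conversion cost, one rough way is to turn on return messages, which report
the elapsed time of each command.  This is only a sketch, and -bigfile- is a
placeholder for your own dataset name:

        . set rmsg on
        . use bigfile, clear
        . save bigfile, replace
        . use bigfile, clear
        . set rmsg off

If the first -use- is much slower than the second, the file was probably
still in the old format.  Disk caching can also favor the second -use-, so
treat the comparison as approximate.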

In smaller datasets, the conversion time is hardly noticeable.

It is convenient that Stata can work with old datasets without you needing to
convert them into modern format, but understand that Stata is converting your
datasets on the fly each and every time you work with them.  With large
datasets, I recommend converting the datasets only once.
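
To convert a whole directory in one pass, something like the following
do-file fragment should work.  It is a sketch under assumptions: that your
Stata provides the -dir- extended macro function, that all the datasets sit
in the current directory, and that you have a backup, because -save, replace-
overwrites the originals.

        * list every .dta file in the current directory
        local files : dir "." files "*.dta"
        foreach f of local files {
                * -use- converts an old-format file in memory ...
                use "`f'", clear
                * ... and -save- writes it back in the current format
                save "`f'", replace
        }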

-- Bill
[email protected]
