Title | Large datasets under Windows |
Author | Kevin S. Turner, StataCorp |
First, make sure you have installed enough memory or allowed for enough virtual memory. If you have and are still getting this error, continue reading.
Under all current 32-bit Windows operating systems (Windows 8, 7, Vista, XP, 2000, NT, ME, 98, 95), the total address space available to any application is 2.1 GB. If your dataset is larger than 2.1 GB, you will not be able to load it into Stata for Windows on these systems. This is simply a limitation of the operating system.
Unfortunately, even if your dataset is under the 2.1-GB limit, you may run into difficulty when loading it into Stata. The fault again lies with how Windows manages the 2.1-GB address space. When a typical application loads, several libraries (DLLs) are usually loaded as well. These libraries are usually loaded toward the upper end of the 2.1-GB space but not in any deterministic order. Microsoft has assured us that there is no way to prevent these libraries from loading at arbitrary addresses, so they fragment the available space. When Stata tries to load a dataset, it requests from Windows the largest contiguous block in the 2.1-GB range. Depending on where Windows loaded the initial libraries, this block may be 1.8 GB, 1.3 GB, or even less. As a result, you may be surprised to find that a 1.4-GB dataset loads fine one time but fails to load later. This is simply an unfortunate side effect of Windows memory management.
As of Stata 11.1, some of the dependencies on external DLLs were removed, reducing memory fragmentation and increasing the amount of memory available to Stata. If you are using 32-bit Windows XP and you are still having trouble allocating memory, you should read “Memory allocation in Windows XP”.
By now, you may be wondering what your alternatives are. Since July 2007, several operating systems with 64-bit support have become available. See our list of operating systems compatible with Stata. A 64-bit platform will enable you to work with large datasets. Depending on your operating system, you should be able to allocate as much memory as you have on the machine, minus what the system itself requires. To take advantage of this technology, you will need 64-bit–compatible hardware, a 64-bit operating system, and, of course, a 64-bit version of Stata.
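If you are unsure which version of Stata you are running, the following is a minimal sketch of how you might check from within Stata. It assumes a version of Stata recent enough to provide the c-class values c(bit), c(os), and c(machine_type); the about command reports similar information. Note that these describe the Stata executable and the platform it was built for, so a 32-bit Stata installed on a 64-bit operating system is still subject to the 32-bit address-space limit.

```stata
* Sketch: check whether the running Stata is a 32-bit or 64-bit executable
* (assumes your Stata version provides these c-class values)
display c(bit)

* Operating system family and machine type as reported by Stata
display c(os)
display c(machine_type)
```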
As a last resort, you may consider trimming any unnecessary data from your dataset or dividing the dataset into two files. You may want to use the second syntax of the use command, which takes a variable list and if/in qualifiers before using, to read in just the observations and variables you want. For example:
. describe using auto.dta

Contains data                                 1978 Automobile Data
  obs:            74                          26 Mar 2007 09:52
 vars:            12
 size:         3,478
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign

. use mpg price for using auto.dta in 1/50, clear
(1978 Automobile Data)

. describe

Contains data from auto.dta
  obs:            50                          1978 Automobile Data
 vars:             3                          24 June 2013 15:56
 size:           250                          (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
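Dividing the dataset into two files can be done with the same syntax. The following is only a sketch, using the 74-observation auto.dta from the example above and assuming it is in the current directory. The output filenames auto_part1.dta and auto_part2.dta are illustrative names chosen here, and for a real dataset you would pick the split point from the observation count that describe using reports.

```stata
* Sketch: divide auto.dta (74 observations) into two smaller files.
* auto_part1.dta and auto_part2.dta are illustrative filenames.

* Read only the first 37 observations and save them
use in 1/37 using auto.dta, clear
save auto_part1.dta, replace

* Read observation 38 through the last observation (l = last) and save them
use in 38/l using auto.dta, clear
save auto_part2.dta, replace

* Alternatively, read a subset chosen by a condition instead of by position
use make price mpg if foreign == 1 using auto.dta, clear
```

Because the in and if qualifiers are applied as the file is read, only the requested observations need to fit in memory at any one time.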
Depending on your data and analysis, this may not be feasible and is offered only as a suggestion.