Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From:    kcrow@stata.com (Kevin Crow, StataCorp LP)
To:      "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: odbc load a large table, hitting max obs limit
Date:    Fri, 07 Sep 2012 10:07:03 -0500
Dimitriy Masterov <dvmaster@gmail.com> wrote:

>I am trying to load a very large table from a Teradata database. It's slower
>than molasses on a cold winter morning, and then gives me an error message
>that I have exceeded the maximum number of observations. However, the table
>contains only 55% of that limit in terms of rows (1,186,316,312 out of a
>theoretical 2,147,483,647). I have Stata MP 12.1 on a powerful Linux machine
>with 1 TB of RAM, so I hoped I could get close to the theoretical limit.

When -odbc load- issues the SELECT statement, the database returns the entire table into memory, and that table data is then converted into a Stata dataset, also in memory. Dimitriy therefore has two copies of the data in memory at once, the raw table data and the Stata dataset, so his machine is running out of memory.

The solution is to divide the table into multiple sections using a WHERE clause in the SELECT statement, import each section into Stata with -odbc load, exec()- and save it, and finally combine the saved sections with -append-. Dividing the data into multiple sections will resolve the memory error and speed up the process.

Kevin

--
Kevin Crow
kcrow@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
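A minimal sketch of the approach Kevin describes. The table name (sales), ODBC data source name (teradata), and integer key column (id) are hypothetical, and it assumes id runs contiguously from 1 up to roughly the row count reported above; adjust the chunk size and loop bounds to your own data:

```stata
* Pull the table in 12 chunks of 100 million rows each via a WHERE
* clause, saving each chunk to disk before loading the next.
local chunk 100000000
forvalues i = 1/12 {
    local lo = (`i' - 1) * `chunk' + 1
    local hi = `i' * `chunk'
    odbc load, exec("SELECT * FROM sales WHERE id BETWEEN `lo' AND `hi'") ///
        dsn("teradata") clear
    save chunk`i', replace
}

* Combine the saved sections into one dataset with -append-.
use chunk1, clear
forvalues i = 2/12 {
    append using chunk`i'
}
save combined, replace
```

Because each -odbc load, exec()- runs with -clear-, only one section's worth of table data and Stata data is ever in memory at a time.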