[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: selecting obs while reading in huge data set

From	Michael Ingre <[email protected]>
To	[email protected]
Subject	Re: st: selecting obs while reading in huge data set
Date	Thu, 19 Aug 2004 15:11:17 +0200

On 2004-08-19, at 10.15, Steve Stillman wrote:

Sascha,
I have a recommendation that I wouldn't usually make. I have been
recently doing work with matched employer-employee data with over 30
million obs, so we have been running into the same problem as you. SAS
is much better for large dataset merges than Stata. In particular, proc
SQL is remarkably fast at doing these types of merges (likely because
SQL is written with this type of operation in mind).

Well there is was, likely the last time I will recommend SAS over Stata.

Cheers,
Steve

Steve and Sascha

There is a solution that does not include SAS. Stata also support SQL databases. If you set up an ODBC connection to an SQL database then the -odbc load- command will allow you to load datasets directly from the SQL server. It is even possible to execute a SQL statement directly from Stata with -odbc exec("SqlStmt")- or -odbc sqlfile("filename")-. This way you could merge and load only the observations you are interested in directly from Stata in one command.

Before you can do this you need to download your data to an SQL server. If you don't have access to one, you could download one for FREE here: http://dev.mysql.com/downloads/mysql/4.0.html. After that you need to set up and ODBS driver manager, and an ODBC driver and download your data.

This is a bit of work but if you plan to do it a lot, it should be worth it.

I tried it a couple of months ago and it worked very well with Stata, however, labels are lost and there is only one code for missing values. Also, saving and storing data is a bit slower than from disc files.

If you have a Mac I would recommend the Complete MySQL package that has an easy set up and comes with drivers and additional software: http://www.apple.com/downloads/macosx/unix_open_source/ completemysql.html

Michael

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

References:
- RE: st: selecting obs while reading in huge data set
  - From: "Steve Stillman" <[email protected]>

Prev by Date: Re: st: Re: graph prediction of transformed variable from regression
Next by Date: st: Re: selecting obs while reading in huge data set
Previous by thread: RE: st: selecting obs while reading in huge data set
Next by thread: RE: st: selecting obs while reading in huge data set
Index(es):
- Date
- Thread