of some amazing things and Jeroen Weesie's mmerge command simplifies
many things that I used to use SAS's SQL command for. However, with
really large data sets, Stata still lags, sometimes unacceptably. Using
the odbc command with mysql or a similar SQL database program is a nice
compromise (although I must say that SAS's SQL command is much simpler
than most things SAS does. Also, their book "SAS Guide to the SQL
Procedure" is quite good and most of it is not SAS specific).
Glenn Hoetker
Assistant Professor of Strategy
College of Business
University of Illinois at Urbana-Champaign
217-265-4081
[email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Ingre
Sent: Thursday, August 19, 2004 8:11 AM
To: [email protected]
Subject: Re: st: selecting obs while reading in huge data set
On 2004-08-19, at 10.15, Steve Stillman wrote:
> Sascha,
> I have a recommendation that I wouldn't usually make. I have
been
> recently doing work with matched employer-employee data with
over 30
> million obs, so we have been running into the same problem as
you. SAS
> is much better for large dataset merges than Stata. In
particular,
> proc
> SQL is remarkably fast at doing these types of merges (likely
because
> SQL is written with this type of operation in mind).
>
> Well there is was, likely the last time I will recommend SAS
over
> Stata.
>
> Cheers,
> Steve
Steve and Sascha
There is a solution that does not include SAS. Stata also support
SQL
databases. If you set up an ODBC connection to an SQL database
then the
-odbc load- command will allow you to load datasets directly from
the
SQL server. It is even possible to execute a SQL statement
directly
from Stata with -odbc exec("SqlStmt")- or -odbc
sqlfile("filename")-.
This way you could merge and load only the observations you are
interested in directly from Stata in one command.
Before you can do this you need to download your data to an SQL
server.
If you don't have access to one, you could download one for FREE
here:
http://dev.mysql.com/downloads/mysql/4.0.html. After that you
need to
set up and ODBS driver manager, and an ODBC driver and download
your
data.
This is a bit of work but if you plan to do it a lot, it should
be
worth it.
I tried it a couple of months ago and it worked very well with
Stata,
however, labels are lost and there is only one code for missing
values.
Also, saving and storing data is a bit slower than from disc
files.
If you have a Mac I would recommend the Complete MySQL package
that has
an easy set up and comes with drivers and additional software:
http://www.apple.com/downloads/macosx/unix_open_source/
completemysql.html
Michael
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/