Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: large data sets (was st: A faster way to gsort)
From: Jeph Herrin <[email protected]>
To: [email protected]
Subject: Re: large data sets (was st: A faster way to gsort)
Date: Fri, 14 Mar 2014 10:04:05 -0400
On 3/13/2014 10:44 PM, Joseph Coveney wrote:
> It sounds like you're pulling modest-to-large result sets out of the database,
> saving them as SAS dataset files and then going back and sort-merging them via
> PROC SQL with multigigabyte-sized result sets likewise pulled out of the
> database en passant--a situation that even SAS aficionados recommend avoiding in
> favor of pass-through queries.
I have not been doing that, but it is what the SAS analysts in this
environment do - and it's one reason they prefer not to use Stata. I do
as much as I can in native SQL, and then roll the results up in Stata.
But this requires iterating queries over, e.g., calendar year to ensure
that the results I pull down are manageably small.
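
For readers who want to see the pattern, here is a minimal sketch of
iterating an aggregate query over calendar year, so that each result set
pulled down stays small. It is written in Python with the standard
sqlite3 module only so that it runs anywhere; it is not the setup
described in this thread. The table claims, its columns year and amount,
and the function yearly_rollup are all hypothetical names chosen for
illustration.

```python
import sqlite3


def yearly_rollup(conn, years):
    """Run one aggregate query per calendar year and collect the results.

    The database does the heavy aggregation; only one small summary row
    per year is pulled down to the client.
    """
    rows = []
    for year in years:
        cur = conn.execute(
            "SELECT year, COUNT(*), SUM(amount) "
            "FROM claims WHERE year = ? GROUP BY year",
            (year,),
        )
        rows.extend(cur.fetchall())
    return rows


# Hypothetical in-memory table standing in for a large remote database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [(2011, 10.0), (2011, 5.0), (2012, 7.5),
     (2013, 1.0), (2013, 2.0), (2013, 3.0)],
)

# One query per year; each returns at most one summary row.
summary = yearly_rollup(conn, range(2011, 2014))
for year, n, total in summary:
    print(year, n, total)
```

The per-year summaries can then be appended and analyzed further in
whatever package is preferred; the point of the loop is only that no
single query returns more than the client can comfortably hold.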
But the first point is an important one - my primary role here is not
data analyst; mostly, other analysts use SAS to create datasets that I
can then analyze in Stata. And it is likely to stay that way as long as
SAS has the edge on data management using large databases.
cheers,
Jeph