Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: large data sets (was st: A faster way to gsort)
From
Jeph Herrin <[email protected]>
To
[email protected]
Subject
Re: large data sets (was st: A faster way to gsort)
Date
Thu, 13 Mar 2014 09:29:09 -0400
On 3/12/2014 11:54 PM, Joseph Coveney wrote:
> As for #1, wouldn't additional RAM be cheaper than a SAS license? And if
> you're maxed-out on memory slots, wouldn't even a more powerful workstation be
> cheaper than a SAS license?
My institutional SAS 9.4 license runs me $49, so no.
More pointedly, in this situation, I must work remotely (because the
database is on the order of several TB, and for data security reasons),
so I don't have a lot of control over the environment.
> I don't quite follow #3. Aren't Stata's data management operations
> incremental? I find a series of Stata's data management commands much easier to
> walk through than a single SQL statement stretching for pages.
I wasn't very clear here. When working with a >1TB database, it's
not practical to do everything in either SAS or Stata. But to *avoid*
writing pages of SQL one wants to submit a query that (say) pulls down a
list of identifiers, then submit a second query that uses that list of
identifiers to pull down related records. To do this second step in
Stata, one would need to be able to write SQL that referenced a Stata
file. The alternative to this incremental approach would be to write
unreadable SQL queries.
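
The two-step approach described above can be sketched as follows. This is only an illustration, not the poster's actual setup: it uses Python's built-in sqlite3 module with an in-memory database, the table and column names (patients, claims, wanted) and the helper fetch_related are hypothetical, and it assumes the server lets you create a temporary table to hold the identifier list.

```python
import sqlite3


def fetch_related(conn, ids):
    """Step two: pull the records that belong to the identifiers from step one."""
    # Stage the identifiers in a temporary table (hypothetical name: wanted)
    # so that the second query stays short and readable.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted (id) VALUES (?)", [(i,) for i in ids])
    return conn.execute(
        "SELECT c.patient_id, c.amount "
        "FROM claims AS c JOIN wanted AS w ON w.id = c.patient_id "
        "ORDER BY c.patient_id, c.amount"
    ).fetchall()


# Toy stand-in for the remote database (made-up data, illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE claims (patient_id INTEGER, amount REAL);
INSERT INTO patients VALUES (1, 70), (2, 40), (3, 82);
INSERT INTO claims VALUES (1, 100.0), (2, 50.0), (3, 25.0), (3, 75.0);
""")

# Step one: a small query that only pulls down a list of identifiers.
ids = [row[0] for row in conn.execute(
    "SELECT patient_id FROM patients WHERE age >= 65")]

# Step two: a second small query driven by that list.
records = fetch_related(conn, ids)
```

The point of the sketch is that each query stays small; the identifier list is the only thing carried from one step to the next.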
Obviously, we all have different wants and expectations from Stata. For
me, this is the first 'big data' application I've had for Stata, and it
hasn't done well; I have other 'big data' proposals coming up, and
unfortunately I'm going to have to hedge my endorsement of Stata for
this kind of work.
cheers,
J
> As for Stata's doing SQL natively, there is a comment to a post on the Stata
> Blog similarly calling for Stata to adopt SQL standard syntax. I know that
> Jeff's comment goes beyond that, almost as if to have an ODBC driver or
> OLE DB provider for Stata dataset files.
>
> I like SQL and use it daily, but I wouldn't want StataCorp to expend its finite
> development resources in that direction. I say this for a number of reasons
> (for a couple of examples: the three-valued logic of NULLs and other
> peculiarities of SQL; considerations of when ad hoc SQL queries should be
> permitted and where upstream data management operations should be manifest for
> reasons of efficiency, security and regulatory compliance).
>
> So, if there's a wish-list poll somewhere for Stata 14, put me down as against
> SQL in favor of, say, -strunicode-, -menl-, -mcmc- or something along those
> lines.
>
> Joseph Coveney
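
For readers unfamiliar with the three-valued logic of NULLs mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table t and its contents are made up for illustration. In SQL a comparison involving NULL evaluates to NULL rather than true or false, so rows with NULL drop out of a WHERE clause unless they are tested with IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# NULL = NULL is neither true nor false; sqlite3 returns it as Python None.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The NULL row is not counted here: x <> 1 evaluates to NULL for it.
n_not_one = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x <> 1").fetchone()[0]

# NULLs have to be asked for explicitly.
n_missing = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
```

By contrast, Stata treats a numeric missing value as larger than any number, so a condition such as x != 1 is true for a missing x; the same-looking filter can select different rows in the two systems.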
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/