Thank you Stas and Nick for sharing these very interesting articles.
My humble opinion is that:
Data mining must be growing at a massive pace, and I can easily imagine there's a great market for non-statistician-friendly software that can graphically summarize complicated data at the click of a few buttons. For example, let's imagine a scenario where a small company wants to investigate the buying behaviour of its clients. Let's imagine the business is similar to a supermarket. To find out which products tend to be bought together, the manager might ask the software to 'summarize it for him'. And the software outputs a PCA graph of the first 2 components. Out also comes a dialog box: 'Would you like to look at the data another way?' Clicking 'Yes' gives a rotated factor analysis of the same data, with the scores plotted on the two axes. Another click gives a multidimensionally scaled version of the graph. Another click gives a 3-d scatter plot. Another click gives a dendrogram from a cluster analysis, and so on... The business manager merely needs to choose the graph that he understands, that he can communicate to whoever he needs to. He doesn't need to care whether the assumptions of the analyses are correct. In any case, making decisions based on the 'best' model is probably not going to significantly improve his business performance over any other 'good-looking' model anyway. Of course the manager has to understand that the future is always unpredictable, no matter how good the analyses are.
I'm describing the scenario of a small hypothetical business, but we can imagine similar demands from the internet-using public wanting to quickly summarize data on the internet graphically. I think Wilkinson is making this point - there are many more opportunities out there in this area.
Of course traditional statistics will continue to have its place, and certainly within academia, and for anyone who needs to publish some serious results. Data mining itself grew from traditional statistics, and will continue to learn from traditional statistical techniques. So traditional statisticians must also try to learn from data-mining techniques.
So where does Stata come into all this?
Well, I can easily imagine that 10 years down the line, SPSS and many other packages will have incorporated many of the sophisticated graphical functions described in Wilkinson's book, all easily accessible to a non-statistician. So long as such a package can still provide reliable regression and ANOVA results, many might be attracted to it by the amazing graphics that it is able to produce. If somebody only has a budget for one piece of general statistical software, which one would he choose?
Stata must therefore keep up with technological developments on the graphical and data-mining front. And I trust that Stata, being so very selective about its components, would surely only choose the best features to incorporate, rather than trying to do everything.
However, although Nick might disagree, at present I don't really think that graphics are a strength of Stata. Compared to the myriad graphs that R can produce, Stata can only do simple plots. The main impediment is probably that Stata graphics are not programmable by most users. Could this possibly change in the coming years?
Mata must be a significant contribution to Stata. However, compared to R, I think it is difficult to use. Having to switch between two languages (and two environments) really confuses me. That'll always be its weakness. However, I still like Stata very much, not least because of the immensely helpful community here, and the excellent manuals and support. As I said in an earlier post, though, I think a debug mode in Mata would be a welcome addition...
Hope my comments are useful.
Tim
This sort of software would have a great appeal to medium and large companies.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 22 January 2009 19:18
To: [email protected]
Subject: st: RE: The Future of Statistical Computing
Thanks to Stas for publicising this paper. My take is the opposite of
his:
Data mining seems to me far more over-hyped than statistical software.
I reviewed Leland's book for the Journal of Statistical Software in
2007.
He exercised his right to reply. Both pieces are accessible at
<http://www.jstatsoft.org/v17/b03>
By an odd kind of symmetry, that makes me wonder whether the vendors of
competitor software will be allowed to reply in due course to Leland's
comments in this paper!
The Stata write-up doesn't look outrageous to me. (Clearly Leland
couldn't bring himself to compliment Stata's graphics.)
But it is behind the curve in not mentioning Mata.
Nick
[email protected]
Stas Kolenikov
The recent issue of Technometrics (vol 50 (4), I've just received it)
has an extensive article with the title in the subject line by Leland
Wilkinson, an extremely smart guy at the interface of statistics and
computer science, the author of SYSTAT and "The Grammar of Graphics"
book (totally incomprehensible to me, but a delight for Vince W, I am
sure :)). The link is http://pubs.amstat.org/toc/tech/50/4. He says,
"Statisticians interested in statistical computing and its future
incarnations will have to engage in joint research with computer
scientists to continue to have an influence." Catching up has been the
situation in data mining for some while now; and it may look like
advances in computing everywhere might phase statisticians out.
There are two paragraphs about Stata (ranked eighth in revenues after
SAS, SPSS, Matlab, Minitab, Statistica, S-Plus and JMP):
"Stata was originally the product of Bill Gould and a small group of
economists from UCLA. It has grown to be a full-featured analytic
company. The distinctive appeal of the package is its expressive and
concise programming language, based on C. Stata's unusual strengths
are in discrete variable modeling, longitudinal/panel designs,
survival analysis, time series analysis, and survey statistics.
Like S-PLUS, Stata will have to deal with the growth of R in its own
field - programmable statistics and data analysis. Unlike S-PLUS,
however, Stata's peculiar strengths and language are different enough
from R to make it a viable alternative, particularly for
economists. Moreover, the Stata user community is intensely loyal, so
we should expect Stata to continue to grow at a respectable rate."
An interesting read. Stata developers including the top SSC
contributors might want to check it out.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/