Statalist



Re: st: RE: RE: The Future of Statistical Computing


From   David Airey <[email protected]>
To   [email protected]
Subject   Re: st: RE: RE: The Future of Statistical Computing
Date   Fri, 23 Jan 2009 12:55:10 -0600

Please post your code for these, or write a Stata Journal article on how you did this, if it is for real. WOW.

On Jan 23, 2009, at 11:58 AM, Sergiy Radyakin wrote:

Dear Timothy,

allow me to disagree on the graphics issue. Stata graphs can be quite
attractive (see links) and once you dig through the undocumented
graphics commands, the possibilities are endless. See examples with
climatic data plot, contour plot and the legendary Lenna test image
here:

http://img216.imageshack.us/my.php?image=climatedataineditordc4.png
http://img230.imageshack.us/my.php?image=contourplotingrapheditoim1.png
http://img216.imageshack.us/my.php?image=lennaingrapheditorry4.png

(the host site ImageShack will attempt to display pop-up
advertisements, which are not related to these examples)

Click the image and maximize the browser window to see it at full size.
The contour plot was originally rendered at a higher resolution
(1.54MP), but the host didn't accept such large images, so it was
scaled down - hence some noise around the text.

So I think the tools are available. It is a matter of actually writing
code and using them.

Best regards,
   Sergiy Radyakin

PS: if you are reading this message in an archive more than about a
month after it was posted, the above links may be broken. Sorry for
the inconvenience, but Statalist does not allow attachments.



On Fri, Jan 23, 2009 at 11:46 AM, Mak, Timothy
<[email protected]> wrote:
Thank you Stas and Nick for sharing these very interesting articles.

My humble opinion is that:

Data mining must be growing at a massive pace, and I can easily imagine there's a great market for non-statistician-friendly software that can graphically summarize complicated data at the click of a few buttons. For example, let's imagine a scenario where a small company wants to investigate the buying behaviour of its clients. Let's imagine the business is similar to a supermarket. To find out which products tend to be bought together, the manager might ask the software to 'summarize it for him'. And the software outputs a PCA graph of the first 2 components. Out also comes a dialog box: 'Would you like to look at the data another way?' Clicking 'Yes' gives a rotated factor analysis of the same data, with scores plotted on the two axes. Another click gives a multidimensionally scaled version of the graph. Another click gives a 3-d scatter plot. Another click gives you a dendrogram from a cluster analysis, and so on... The business manager merely needs to choose the graph that he understands, that he can communicate to whoever he needs to. He doesn't need to care whether the assumptions of the analyses are correct. In any case, making decisions based on the 'best' model is probably not going to significantly improve his business performance over any other 'good-looking' model anyway. Of course the manager has to understand that the future is always unpredictable, no matter how good your analyses are.

I'm describing the scenario of a small hypothetical business, but we can imagine similar demands from the internet-using public wanting to quickly summarize data on the internet graphically. I think Wilkinson is making this point - there are a lot more opportunities out there in this area.

Of course traditional statistics will continue to have its place, and certainly within academia, and for anyone who needs to publish some serious results. Data mining itself grew from traditional statistics, and will continue to learn from traditional statistical techniques. So traditional statisticians must also try to learn from data-mining techniques.

So where does Stata come into all this?

Well, I can easily imagine that 10 years down the line, SPSS and many other packages will have incorporated many of the sophisticated graphical functions described in Wilkinson's book, all easily accessible to a non-statistician. So long as such a package can still provide reliable regression and ANOVA results, many might be attracted to it by the amazing graphics it is able to produce. If somebody only has a budget for one piece of general statistical software, which one would he choose?

Stata must therefore keep up with the technological development on the graphical and data-mining front. And I trust that Stata, being so very selective on its components, would surely only choose the best features to incorporate, rather than trying to do everything.

However, although Nick might disagree, I don't really think that graphics is a strength of Stata at present. Compared to the myriad graphs that R can produce, Stata can only do simple plots. The main impediment is probably that Stata graphics is not programmable by most users. Could this possibly change in the coming years?

Mata must be a significant contribution to Stata. However, compared to R, I think it is difficult to use. Having to switch between two languages (and two environments) really confuses me. That'll always be its weakness. However, I still like Stata very much, not least because of the immensely helpful community here, and the excellent manuals and support. As I said in an earlier post, though, I think a debug mode in Mata would be a welcome addition...

Hope my comments are useful.

Tim


This sort of software would have a great appeal to medium and large companies.

-----Original Message-----
From: [email protected] [mailto:[email protected] ] On Behalf Of Nick Cox
Sent: 22 January 2009 19:18
To: [email protected]
Subject: st: RE: The Future of Statistical Computing

Thanks to Stas for publicising this paper. My take is the opposite of
his: data mining seems to me far more over-hyped than statistical
software.

I reviewed Leland's book for the Journal of Statistical Software in
2007. He exercised his right to reply. Both pieces are accessible at

<http://www.jstatsoft.org/v17/b03>

By an odd kind of symmetry, that makes me wonder whether the vendors of competitor software will be allowed to reply in due course to Leland's
comments in this paper!

The Stata write-up doesn't look outrageous to me. (Clearly Leland
couldn't bring himself to compliment Stata's graphics.)
But it is behind the curve in not mentioning Mata.

Nick
[email protected]

Stas Kolenikov

The recent issue of Technometrics (vol 50 (4), I've just received it)
has an extensive article with the title in the subject line by Leland
Wilkinson, an extremely smart guy at the interface of statistics and
computer science, the author of SYSTAT and "The Grammar of Graphics"
book (totally incomprehensible to me, but a delight for Vince W, I am
sure :)). The link is http://pubs.amstat.org/toc/tech/50/4. He says,
"Statisticians interested in statistical computing and its future
incarnations will have to engage in joint research with computer
scientists to continue to have an influence." Catching up has been the
situation in data mining for some while now; and it may look like
advances in computing everywhere might phase statisticians out.

There are two paragraphs about Stata (ranked eighth in revenues after
SAS, SPSS, Matlab, Minitab, Statistica, S-Plus and JMP):

"Stata was originally the product of Bill Gould and a small group of
economists from UCLA. It has grown to be a full-featured analytic
company. The distinctive appeal of the package is its expressive and
concise programming language, based on C. Stata's unusual strengths
are in discrete variable modeling, longitudinal/panel designs,
survival analysis, time series analysis, and survey statistics.

Like S-PLUS, Stata will have to deal with the growth of R in its own
field: programmable statistics and data analysis. Unlike S-PLUS,
however, Stata's peculiar strengths and language are different enough
from R to make it a viable alternative, particularly for
economists. Moreover, the Stata user community is intensely loyal, so
we should expect Stata to continue to grow at a respectable rate."

An interesting read. Stata developers including the top SSC
contributors might want to check it out.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



