Re: st: RE: RE: The Future of Statistical Computing
Please post your code for these, or write a Stata Journal article on
how you did this, if this is for real. WOW.
On Jan 23, 2009, at 11:58 AM, Sergiy Radyakin wrote:
Dear Timothy,
allow me to disagree on the graphics issue. Stata graphs can be quite
attractive (see links) and once you dig through the undocumented
graphics commands, the possibilities are endless. See examples with
climatic data plot, contour plot and the legendary Lenna test image
here:
http://img216.imageshack.us/my.php?image=climatedataineditordc4.png
http://img230.imageshack.us/my.php?image=contourplotingrapheditoim1.png
http://img216.imageshack.us/my.php?image=lennaingrapheditorry4.png
(the host site ImageShack will attempt to display pop-up
advertisements, which are not related to these examples)
Click the image and maximize the browser window to see it at full
size.
The contour plot was originally rendered at a higher resolution (1.54 MP),
but the host didn't accept such large images, so it was scaled down,
hence some noise around the text.
So I think the tools are available. It is a matter of actually writing
code and using them.
Best regards,
Sergiy Radyakin
PS: if you are reading this message in an archive more than about a month
after it was posted, the above links may be broken. Sorry for
the inconvenience, but Statalist does not allow attachments.
On Fri, Jan 23, 2009 at 11:46 AM, Mak, Timothy
<[email protected]> wrote:
Thank you Stas and Nick for sharing these very interesting articles.
My humble opinion is that:
Data mining must be growing at a massive pace, and I can easily
imagine there's a great market for non-statistician friendly
software that can graphically summarize complicated data at the click
of a few buttons. For example, let's imagine a scenario where a
small company wants to investigate the buying behaviour of its
clients. Let's imagine the business is similar to a supermarket. To
find out which products tend to be bought together, the manager might ask
the software to 'summarize it for him'. And the software outputs a
PCA graph of the first 2 components. Out also comes a dialog box:
'Would you like to look at the data another way?' Clicking
'Yes' gives a rotated factor analysis of the same data, with scores
plot on the two axes. Another click gives a multidimensionally
scaled version of the graph. Another click gives a 3-d scatter
plot. Another click gives you a dendrogram from a cluster analysis,
and so on... The business manager merely needs to choose the graph
that he understands, that he can communicate to whoever he needs to. He doesn't
need to care whether the assumptions of the analyses are correct.
In any case, making decisions based on the 'best' model is probably
not going to significantly improve his business performance over
any other 'good-looking' model anyway. Of course the manager has to
understand that the future is always unpredictable, no matter how
good your analyses are.
I'm describing the scenario of a small hypothetical business, but
we can imagine similar demands from the internet-using public
wanting to quickly summarize data on the internet graphically. I
think Wilkinson is making this point - there are a lot more
opportunities out there in this area.
Of course traditional statistics will continue to have its place,
and certainly within academia, and for anyone who needs to publish
some serious results. Data mining itself grew from traditional
statistics, and will continue to learn from traditional statistical
techniques. So traditional statisticians must also try to learn
from data-mining techniques.
So where does Stata come into all this?
Well I can easily imagine that 10 years down the line, SPSS and
many other packages will have incorporated many of the
sophisticated graphical functions described in Wilkinson's book,
all easily accessible to a non-statistician. So long as they can
still provide reliable regression and ANOVA results, many might be
attracted to them by the amazing graphics they are able to
produce. If somebody only has a budget for one piece of general
statistical software, which one would he choose?
Stata must therefore keep up with the technological development on
the graphical and data-mining front. And I trust that Stata, being
so very selective on its components, would surely only choose the
best features to incorporate, rather than trying to do everything.
However, although Nick might disagree, at present, I don't really
think that graphics is a strength in Stata. Compared to the myriad
graphs that R can produce, Stata can only do simple plots. The main
impediment is probably that Stata graphics is not programmable by
most users. Could this possibly change in the coming years?
Mata must be a significant contribution to Stata. However, compared
to R, I think it is difficult to use. Having to switch between two
languages (and two environments) really confuses me. That'll always
be its weakness. However, I still like Stata very much, not least
because of the immensely helpful community here, and the excellent
manuals and support. As I said in an earlier post, though, I think
a debug mode in Mata would be a welcome addition...
Hope my comments are useful.
Tim
This sort of software would have a great appeal to medium and large
companies.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 22 January 2009 19:18
To: [email protected]
Subject: st: RE: The Future of Statistical Computing
Thanks to Stas for publicising this paper. My take is the opposite of
his: Data mining seems to me far more over-hyped than statistical
software.
I reviewed Leland's book for the Journal of Statistical Software in
2007. He exercised his right to reply. Both pieces are accessible at
<http://www.jstatsoft.org/v17/b03>
By an odd kind of symmetry, that makes me wonder whether the vendors
of competitor software will be allowed to reply in due course to
Leland's comments in this paper!
The Stata write-up doesn't look outrageous to me. (Clearly Leland
couldn't bring himself to compliment Stata's graphics.)
But it is behind the curve in not mentioning Mata.
Nick
[email protected]
Stas Kolenikov
The recent issue of Technometrics (vol 50 (4), I've just received it)
has an extensive article with the title in the subject line by Leland
Wilkinson, an extremely smart guy at the interface of statistics and
computer science, the author of SYSTAT and "The Grammar of Graphics"
book (totally incomprehensible to me, but a delight for Vince W, I am
sure :)). The link is http://pubs.amstat.org/toc/tech/50/4. He says,
"Statisticians interested in statistical computing and its future
incarnations will have to engage in joint research with computer
scientists to continue to have an influence." Catching up has been the
situation in data mining for some while now; and it may look like
advances in computing everywhere might phase statisticians out.
There are two paragraphs about Stata (ranked eighth in revenues after
SAS, SPSS, Matlab, Minitab, Statistica, S-Plus and JMP):
"Stata was originally the product of Bill Gould and a small group of
economists from UCLA. It has grown to be a full-featured analytic
company. The distinctive appeal of the package is its expressive and
concise programming language, based on C. Stata's unusual strengths
are in discrete variable modeling, longitudinal/panel designs,
survival analysis, time series analysis, and survey statistics.
Like S-PLUS, Stata will have to deal with the growth of R in its own
field - programmable statistics and data analysis. Unlike S-PLUS,
however, Stata's peculiar strengths and language are different enough
from R to make it a viable alternative, particularly for
economists. Moreover, the Stata user community is intensely loyal, so
we should expect Stata to continue to grow at a respectable rate."
Interesting reading. Stata developers, including the top SSC
contributors, might want to check it out.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/