These discussions are interesting and enjoyable. Exchanging
impressions and prejudices is a lot of fun.
One thing that strikes me is that almost no-one has any pertinent
data, even on number of users (however defined). This is,
or should be, a statistically-minded list, but where are the
data? The key factual question here seems to be how many people
didn't buy Stata because they could get and use R. Who knows?
In any case, they are not on this list, presumably, to come forward
with their testimony (although you might know someone else who
fits the description). Conversely, there may well be people on
this list who don't use R because Stata does what they want.
R is a wonderful thing and many people can quite happily use R and
Stata and appreciate their different strengths. I have R on my
computer. I don't use it much, but the reasons for that are nothing
to do with any of its limitations. As an outside observer wishing
R well, I have been wrong repeatedly about R. I thought it would
struggle if its founders became too busy or too bored to keep
up the momentum. I thought it would struggle because it did not pay
much attention to its interface, and remained command-oriented.
I thought it would struggle because it was not well documented,
but then academic values took over and the number of books on
R is exploding. My incorrect guesses matter to no-one, but they
may exemplify how Stata people have been mis-reading R. More
seriously, I suspect that even some R people have been happily
surprised at its impact. (Some of that has to do with the way
that the people behind S-PLUS treated their academic market,
I guess.)
I have no data, but R shows the qualitative signs of being
on an exponential. (So does Stata, but I couldn't guess at
the relative growth rates.) Ecology teaches us, however, that
exponential phases are followed by crashes or levelling off.
Another wild guess is that most Stata users don't
think much about R because to them it does not appear a
real alternative. Naturally, "appear" is a key word. The
reasons are many and different, including the attractions
of many of Stata's specialised modelling commands, Stata's
uses for data management and the existence of tech support.
R's core market appears to be academic statistics. In that
field there is still some snobbery about people who use
statistics, even if they are biostatisticians,
econometricians, etc., who are in many cases also
developing new methodology. One leading light in the
R community has a roadshow in which he describes Stata
as a "niche product". I wrote to correct him, and not
surprisingly never got a reply. Relatively few
statisticians in Depts of Statistics use Stata. However,
academic statistics is in many ways the niche market!
In almost any institution, the number of people using
statistics is much greater than that of the statisticians
(strong sense). There are plenty of places in which the
Dept of Biostatistics (or whatever it is called) is much
bigger than the Dept of Statistics (ditto).
However, it is hazardous to generalise. There are several
application fields in which R is making a big impact
too. Ecology is one.
But apart from the very big difference
that R is free and Stata is not, the similarities are
striking. R and Stata are both pretty unfriendly to
casual users, but repay those who work at mastering syntax
and come to appreciate the power and consistency of each
language. Both appeal to user-programmers, much of whose
work becomes publicly available. In each case the
people at the top are driven by high standards and
"getting it right".
For all that it is based on
a commercial product, the Stata community is strongly
affected by "open source" ideals: you need look no further
than packages available through the Stata Journal, SSC or
individuals' websites. A long time ago one or two people
tried to sell their Stata programs to other
users, but they were just ignored and users became inspired
by the idea of sharing their code with each other. (The Stata
community is not unique in this: there are communities based on
MATLAB use.) (There are plenty of consultants on this list who
use Stata intensively; I am not clear on whether any of them
sells Stata programming.)
To summarise, my impression is that R and Stata
have some impact on each other, but that there is plenty
of room for both. The R community and the Stata community
should be friendly to each other. Not many people are
prominent in both, but each has plenty of people well
disposed to the other.
The real enemy is
Gosh, I have to go.
Nick