Re: st: R and Stata efficiency
On Jul 9, 2007, at 1:13 PM, David Airey wrote:
I was talking to a statistician on our campus about his perception
of the relative efficiency of R compared to Stata. One of the
things he finds annoying about Stata is having to explicitly save
things in addition to estimation:
Examples are: in LR test, I need to save the results of a previous
model. In drawing graphs based on the model fit, I have to save
the coefficient matrix. In doing further inferences on predicted
values, I have to save the predicted values. etc. etc.
Stata commands seem to be designed to only take existing entities
as arguments, but cannot take the results of other functions as
arguments without explicit assignment. In R, results of functions
can always be used as arguments for other functions, making
explicit assignment unnecessary if not needed.
Stata does leave behind model coefficients and such that can be
directly accessed (without saving). But they disappear after the
next estimation. Whereas in R, estimates are always saved in the
way a command is issued (<-) and they remain accessible later,
unless they are overwritten by mistake or intent later.
Stata and R have very different architectures and interfaces, and
thus it's possible for someone very accustomed to one to feel
uncomfortable working in the other. Of course, it's also possible to
use both, and to appreciate the strengths of each (just like working
with different programming languages). That said, one can still have
one's own preferences, and, as a bit of truth-in-advertising, I spend
~99 percent of my statistical life in Stata.
I'm not sure I fully understand the comment above -- it seems to me
as though multiple issues are being raised. For example, a common
idiom in R is to fit a model like this:
results <- lm(y ~ x)
This command places the results into an object called
"results" (technically of class "lm"), which one can then use later
on. In Stata, this would look like:
reg y x
est store results
which would store the estimation results under the name "results" for
further use. Of course in R you can save the object "results" in a
file, whereas in Stata (at least as of version 9) you cannot save the
set of results in a file (though see Michael Blasnik's -estsave- and
Ben Jann's -estwrite- and -estread- wrappers for a workaround).
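To make the point about saving to a file concrete, here is a minimal R sketch. The data are simulated purely for illustration, and the use of saveRDS()/readRDS() with a temporary file is my choice of mechanism, not something prescribed above; save()/load() would serve equally well.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

# Fit the model and keep the result in an object
results <- lm(y ~ x)

# Write the fitted-model object to a file, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(results, path)
restored <- readRDS(path)

# Check that the restored object carries the same coefficients
same_coefs <- isTRUE(all.equal(coef(results), coef(restored)))
print(same_coefs)
```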
Now, I suppose you might complain that the Stata example requires two
lines of code while the R example requires only one. Fair enough --
you have a lot of flexibility on the command line in R. However, the
end result is essentially the same, since developers have complete
control over what they return in e() just as they have control over
how they define the object returned by an estimation command in R.
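As a rough illustration of that command-line flexibility, the sketch below passes fitted models straight into other functions without naming them first. The data and the extra regressor z are made up for the example.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(50)
z <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

# Use a fit directly as an argument, with no intermediate model object
slopes <- coef(lm(y ~ x))
print(slopes)

# Compare two nested models without storing either one
cmp <- anova(lm(y ~ x), lm(y ~ x + z))
print(cmp)
```

The cost of this style is that neither fit survives the call, so anything you want to reuse later still has to be assigned.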
Note also that Stata does have one advantage over R here, at least
for a particular workflow. Suppose I want to fit a model, and then
perform several diagnostics in serial fashion immediately afterward.
In R, I must save the result of the model to do this; for example
results <- lm(y ~ x)
plot(fitted(results), resid(results))
cr.plots(results,"x")
...
where I'm using the cr.plots function from the car package. In
Stata, I don't have to save the results, as long as I don't disturb e():
reg y x
rvfplot
cprplot x
...
Perhaps a minor point, but I wanted to emphasize the fact that the
downside to creating results objects every time you fit a model (as
users often do in R) is that your workspace tends to fill up with
lots of old objects, and you have to clean them out manually. In sum:
1) saving the results of an estimation command requires an explicit statement
   in both Stata and R, though in R you can fit the model and save the results
   in a single statement, but
2) R has no concept of an "active" set of results, and therefore you must save
   any results you want to use in a subsequent command.
Thus, you might cast the difference as a difference between making a
bit of additional effort each time you want to save a set of results
for comparison with those from other models versus making a bit of
additional effort to delete the results from models you've fit
previously. Depending on your own wheat-to-chaff ratio (mine is
often quite low), you can decide which you prefer.
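For the R side of that trade-off, the manual cleanup might look something like the following sketch. The object names and simulated data are hypothetical, and this is only one of several ways to find and remove old fits.

```r
# Simulated data and two fitted models, for illustration only
set.seed(1)
x <- rnorm(50)
z <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
results <- lm(y ~ x)
other_results <- lm(y ~ x + z)

# Flag which objects in the workspace are fitted "lm" models
is_lm <- vapply(ls(), function(nm) inherits(get(nm), "lm"), logical(1))
print(names(is_lm)[is_lm])

# Remove a fit that is no longer needed
rm(other_results)
print(exists("other_results"))
```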
One final comment regarding the use of stored results. In R, once you have fit
several models and stored their results you can do something like this:
plot(results)
plot(other_results)
Now, suppose you've done the same in Stata. What I often see users
do is the following:
est restore results
rvfplot
est restore other_results
rvfplot
However, this is unnecessary. Instead, you can simply do
est for results: rvfplot
est for other_results: rvfplot
Notice that the -estimates for- prefix goes a long way toward
reducing the difference between Stata and R in the ease with which
you can use stored results.
-- Phil