Re: st: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros
From:    Phil Schumm <[email protected]>
To:      [email protected]
Subject: Re: st: Stata code to run R code from within Stata and return certain pieces of the results as Stata macros
Date:    Sat, 31 May 2008 09:52:58 -0500
On May 30, 2008, at 7:20 PM, Salah Mahmud wrote:
> I think a smart solution is possible and would open the door for
> Stata users to access all the cutting-edge statistical facilities
> only available in R. A version 1 of this Rbridge might do the
> following: 1. Export a subset of (or all of) the data to a csv file
> and construct the necessary R code to import those data into R.
Why go through a text file? Why not just save a temporary file (in
Stata format), and read it into R with the foreign package?
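(A minimal sketch of what the R side of that might look like, assuming Stata has already saved the data with -save-; the file name below is hypothetical.)

    ## Read a temporary Stata-format file into R with the foreign package.
    ## "stata_tmp.dta" is a placeholder -- in practice Stata would write
    ## the file and pass its name to the R script.
    library(foreign)
    dat <- read.dta("stata_tmp.dta", convert.factors = TRUE)
    str(dat)   # check that the variables came across as expected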
> I'm sure the devil is in the details (e.g., there are issues with
> coordinating the running of Stata and R). For instance, Stata may
> have to sleep until R signals that the code has finished executing.
> But the above does not seem any more daunting than the average
> ado-file out there.
You're kidding, right?
> The advantages are obvious. R statistical and graphical utilities
> could be called from within Stata do-files. For instance, I could
> plot a cumulative incidence curve in Stata and add a p-value that
> is calculated using a test that is only available in R (e.g.,
> Gray's test). I would still be able to use all of Stata's superb
> facilities for handling complex time-to-event data, but I could
> pass a simple dataset to R with instructions to run Gray's test
> and return the p-value, which I would then add to my cumulative
> incidence plot.
> This approach might be more efficient than trying to translate R
> code into Stata code, and it is definitely better than running
> separate R and Stata scripts and transferring the results
> "manually" between the two.
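(For concreteness, a minimal R sketch of the Gray's-test step described above, assuming the cmprsk package -- which the original message does not name -- and hypothetical variable names in a data frame dat read from the exported file.)

    ## Gray's test for equality of cumulative incidence functions,
    ## via cuminc() from the cmprsk package.
    library(cmprsk)
    fit  <- cuminc(ftime = dat$ftime, fstatus = dat$fstatus, group = dat$group)
    pval <- fit$Tests[1, "pv"]    # p-value for the first event type
    cat(pval, "\n")               # print it so it appears in the R output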
I'm not trying to sound negative, but since you posted this to the
list, I presume you are interested in getting feedback. While it
might be fun to think about a world in which one could seamlessly
call R functions from within Stata (and have them act on Stata
objects), trying to simulate this with a bunch of hacks would, IMHO,
probably not be worth doing. However, the general goal of making it
easier for Stata users to occasionally use R functions and/or
packages is a good one. Currently, doing this requires:
1) getting data and/or other objects (e.g., matrices) out of Stata
and into R
2) writing the R command(s) necessary to do the task
3) getting the results (in whatever form) back into Stata, if necessary
(Note that the existing command -rsource- doesn't really address
(1)-(3); rather, once (1) and (2) (and perhaps (3)) are solved, it
facilitates the workflow by letting you execute an R source file from
within Stata and capture R's printed output in the Results window
and/or the log file.)
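(To make steps (1)-(3) concrete, here is a minimal sketch of the kind of R source file -rsource- could execute; the file names, the lm() placeholder analysis, and the idea of writing the result back as a small do-file are illustrative assumptions, not part of the original proposal.)

    ## (1) read the data that Stata exported
    library(foreign)
    dat <- read.dta("exported_from_stata.dta")

    ## (2) run whatever R command(s) the task requires
    fit <- lm(y ~ x, data = dat)                  # placeholder analysis
    p   <- summary(fit)$coefficients["x", "Pr(>|t|)"]

    ## (3) send the result back, here as a tiny do-file defining a local macro
    cat(sprintf("local rpvalue = %g\n", p), file = "r_results.do")

Back in Stata, -include r_results.do- would then define the local macro rpvalue for subsequent use (the file and macro names are, again, just placeholders).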
There is, I think, quite a lot that could be profitably done to
facilitate (1) and (3). It is currently pretty straightforward to
open a Stata dataset in R using the foreign package, but I don't
believe there's an easy way to read a collection of Mata objects
(e.g., as saved by -mata matsave-) into R. Similarly, while you can
also write a Stata file using foreign, as you have noted, most R
results come in the form of compound objects, and there's no easy way
to get these back into Stata.
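(As an illustration of the compound-object problem, one ad hoc way around it for a single rectangular piece of a result is to flatten that piece into a data frame and write it with foreign's write.dta(); the model and file name below are hypothetical.)

    ## An R model fit cannot be written to .dta directly, but a
    ## rectangular piece of it can be.
    library(foreign)
    fit <- lm(y ~ x, data = dat)                      # hypothetical model
    res <- as.data.frame(summary(fit)$coefficients)   # estimates, SEs, t, p
    res$term <- rownames(res)
    names(res) <- c("b", "se", "t", "p", "term")      # Stata-friendly names
    write.dta(res, file = "r_coefs.dta")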
One way to approach this would be to create an abstraction layer that
could read both Stata and R datasets/object files, and translate
between them. Python would be ideal for writing such a layer, since
much of the work to interface with R has already been done (e.g., see
http://www.omegahat.org/RSPython/index.html). You could then write
Python methods to read both .dta files and files containing Mata
objects (i.e., as created using -mata matsave- or -fopen()-). This
has been on my to-do list for some time, since we do a lot with
Python and it would be great to be able to pass data from Python to
Stata (and vice versa) more easily.
Once this has been done, one could imagine a Stata command that
automatically saved most (if not all) Stata objects in memory (i.e.,
the dataset, macros, Mata objects, and the contents of r(), e(),
etc.) into a set of temporary files (in standard Stata formats). One
could then switch to R, and access these objects through the
abstraction layer. Similarly, one could then use the abstraction
layer to save one or more R objects to disk in Stata formats so that
they could be read back in from Stata (using standard Stata commands,
which could also be wrapped for ease of use, if necessary).
Alternatively, one could just finish the R session by saving the
entire workspace, and then access the abstraction layer from Stata to
pull objects selectively out of this workspace.
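(On the R side, those two alternatives might look like the following; object and file names are hypothetical.)

    ## Save the entire workspace ...
    save.image(file = "r_workspace.RData")

    ## ... or save only the objects of interest, so the Stata-side layer
    ## has less to sift through.
    save(fit, pval, file = "r_selected.RData")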
Note that this approach would not involve any interprocess
communication between Stata and R, and would therefore be easily
transferable to all platforms on which both Stata and R run (since
Python is readily available for all of these).
Now, Stata's complete set of data structures (i.e., variables,
matrices, macros, scalars, etc.) is quite different from R's;
moreover, figuring out how to move R's various types of result
objects into Stata would take some serious work. For this reason, a
complete implementation of an abstraction layer would take *a lot* of
work, and there may be some areas that simply cannot be addressed in
a practical way. Thus, if I were going to do this project, I'd start
by creating an outline of what the abstraction layer might look like,
and then pick just one, clearly defined area to implement first as a
proof-of-concept. This would, by itself, give you some
functionality, and you could then decide whether and how to begin
extending it.
-- Phil