Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
From
"Eric A. Booth" <[email protected]>
To
[email protected]
Subject
Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
Date
Sun, 18 Aug 2013 17:31:02 -0500
Hi Laszlo: I agree that it would be nice if -tabulate, summarize()-
stored values, but it doesn't. There are several ways to store those
values and then use them elsewhere. The issues seem to be
(1) ease of parsing the values into a format that you can use for
other analyses and (2) (and more important for you) the speed with
which you can calculate, store, parse, and then use those values.
Some alternatives include logging the -tabulate, summarize()- output
and then parsing it, using -collapse- to get your values, or using
the compiled -summarize- command to obtain the values of interest
and store them for use elsewhere. I'm sure there
are other options, but below is a comparison of these methods against
the speed of the desired -tabulate, summarize()- solution on a
large-ish fake dataset.
This is not a clean comparison, and the values I store for later use
are not exactly the same in every example, but it gives you an idea of
the speed differences of the steps that might be involved for each
approach (that is, preserving the data, summarizing or collapsing or
whatever, storing and parsing the output, and restoring the data). The
upshot is that, for this example on my computer, running
-summarize- in a loop to grab the values you want and store them in a
dataset was the quickest alternative to -tabulate, summarize()- that
I tried (Example 4 below), but this would be slower if you need a lot
of data points. Plus, Examples 3 & 4 below are both faster than
running -tabulate, summarize()-.
Using -tabulate, summarize()- to get values takes about 101 seconds
to run in my example.
Example 1 is a regular -tabulate- example with cell counts stored in a
matrix; this took about 9 seconds, but doesn't require any calculation
of means or the like. Ex 2 uses -logout- to parse the output (you could
do this manually too) and took the longest at about 109 seconds. Ex 3
uses -collapse- with preserve/restore and takes about 36 seconds. Ex
4 uses a loop to grab means from -summarize- for certain values and
takes about 27 seconds.
*********************! Begin Example
//intro stuff//
clear all
timer clear
set rmsg on
*--install packages for the example
cap which logout
if _rc ssc install logout , replace
*--make fake data
sa master.dta, replace emptyok //for later
set obs `=2^25' //run on a big dataset
forval x = 1/10 {
    g v`x' = round(runiform()*5)
}
//examples//
**
tabulate v1 v2, summarize(v3) //for ref. takes c.108 Seconds
**
*--ex1. time working with -tab- stored values**
**this doesn't get the values you need..
**but allows us to compare speed of these approaches somewhat
tab v1 v2, matcell(A)
mat list A
preserve
clear
svmat A, names(A)
keep A1
keep in 1/3 //parse
l
restore
*--ex2. parsing the tab, summarize() output**
*logout*
preserve
caplog using mystuff.txt, replace: tabulate v1 v2, summarize(v3) nof nost
logout, use(mystuff.txt) save(mytable) clear dta replace
u mytable.dta, clear
keep v1 v2
keep in 4/6 //parse as needed
restore
*! or just log this and parse it yourself, probably faster to do so
*--ex3. using collapse**
*this might be your best option if you have a lot of datapoints to calculate/store*!
preserve
collapse (mean) v3 , by(v1 v2)
keep v2 v3
keep in 2/5 //parse
l
restore
*--ex4. using summarize**
forval x = 4(-1)1 {
    forval y = 3(-1)1 {
        qui sum v3 if v1==`x' & v2 == `y', meanonly
        loc val`x' `r(mean)'
        preserve
        clear
        set obs 1
        g name = "`x' and `y'"
        g v1 = `val`x'' in 1
        append using master.dta
        sa master.dta, replace //values you need are in this dta file
        restore
    } //end of y loop
} //end of x loop
*********************! End Example
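A possible variation on Ex 4, not timed here: instead of preserving and
restoring the big dataset for every cell, the means could be written to
a separate file with -postfile-. This is only a sketch under the
assumption that v1-v3 from the example are in memory; the file name
results.dta, the variable name mean_v3, and the handle name are
illustrative choices, not part of the example above.

```stata
* Sketch: same cells as ex4, but results are posted to a separate file,
* so the data in memory is never preserved, cleared, or restored.
* results.dta and mean_v3 are hypothetical names chosen for illustration.
tempname h
postfile `h' v1 v2 mean_v3 using results.dta, replace
forval x = 1/4 {
    forval y = 1/3 {
        qui sum v3 if v1 == `x' & v2 == `y', meanonly
        * r(mean) is missing if the cell has no observations
        post `h' (`x') (`y') (r(mean))
    }
}
postclose `h'
* results.dta now holds one row per (v1, v2) cell
```

Whether this beats Ex 3 or Ex 4 would depend on how many cells are
needed, since each cell still costs one pass of -summarize- over the data.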
Note: -timer- kept resetting because the internal programming of
-logout- clears the timers each time, so I just added up the -rmsg-
timings.
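For steps that do not call -logout-, -timer- could be used directly.
A minimal sketch, assuming v1-v3 from the example are in memory; the
timer number 1 is an arbitrary choice.

```stata
* Sketch: time the -collapse- step (ex3) with -timer- instead of -rmsg-.
timer clear 1
timer on 1
preserve
collapse (mean) v3, by(v1 v2)
restore
timer off 1
timer list 1
* after -timer list-, r(t1) holds the elapsed seconds for timer 1
display r(t1)
```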
HTH,
Eric
___
Eric A. Booth
Research Scientist
Gibson Consulting Group
[email protected]
On Sun, Aug 18, 2013 at 4:26 PM, László Sándor <[email protected]> wrote:
>
> Thanks again!
>
> I am not sure if those preserve-and-restore the data, but I should check.
>
> I think what will happen is that I log the -tab, sum()-, and somehow
> read in numbers from the log file without opening a new dataset, and
> plot "immediately" with -scatteri-.
>
> Laszlo
>
> On Sun, Aug 18, 2013 at 5:04 PM, Roger B. Newson
> <[email protected]> wrote:
> > One way of doing what you want is probably to use the -xcontract- and
> > -xcollapse- packages, which you can download from SSC. These are extended
> > versions of -collapse- and -contract-, which can save the output datasets
> > (or resultssets) to Stata .dta files on disk, with which the user can do all
> > kinds of plotting and tabulating.
> >
> >
> > Best wishes
> >
> > Roger
> >
> > Roger B Newson BSc MSc DPhil
> > Lecturer in Medical Statistics
> > Respiratory Epidemiology and Public Health Group
> > National Heart and Lung Institute
> > Imperial College London
> > Royal Brompton Campus
> > Room 33, Emmanuel Kaye Building
> > 1B Manresa Road
> > London SW3 6LR
> > UNITED KINGDOM
> > Tel: +44 (0)20 7352 8121 ext 3381
> > Fax: +44 (0)20 7351 8322
> > Email: [email protected]
> > Web page: http://www.imperial.ac.uk/nhli/r.newson/
> > Departmental Web page:
> > http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
> >
> > Opinions expressed are those of the author, not of the institution.
> >
> > On 18/08/2013 21:49, László Sándor wrote:
> >>
> >> Thanks, Roger.
> >>
> >> I never meant that StataCorp should give away their source. I was only
> >> hoping to squeeze out some more interoperability. And so much of the
> >> rest of the code is in smaller chunks. Not -tabulate-, I see.
> >>
> >> I should have thought of -which-.
> >>
> >> I only wanted to capture some of the results/output without logging
> >> and parsing the log.
> >>
> >> Thanks,
> >>
> >> Laszlo
> >>
> >> On Sun, Aug 18, 2013 at 4:31 PM, Roger B. Newson
> >> <[email protected]> wrote:
> >>>
> >>> I think you'll find that everything really is in the executable
> >>> "/Applications/Stata/StataMP.app/Contents/MacOS/StataMP". This is because
> >>> Stata is not open-source, and was never supposed to be. StataCorp have to
> >>> make a living, and would probably not be able to do so if it was
> >>> open-source and users could make generic copies.
> >>>
> >>> A lot of the code for a lot of official Stata is open-source (ie in
> >>> ado-files), but -tabulate- isn't. If you type, in Stata,
> >>>
> >>> which tabulate
> >>>
> >>> then Stata will answer
> >>>
> >>> built-in command: tabulate
> >>>
> >>> meaning that there is no file -tabulate.ado-.
> >>>
> >>> I hope this helps.
> >>>
> >>> Best wishes
> >>>
> >>> Roger
> >>>
> >>>
> >>> On 18/08/2013 21:21, László Sándor wrote:
> >>>>
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I am trying to understand how -tabulate, summarize- works. I
> >>>> understand that much of it is written in C code, but I would still
> >>>> expect to find some black boxes of files that do the magic. I think I
> >>>> checked all folders, incl. hidden folders within /Applications/Stata
> >>>> on my mac, and even checked the package contents of
> >>>> /Applications/Stata/StataMP. I found no trace of -tabulate-, or any
> >>>> other plugin/DLL whatsoever. Is everything wrapped into the Unix
> >>>> executable "/Applications/Stata/StataMP.app/Contents/MacOS/StataMP"?
> >>>> Really?
> >>>>
> >>>> As I only need the results of -tab, sum()-, I hope to see some code
> >>>> calling -_tab.ado- or some other code to display the results. Is
> >>>> everything in the compiled binary instead?
> >>>>
> >>>> Well, something must add up those 33.9 MBs…
> >>>>
> >>>> Thanks for any thoughts,
> >>>>
> >>>> Laszlo
> >>>>
> >>>> *
> >>>> * For searches and help try:
> >>>> * http://www.stata.com/help.cgi?search
> >>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >>>> * http://www.ats.ucla.edu/stat/stata/