
Notice: On April 23, 2014, Statalist moved from an email list to a forum.


Re: st: where is StataCorp C code located? all in a single executable as compiled binary?

From   Phil Clayton <[email protected]>
To   [email protected]
Subject   Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
Date   Mon, 19 Aug 2013 10:06:50 +1000

If you can avoid the -preserve- and -restore- you save loads of time (at least on my modest system...)

*--ex5.  using summarize and postfile**
tempname post
tempfile postfile
postfile `post' v1 v2 mean sd n using "`postfile'"
forval x = 4(-1)1 {
	forval y = 3(-1)1 {
		display "v1=`x', v2=`y'"
		qui sum v3 if v1==`x' & v2 == `y'
		post `post' (`x') (`y') (r(mean)) (r(sd)) (r(N)) //r() as expressions: an empty cell posts missing instead of erroring
	} //end of y loop
} //end of x loop
postclose `post'
use "`postfile'", clear
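
A variant of ex5, offered only as a sketch: instead of hard-coding the ranges of v1 and v2, it reads the group values from the data with -levelsof-. It assumes v1 and v2 are integer-valued, as in the fake data in the example quoted below; the names memhold, results, v1levels and v2levels are made up for illustration. It refers to r(mean), r(sd) and r(N) as expressions rather than expanding them as macros, and it skips empty cells.

```stata
* Sketch only: assumes v1, v2 (integer-valued) and v3 exist in memory
tempname memhold
tempfile results
postfile `memhold' v1 v2 mean sd n using "`results'"

* collect the distinct values of each grouping variable
quietly levelsof v1, local(v1levels)
quietly levelsof v2, local(v2levels)

foreach x of local v1levels {
	foreach y of local v2levels {
		quietly summarize v3 if v1 == `x' & v2 == `y'
		* post one row per non-empty cell
		if r(N) > 0 {
			post `memhold' (`x') (`y') (r(mean)) (r(sd)) (r(N))
		}
	}
}
postclose `memhold'

* the posted results replace the data in memory
use "`results'", clear
```

Note that -levelsof- itself has to pass through the data, so on a very large dataset the hard-coded -forvalues- ranges above may well be quicker when the values are known in advance.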

On 19/08/2013, at 8:31 AM, Eric A. Booth <[email protected]> wrote:

> <>
> Hi Laszlo:   I agree that it would be nice if -tabulate,summarize()-
> stored values but it doesn't.  There are several options available to
> store those values and then use them elsewhere.  The issues seem to be
> (1) ease of parsing the values into a format that you can use for
> other analyses and (2) (and more important for you) the speed with
> which you can calculate, store, parse, and then use those values.
> Some alternatives include logging the -tabulate, summarize()- output
> and then parsing it, using -collapse- to get your values, or using
> the compiled -summarize- command to obtain the
> values of interest and store them for use elsewhere.  I'm sure there
> are other options, but below is a comparison of these methods against
> the speed of the desired -tabulate, summarize()- solution on a
> large-ish fake dataset.
> This is not a clean comparison and the values I store for later use
> are not exactly the same in every example, but it gives you an idea of
> the speed differences of the steps that might be involved for each
> approach (that is, preserving the data, summarizing or collapsing or
> XX, storing and parsing the output, and restoring the data).  The
> upshot is that, for this example on my computer, running -summarize-
> in a loop to grab the values you want and store them in a dataset was
> the quickest alternative to -tabulate, summarize()- that I tried
> (Example 4 below), but this would be slower with a lot of values to
> calculate.  Plus, Examples 3 and 4 below are both faster than running
> -tabulate, summarize()-.
> Using -tabulate, summarize()-  to get values takes about 101 seconds
> to run in my example.
> Example 1 is a regular -tabulate- example with cells stored in a matrix --
> this took about 9 seconds, but doesn't require any calculation of means
> or whatnot.  Ex 2 uses -logout- to parse the output (you could do
> this manually too) and took the longest at about 109 seconds.  Ex 3
> uses -collapse- with preserve/restore and takes about 36 seconds.  Ex
> 4 uses a loop to grab means from summarize for certain values and
> takes about 27 seconds.
> *********************! Begin Example
> //intro stuff//
> clear all
> timer clear
> set rmsg on
> *--install  packages for the example
> cap which logout
> if _rc ssc install logout , replace
> *--make fake data
> sa master.dta, replace emptyok //for later
> set obs `=2^25' //run on a big dataset
> forval x = 1/10 {
>   g v`x' = round(runiform()*5)
> }
> //examples//
>   **
>   tabulate v1 v2, summarize(v3)  //for ref. takes c.108 Seconds
>   **
> *--ex1. time working with -tab- stored values**
> **this doesnt get the values you need..
> **but allows us to compare speed of these approaches somewhat
> tab v1 v2,  matcell(A)
> mat list A
> preserve
>  clear
> svmat A, names(A)
> keep A1
> keep in 1/3 //parse
> l
> restore
> *--ex2.  parsing the tab, summarize() output**
> *logout*
> preserve
>     caplog using mystuff.txt, replace: tabulate v1 v2, summarize(v3) nof nost
>     logout, use(mystuff.txt) save(mytable) clear dta replace
> u mytable.dta, clear
> keep v1 v2
> keep in 4/6 //parse as needed
> restore
> *! or just log this and parse it yourself, probably faster to do so
> *--ex3. using collapse**
> *this might be your best option if you have a lot of datapoints to calculate/store*!
> preserve
> collapse (mean) v3 , by(v1 v2)
> keep v2 v3
> keep in 2/5 //parse
> l
> restore
> *--ex4.  using summarize**
> forval x = 4(-1)1 {
>   forval y = 3(-1)1 {
>     qui sum v3 if v1==`x' & v2 == `y', meanonly
>     loc val`x' `r(mean)'
>     preserve
>     clear
>     set obs 1
>     g name = "`x' and `y'"
>     g v1 = `val`x'' in 1
>     append using master.dta
>     sa master.dta, replace  //values you need are in this dta file
>     restore
>   } //end of y loop
> } //end of x loop
> *********************! End Example
> Note: -timer- kept resetting because the internal programming of
> -logout- clears the timers each time, so I just added up the -rmsg-
> timings.
> HTH,
> Eric
> ___
> Eric A. Booth
> Research Scientist
> Gibson Consulting Group
> [email protected]
> On Sun, Aug 18, 2013 at 4:26 PM, László Sándor <[email protected]> wrote:
>> Thanks again!
>> I am not sure if those preserve-and-restore the data, but I should check.
>> I think what will happen is that I log the -tab, sum()-, and somehow
>> read in numbers from the log file without opening a new dataset, and
>> plot "immediately" with -scatteri-.
>> Laszlo
>> On Sun, Aug 18, 2013 at 5:04 PM, Roger B. Newson
>> <[email protected]> wrote:
>>> One way of doing what you want is probably to use the -xcontract- and
>>> -xcollapse- packages, which you can download from SSC. These are extended
>>> versions of -collapse- and -contract-, which can save the output datasets
>>> (or resultssets) to Stata .dta files on disk, with which the user can do all
>>> kinds of plotting and tabulating.
>>> Best wishes
>>> Roger
>>> Roger B Newson BSc MSc DPhil
>>> Lecturer in Medical Statistics
>>> Respiratory Epidemiology and Public Health Group
>>> National Heart and Lung Institute
>>> Imperial College London
>>> Royal Brompton Campus
>>> Room 33, Emmanuel Kaye Building
>>> 1B Manresa Road
>>> London SW3 6LR
>>> Tel: +44 (0)20 7352 8121 ext 3381
>>> Fax: +44 (0)20 7351 8322
>>> Email: [email protected]
>>> Web page:
>>> Departmental Web page:
>>> Opinions expressed are those of the author, not of the institution.
>>> On 18/08/2013 21:49, László Sándor wrote:
>>>> Thanks, Roger.
>>>> I never meant that StataCorp should give away their source. I was only
>>>> hoping to squeeze out some more interoperability. And so much of the
>>>> rest of the code is in smaller chunks. Not -tabulate-, I see.
>>>> I should have thought of -which-.
>>>> I only wanted to capture some of the results/output without logging
>>>> and parsing the log.
>>>> Thanks,
>>>> Laszlo
>>>> On Sun, Aug 18, 2013 at 4:31 PM, Roger B. Newson
>>>> <[email protected]> wrote:
>>>>> I think you'll find that everything really is in the executable
>>>>> "/Applications/Stata/". This is because
>>>>> Stata is not open-source, and was never supposed to be. StataCorp have to
>>>>> make a living, and would probably not be able to do so if it was
>>>>> open-source
>>>>> and users could make generic copies.
>>>>> A lot of the code for a lot of official Stata is open-source (i.e. in
>>>>> ado-files), but -tabulate- isn't. If you type, in Stata,
>>>>> which tabulate
>>>>> then Stata will answer
>>>>> built-in command:  tabulate
>>>>> meaning that there is no file -tabulate.ado-.
>>>>> I hope this helps.
>>>>> Best wishes
>>>>> Roger
>>>>> Roger B Newson BSc MSc DPhil
>>>>> Lecturer in Medical Statistics
>>>>> Respiratory Epidemiology and Public Health Group
>>>>> National Heart and Lung Institute
>>>>> Imperial College London
>>>>> Royal Brompton Campus
>>>>> Room 33, Emmanuel Kaye Building
>>>>> 1B Manresa Road
>>>>> London SW3 6LR
>>>>> Tel: +44 (0)20 7352 8121 ext 3381
>>>>> Fax: +44 (0)20 7351 8322
>>>>> Email: [email protected]
>>>>> Web page:
>>>>> Departmental Web page:
>>>>> Opinions expressed are those of the author, not of the institution.
>>>>> On 18/08/2013 21:21, László Sándor wrote:
>>>>>> Hi all,
>>>>>> I am trying to understand how -tabulate, summarize- works. I
>>>>>> understand that much of it is written in C code, but I would still
>>>>>> expect to find some black boxes of files that do the magic. I think I
>>>>>> checked all folders, incl. hidden folders within /Applications/Stata
>>>>>> on my mac, and even checked the package contents of
>>>>>> /Applications/Stata/StataMP. I found no trace of -tabulate-, or any
>>>>>> other plugin/DLL whatsoever. Is everything wrapped into the Unix
>>>>>> executable "/Applications/Stata/"?
>>>>>> Really?
>>>>>> As I only need the results of -tab, sum()-, I hope to see some code
>>>>>> calling -_tab.ado- or some other code to display the results. Is
>>>>>> everything in the compiled binary instead?
>>>>>> Well, something must add up those 33.9 MBs…
>>>>>> Thanks for any thoughts,
>>>>>> Laszlo
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *
>>>>>> *
>>>>>> *

© Copyright 1996–2018 StataCorp LLC