Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
From: Phil Clayton <[email protected]>
To: [email protected]
Subject: Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
Date: Mon, 19 Aug 2013 10:06:50 +1000
If you can avoid the -preserve- and -restore- you save loads of time (at least on my modest system...)
*--ex5. using summarize and postfile**
tempname post
tempfile postfile
* open a results file that is written to without touching the data in memory
postfile `post' v1 v2 mean sd n using "`postfile'"
forval x = 4(-1)1 {
    forval y = 3(-1)1 {
        display "v1=`x', v2=`y'"
        qui sum v3 if v1==`x' & v2 == `y'
        * pass r() results as expressions so an empty cell posts missing values
        post `post' (`x') (`y') (r(mean)) (r(sd)) (r(N))
    } //end of y loop
} //end of x loop
postclose `post'
use "`postfile'", clear
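
The same -postfile- pattern can be tried on a dataset that ships with Stata. What follows is only a minimal sketch and was not part of the timings in this thread: it assumes the -auto- data (-sysuse auto-), uses -levelsof- to loop over the observed values of the grouping variables rather than hard-coded ranges, and skips cells with no observations. The names results, replevels and forlevels are made up for illustration.

```stata
* Sketch: cell means via -summarize- and -postfile-, no -preserve-/-restore-
sysuse auto, clear

tempname post
tempfile results
postfile `post' rep78 foreign mean sd n using "`results'"

* collect the observed (non-missing) values of each grouping variable
quietly levelsof rep78, local(replevels)
quietly levelsof foreign, local(forlevels)

foreach x of local replevels {
    foreach y of local forlevels {
        quietly summarize price if rep78 == `x' & foreign == `y'
        * only post cells that actually contain observations
        if r(N) > 0 {
            post `post' (`x') (`y') (r(mean)) (r(sd)) (r(N))
        }
    }
}
postclose `post'

* the results dataset replaces the data in memory only at this point
use "`results'", clear
list, sepby(rep78)
```

Nothing is preserved or restored inside the loop, so the data in memory are left untouched until the final -use-.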
On 19/08/2013, at 8:31 AM, Eric A. Booth <[email protected]> wrote:
> <>
> Hi Laszlo: I agree that it would be nice if -tabulate,summarize()-
> stored values but it doesn't. There are several options available to
> store those values and then use them elsewhere. The issues seem to be
> (1) ease of parsing the values into a format that you can use for
> other analyses and (2) (and more important for you) the speed with
> which you can calculate, store, parse, and then use those values.
>
> Some alternatives include logging the -tabulate,
> summarize()- output and then parsing it, using -collapse- to get your
> values, or using the compiled -summarize- command to obtain the
> values of interest and store them for use elsewhere. I'm sure there
> are other options, but below is a comparison of these methods against
> the speed of the desired -tabulate, summarize()- solution on a
> large-ish fake dataset.
>
> This is not a clean comparison and the values I store for later use
> are not exactly the same in every example, but it gives you an idea of
> the speed differences of the steps that might be involved for each
> approach (that is, preserving the data, summarizing or collapsing or
> XX, storing and parsing the output, and restoring the data). The
> upshot is that, for this example on my computer, it seems that running
> -summarize- in a loop to grab the values you want and store them in a
> dataset was the quickest non -tab, summarize()- option I tried (example
> 4 below), but this would be slower on a lot of data points. Plus,
> Examples 3 & 4 below are both faster than running -tabulate,
> summarize()-.
>
> Using -tabulate, summarize()- to get values takes about 101 seconds
> to run in my example.
> Example 1 is a regular -tabulate- example with cells stored in a matrix --
> this took about 9 seconds, but doesn't require any calculation of means
> or whatnot. Ex 2 uses -logout- to parse the output (you could do
> this manually too) and took the longest at about 109 seconds. Ex 3
> uses -collapse- with preserve/restore and takes about 36 seconds. Ex
> 4 uses a loop to grab means from summarize for certain values and
> takes about 27 seconds.
>
> *********************! Begin Example
> //intro stuff//
> clear all
> timer clear
> set rmsg on
> *--install packages for the example
> cap which logout
> if _rc ssc install logout , replace
> *--make fake data
> sa master.dta, replace emptyok //for later
> set obs `=2^25' //run on a big dataset
> forval x = 1/10 {
>     g v`x' = round(runiform()*5)
> }
>
>
> //examples//
> **
> tabulate v1 v2, summarize(v3) //for ref. takes c.108 Seconds
> **
>
> *--ex1. time working with -tab- stored values**
> **this doesnt get the values you need..
> **but allows us to compare speed of these approaches somewhat
> tab v1 v2, matcell(A)
> mat list A
> preserve
> clear
> svmat A, names(A)
> keep A1
> keep in 1/3 //parse
> l
> restore
>
>
> *--ex2. parsing the tab, summarize() output**
> *logout*
> preserve
> caplog using mystuff.txt, replace: tabulate v1 v2, summarize(v3) nof nost
> logout, use(mystuff.txt) save(mytable) clear dta replace
> u mytable.dta, clear
> keep v1 v2
> keep in 4/6 //parse as needed
> restore
> *! or just log this and parse it yourself, probably faster to do so
>
>
>
> *--ex3. using collapse**
> *this might be your best option if you have a lot of datapoints to calculate/store*!
> preserve
> collapse (mean) v3 , by(v1 v2)
> keep v2 v3
> keep in 2/5 //parse
> l
> restore
>
>
> *--ex4. using summarize**
> forval x = 4(-1)1 {
>     forval y = 3(-1)1 {
>         qui sum v3 if v1==`x' & v2 == `y', meanonly
>         loc val`x' `r(mean)'
>         preserve
>         clear
>         set obs 1
>         g name = "`x' and `y'"
>         g v1 = `val`x'' in 1
>         append using master.dta
>         sa master.dta, replace //values you need are in this dta file
>         restore
>     } //end of y loop
> } //end of x loop
> *********************! End Example
> note: -timer- kept resetting because the internal programming of -logout-
> clears the timer each time, so I just added up across the -rmsg-
> timings.
>
>
>
> HTH,
>
> Eric
> ___
> Eric A. Booth
> Research Scientist
> Gibson Consulting Group
> [email protected]
>
>
>
>
> On Sun, Aug 18, 2013 at 4:26 PM, László Sándor <[email protected]> wrote:
>>
>> Thanks again!
>>
>> I am not sure if those preserve-and-restore the data, but I should check.
>>
>> I think what will happen is that I log the -tab, sum()-, and somehow
>> read in numbers from the log file without opening a new dataset, and
>> plot "immediately" with -scatteri-.
>>
>> Laszlo
>>
>> On Sun, Aug 18, 2013 at 5:04 PM, Roger B. Newson
>> <[email protected]> wrote:
>>> One way of doing what you want is probably to use the -xcontract- and
>>> -xcollapse- packages, which you can download from SSC. These are extended
>>> versions of -collapse- and -contract-, which can save the output datasets
>>> (or resultssets) to Stata .dta files on disk, with which the user can do all
>>> kinds of plotting and tabulating.
>>>
>>>
>>> Best wishes
>>>
>>> Roger
>>>
>>> Roger B Newson BSc MSc DPhil
>>> Lecturer in Medical Statistics
>>> Respiratory Epidemiology and Public Health Group
>>> National Heart and Lung Institute
>>> Imperial College London
>>> Royal Brompton Campus
>>> Room 33, Emmanuel Kaye Building
>>> 1B Manresa Road
>>> London SW3 6LR
>>> UNITED KINGDOM
>>> Tel: +44 (0)20 7352 8121 ext 3381
>>> Fax: +44 (0)20 7351 8322
>>> Email: [email protected]
>>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>>> Departmental Web page:
>>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>>
>>> Opinions expressed are those of the author, not of the institution.
>>>
>>> On 18/08/2013 21:49, László Sándor wrote:
>>>>
>>>> Thanks, Roger.
>>>>
>>>> I never meant that StataCorp should give away their source. I was only
>>>> hoping to squeeze out some more interoperability. And so much of the
>>>> rest of the code is in smaller chunks. Not -tabulate-, I see.
>>>>
>>>> I should have thought of -which-.
>>>>
>>>> I only wanted to capture some of the results/output without logging
>>>> and parsing the log.
>>>>
>>>> Thanks,
>>>>
>>>> Laszlo
>>>>
>>>> On Sun, Aug 18, 2013 at 4:31 PM, Roger B. Newson
>>>> <[email protected]> wrote:
>>>>>
>>>>> I think you'll find that everything really is in the executable
>>>>> "/Applications/Stata/StataMP.app/Contents/MacOS/StataMP". This is because
>>>>> Stata is not open-source, and was never supposed to be. StataCorp have to
>>>>> make a living, and would probably not be able to do so if it was
>>>>> open-source and users could make generic copies.
>>>>>
>>>>> A lot of the code for a lot of official Stata is open-source (i.e. in
>>>>> ado-files), but -tabulate- isn't. If you type, in Stata,
>>>>>
>>>>> which tabulate
>>>>>
>>>>> then Stata will answer
>>>>>
>>>>> built-in command: tabulate
>>>>>
>>>>> meaning that there is no file -tabulate.ado-.
>>>>>
>>>>> I hope this helps.
>>>>>
>>>>> Best wishes
>>>>>
>>>>> Roger
>>>>>
>>>>>
>>>>>
>>>>> On 18/08/2013 21:21, László Sándor wrote:
>>>>>>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to understand how -tabulate, summarize- works. I
>>>>>> understand that much of it is written in C code, but I would still
>>>>>> expect to find some black boxes of files that do the magic. I think I
>>>>>> checked all folders, incl. hidden folders within /Applications/Stata
>>>>>> on my mac, and even checked the package contents of
>>>>>> /Applications/Stata/StataMP. I found no trace of -tabulate-, or any
>>>>>> other plugin/DLL whatsoever. Is everything wrapped into the Unix
>>>>>> executable "/Applications/Stata/StataMP.app/Contents/MacOS/StataMP"?
>>>>>> Really?
>>>>>>
>>>>>> As I only need the results of -tab, sum()-, I hope to see some code
>>>>>> calling -_tab.ado- or some other code to display the results. Is
>>>>>> everything in the compiled binary instead?
>>>>>>
>>>>>> Well, something must add up to those 33.9 MB…
>>>>>>
>>>>>> Thanks for any thoughts,
>>>>>>
>>>>>> Laszlo
>>>>>>
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>>