Statalist



Re: st: Mata version control


From   [email protected] (William Gould, StataCorp LP)
To   [email protected]
Subject   Re: st: Mata version control
Date   Mon, 16 Feb 2009 10:19:47 -0600

Partha Deb <[email protected]> wrote:
> I am trying to compile and save a Mata function as a .mo file.  I am 
> using Stata 10.1 but would like to compile and save the .mo file as a 
> Stata 9.2 file.  My understanding is that this should be possible under 
> version control but I'm not succeeding.  

As has already been noted on the list, setting the version to 9.2, compiling
a Mata function, and saving it will not produce an executable that will run
under Stata 9.2.

Partha later asked, "Dare I call this a bug?", to which I answer, "No, it 
is a feature."  I will explain.

Version control is more complicated in a compiled language such as Mata than
in an interpreted language such as Stata's ado.  In Stata's ado language, one
number (the version) is sufficient to encode all that needs to be known.
In a compiled language such as Mata, that one number turns into 3, or even
4:


    1.  Version of Stata.  This is the version we are all familiar with.
        This number specifies how Stata works.  In Mata, this number 
        specifies how Stata will work when Mata uses Stata, just as it 
        does when you use Stata directly.

    2.  Version of Mata's source language.  This number records the 
        version of Mata's source language that the compiler understands
        and, correspondingly, the version of the pre-compiled Mata libraries
        that the compiler assumes will be used when the compiled code is run.
        (2) is logically independent of (1), but because of the way we do
        things at StataCorp, you can think of (2) as being equal to (1), 
        although that is an oversimplification.  What is literally true is
        that (2) <= (1).  The way we work at StataCorp, if we need a new (2),
        we increment (1) and set (2) = (1).  On the other hand, if a change 
        affects Stata but not Mata, we increment (1) but leave (2) unchanged.

    3.  Version of the object-code language.
        This number identifies how Mata's object language works. 
        Mata's compiler takes what you code -- called source code -- and
        compiles it into object code.  Sometimes, as new features are added,
        or as Mata is made more efficient, new opcodes are added to the object
        language, and the compiler uses those new opcodes.

        This number is not settable and is not a number you ever see.
        It is not settable, not even at StataCorp.  This number simply
        identifies how the compiler works.

    4.  Inside files such as .mo files, there is yet another version 
        number that reflects how the object files are written.  Over 
        time, new concepts might be added.  The file format is made 
        richer (more complicated) in order to record these new 
        objects, and older Statas wouldn't know what to make of them. 

        This number, like (3), is also not settable.  (4) reflects how the
        file was written.  Newer Statas know how to read older file formats,
        but older Statas know nothing about new formats.
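To make the four numbers concrete, here is a minimal Python sketch, assuming a loader that accepts a compiled file only when none of its version stamps is newer than what the running Stata understands.  The names (`Release`, `CompiledFile`, `compile_with`, `can_run`) and every number in it are hypothetical; they do not describe how Stata actually stores or checks versions.  In this sketch the version you set is recorded only as (2), while (3) and (4) always come from the Stata that did the compiling, since neither is settable.

```python
from dataclasses import dataclass


# Hypothetical model of the four version numbers described above.
# Field names and values are illustrative, not StataCorp's internals.
@dataclass(frozen=True)
class Release:
    stata_version: float      # (1) how Stata itself behaves
    mata_lang_version: float  # (2) Mata source language; (2) <= (1)
    opcode_version: int       # (3) object-code language; not settable
    file_format: int          # (4) layout of .mo files; not settable

    def __post_init__(self):
        if self.mata_lang_version > self.stata_version:
            raise ValueError("(2) may never exceed (1)")


@dataclass(frozen=True)
class CompiledFile:
    mata_lang_version: float  # follows the -version- setting
    opcode_version: int       # always the compiling release's (3)
    file_format: int          # always the compiling release's (4)


def compile_with(release: Release, version: float) -> CompiledFile:
    """Compile under -version #-; only (2) responds to the setting."""
    if version > release.stata_version:
        raise ValueError("cannot set a version newer than the running Stata")
    return CompiledFile(
        mata_lang_version=min(version, release.mata_lang_version),
        opcode_version=release.opcode_version,
        file_format=release.file_format,
    )


def can_run(reader: Release, f: CompiledFile) -> bool:
    """A reader accepts only stamps that are not newer than its own."""
    return (f.file_format <= reader.file_format
            and f.opcode_version <= reader.opcode_version
            and f.mata_lang_version <= reader.mata_lang_version)


# Made-up stamps for two hypothetical releases; not the real values.
older = Release(9.2, 9.2, opcode_version=3, file_format=2)
newer = Release(10.1, 10.0, opcode_version=4, file_format=2)

mo = compile_with(newer, 9.2)      # "version 9.2" under the newer Stata
print(can_run(older, mo))          # False: its (3) is newer than older's
print(can_run(newer, compile_with(older, 9.2)))  # True
```

In this model, setting the version back changes only what the compiler accepts as source; the object code it writes still carries the newer (3), so the older reader refuses it while the newer reader happily runs the older file.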

(4) is an issue with which we are all familiar.  Many of us have run into
instances of (4) in the generic sense, for example when attempting to read a
Stata 10 dataset using Stata 4.  Stata 10 can read Stata 4 datasets, but not
the other way around.  The same issues apply to .mo files.

(4), by itself, is reason enough why a modern Stata cannot be used to compile
object code for an old Stata.  That is, it is reason enough assuming the 
file format has changed.  At StataCorp, however, we do not change the 
file format willy-nilly.  Nonetheless, (4) does sometimes change.

What I want to emphasize, however, is (3).  (3) is unique to compiled
languages.  When you think about Stata's ordinary version number, it should not
surprise you that -some string- might mean one thing to Stata 4 and another to
Stata 10.  That is, after all, the problem Stata's version number is designed
to solve.  In Mata, that issue is handled by (2).

In Mata, however, there is another issue.  Mata is a compiler.  You code
-some string- and Mata turns that into "5a0000111a2e701".  Just as (2)
handles the interpretation of -some string-, (3) handles the interpretation of
"5a0000111a2e701".  "5a0000111a2e701" might mean one thing to an older Stata
and something else to a newer Stata.  More likely, "5a0000111a2e701" means
nothing to an older Stata and would cause the older Mata+Stata to crash if it
were encountered.
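The point about (3) can be illustrated with a toy stack machine.  This is a sketch under stated assumptions: the opcode names, the `run` function, and the fact that this interpreter checks for unknown opcodes are all hypothetical; a real interpreter might not check at all and would simply misbehave or crash.  The "new" opcode set adds one fused instruction, `SQUARE`, of the kind a compiler might introduce for speed.

```python
# Toy stack machine.  Opcode names are hypothetical, not Mata's.
OLD_OPCODES = {"PUSH", "DUP", "ADD", "MUL"}
NEW_OPCODES = OLD_OPCODES | {"SQUARE"}  # one opcode added later for speed


def run(program, known_opcodes):
    """Execute (opcode, argument) pairs and return the top of the stack."""
    stack = []
    for op, arg in program:
        if op not in known_opcodes:
            # This toy refuses politely; a real interpreter might crash.
            raise RuntimeError(f"unknown opcode {op!r}")
        if op == "PUSH":
            stack.append(arg)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "SQUARE":
            x = stack.pop()
            stack.append(x * x)
    return stack.pop()


# Two object-code translations of the same source expression, 7 * 7.
old_code = [("PUSH", 7), ("DUP", None), ("MUL", None)]
new_code = [("PUSH", 7), ("SQUARE", None)]

print(run(old_code, NEW_OPCODES))  # 49: old object code, new interpreter
print(run(new_code, NEW_OPCODES))  # 49
try:
    run(new_code, OLD_OPCODES)
except RuntimeError as err:
    print(err)                     # unknown opcode 'SQUARE'
```

Object code written for the old opcode set runs unchanged on the new interpreter, but code that uses `SQUARE` cannot run on the old one, even though both programs compute the same thing from the same source.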

Some of you are doubtless thinking, "Well, arrange things so that the Mata
compiler doesn't translate -some string- to '5a0000111a2e701' when I set the
version number back."  I have two responses:  you wouldn't like it if we could,
and we can't.  You wouldn't like it because then older code would not 
experience the speed improvements that have been made to Mata unless you 
changed the version number, and then you would have to worry about
source-code incompatibilities.  Right now, all you have to do to get the speed
improvements made to Mata is recompile, and you can be certain that your more
efficient code will work if the old code worked.  Some of you may be thinking
of some ways around that problem, but don't bother, because there's another
brick wall in front of you.

Mata is an optimizing compiler.  With optimization on, it is virtually
impossible to prevent "5a0000111a2e701" from showing up due to some recursive
substitution rule.  I say virtually because I can think of some ways one could
try, but one could never prove they work.  The only way to know that
"5a0000111a2e701" did not show up would be to turn optimization off, and at
that point, while it is true that Mata runs faster than ado, it does not
run that much faster.
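The optimizer problem can be sketched the same way.  Assume two hypothetical peephole rules: loading the same variable twice becomes a load followed by `DUP`, and `DUP` followed by `MUL` becomes `SQUARE`.  Neither the source code nor the first rule mentions `SQUARE`, yet applying the rules repeatedly until nothing changes produces it.  The rules, opcode names, and the `rewrite_once` and `optimize` functions are illustrative only, not Mata's optimizer.

```python
def rewrite_once(code):
    """Apply the first matching rule at the first matching position."""
    for i in range(len(code) - 1):
        a, b = code[i], code[i + 1]
        # Rule 1: loading the same variable twice -> load once, duplicate.
        if a[0] == "LOAD" and b == a:
            return code[:i] + [a, ("DUP", None)] + code[i + 2:]
        # Rule 2: DUP followed by MUL -> one fused SQUARE opcode.
        if a == ("DUP", None) and b == ("MUL", None):
            return code[:i] + [("SQUARE", None)] + code[i + 2:]
    return None  # no rule applies


def optimize(code, enabled=True):
    """Rewrite repeatedly until no rule fires (a fixed point)."""
    code = list(code)
    while enabled:
        new = rewrite_once(code)
        if new is None:
            break
        code = new
    return code


source = [("LOAD", "x"), ("LOAD", "x"), ("MUL", None)]  # x * x
print(optimize(source))                 # [('LOAD', 'x'), ('SQUARE', None)]
print(optimize(source, enabled=False))  # unchanged, so no SQUARE
```

With only two rules the outcome is easy to trace by hand; with many interacting rules, the one simple guarantee that `SQUARE` never appears is the `enabled=False` path, which is the trade-off described above.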

I should add that, even if you were willing to put up with all of the above --
which you would not -- maintaining multiple versions of the compiler and
certifying that each works when code is compiled using a modern Mata and
executed using an older Mata would be a monumental task.

It would be such a monumental task that we do not even attempt it with the
simpler ado language.  From a computer-science point of view, version control
in Stata is about forward compatibility, not backwards.  Old code runs
correctly on modern Statas.  The same applies for Mata.  You do not even have
to recompile.  Using a modern Stata to develop code for older Statas, even in
ado, will always be problematic.

-- Bill
[email protected]



© Copyright 1996–2024 StataCorp LLC