Language and programming

This page contains only historical information and is not about the current release of Stata. Please see our features page for information on the current version of Stata.

Long variable names and new wildcards

Stata now allows names to be up to 32 characters long. That includes variable names, label names, macro names, program names, and any other name you can think of. We have used the longer limit to rename a few existing Stata programs:

Prior name    New name
llogist       llogistic
xthaus        xthausman
spikeplt      spikeplot
stcurv        stcurve
svyintrg      svyintreg
svyprobt      svyprobit
svymlog       svymlogit
svyolog       svyologit
svyoprob      svyoprobit

You will not find the old names documented, but they continue to work.
In any case, you no longer have to name your variable f_inc1999; you can name it farm_inc_1999, farm_income_1999, or even farm_income_in_fiscal_year_1999. Where possible, we have adjusted Stata output to allow 12 spaces for displaying names. When a name is longer than that, Stata abbreviates it and shows, for instance, farm_in~1999. ~ is the new Stata abbreviation character, which Stata not only uses in output but which you can also use in input (that is, in varlists). If you type farm_in~1999, f~1999, or f~in~1999, Stata will understand that you mean farm_income_in_fiscal_year_1999, provided no other variable matches. Thus, if Stata presents dose~d1~42 in output, that name is unique, and you can type it and Stata will understand it.
describe now has two new options, fullname and numbers. fullname shows full variable names (up to 32 characters) instead of ~-abbreviations, and numbers shows each variable's number.
In conjunction with the longer variable names, there are new varlist abbreviation rules. Varlists now understand * anywhere in a name, not just as a suffix. You can still type pop*, but you can also type pop*99, pop*30_40*1999, or even *1999. * means "zero or more characters go here". Also understood is the new ~ abbreviation character mentioned above. * and ~ mean the same thing and work the same way, except that ~ adds the claim "and only one variable matches this pattern", whereas * means "give me all the variables that match this pattern".
The other new abbreviation character is ?, which means "one character goes here", so result?10 might match resultb10 and resultc10, but would not match resultb110.
Programmers will want to read the discussion of long names from the What's new section of the new programmers manual.
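To make the *, ~, and ? rules above concrete, here is a small Python model of how a single varlist token could be matched against the variable names in a dataset. It is a sketch under stated assumptions, not Stata code: the function names are hypothetical, and it covers only the three pattern characters described here, not Stata's other varlist features.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a varlist pattern into a regular expression.

    * and ~ stand for zero or more characters, ? for exactly one character;
    every other character must match literally.
    """
    parts = []
    for ch in pattern:
        if ch in "*~":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand_varlist(pattern, names):
    """Return the variable names matched by one varlist token (hypothetical helper)."""
    regex = _pattern_to_regex(pattern)
    matches = [name for name in names if regex.fullmatch(name)]
    if not matches:
        raise ValueError(f"variable {pattern} not found")
    if "~" in pattern and len(matches) > 1:
        # ~ asserts that the pattern identifies exactly one variable
        raise ValueError(f"{pattern} ambiguous abbreviation")
    return matches


names = ["farm_income_in_fiscal_year_1999", "farm_income_1998",
         "pop_30_40_1999", "resultb10", "resultc10", "resultb110"]
print(expand_varlist("f~in~1999", names))  # ['farm_income_in_fiscal_year_1999']
print(expand_varlist("*1999", names))      # ['farm_income_in_fiscal_year_1999', 'pop_30_40_1999']
print(expand_varlist("result?10", names))  # ['resultb10', 'resultc10']
```

The only difference between * and ~ in this model is the final uniqueness check, which mirrors the distinction drawn above: farm~ would be rejected here because it matches two variables, while farm* would return both.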

Enhancements to by:

by varlist: now has a sort option. You can type, for instance, `by foreign, sort: summarize mpg' or, equivalently, `bysort foreign: summarize mpg', rather than first sorting the data and then typing the by command.
by has a new parenthesis notation: `by id (time): ...' means to perform ... by id, but first verify that the data are sorted by id and time. `by id (time), sort: ...' says to sort the data by id and time and then perform ... by id.
There is also a new rc0 option, which says to keep on going even if one of the by-groups results in an error.
More importantly, by varlist: is now allowed with virtually every Stata command, including commands implemented as ado-files, such as egen. We have been claiming for some time that whether a command is built-in or implemented as an ado-file is irrelevant because it has the same features either way. Now the claim is true.
Users' ado-files (programs) can also be made to work with the by varlist: prefix. The jargon for this is "byable"; a program is byable if it works with the by prefix. We have modified all our ado-files to be byable, and you can modify your programs, too. In most cases, it is as easy as adding byable(recall) to the program define line.
The commands generate, replace, drop, keep, and assert no longer present the detailed, group-by-group report when prefixed with by, so you no longer need to prefix them with quietly.
Programmers may be interested in what is said about by: in the What's new section of the new programmers manual.
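The by semantics described above (the sort option, the parenthesis notation, and rc0) can be modeled in a few lines of Python. This is a hypothetical sketch, not how Stata implements by: observations are dicts in a list, the command is any Python function applied to each group, and a plain ValueError stands in for Stata's "not sorted" error.

```python
from itertools import groupby


def by(rows, by_vars, order_vars=(), sort=False, rc0=False, command=len):
    """Hypothetical model of `by by_vars (order_vars) [, sort rc0]: command`.

    rows is a list of dicts (one per observation). With sort=True it is sorted
    in place, mirroring how the sort option changes the data's sort order.
    """
    sort_vars = list(by_vars) + list(order_vars)

    def sort_key(row):
        return tuple(row[v] for v in sort_vars)

    def group_key(row):
        return tuple(row[v] for v in by_vars)

    if sort:
        rows.sort(key=sort_key)  # sort by by_vars and order_vars
    elif any(sort_key(a) > sort_key(b) for a, b in zip(rows, rows[1:])):
        raise ValueError("not sorted")  # only verify, as in `by id (time):`

    results = []
    for key, group in groupby(rows, key=group_key):  # groups use by_vars only
        try:
            results.append((key, command(list(group))))
        except Exception as err:
            if not rc0:
                raise
            # rc0: record the failure and keep going with the next group
            results.append((key, err))
    return results


rows = [{"id": 2, "time": 1}, {"id": 1, "time": 2}, {"id": 1, "time": 1}]
print(by(rows, ["id"], ["time"], sort=True, command=lambda g: [r["time"] for r in g]))
# [((1,), [1, 2]), ((2,), [1])]
```

Note that time affects only the order of observations within each id group; the groups themselves are still formed by id alone, which is what `by id (time), sort:' promises.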

Sort stability

Commands that report results of calculations (commands not intended to change the data) no longer change the sort order of the data. If you type `sort id time', you can be assured that your dataset will stay sorted by id and time. This is true even if the command is implemented as an ado-file.
Again, programmers will want to refer to the What's new section of the new programmers manual.
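How a command can be sort-stable even when it must sort internally is easiest to see with the general technique of recording each observation's position and restoring it afterward. The Python sketch below illustrates that technique only; it is not Stata's internal mechanism, and add_group_mean and the temporary _orig_pos field are hypothetical names.

```python
from itertools import groupby
from statistics import mean


def add_group_mean(data, key, value, out):
    """Hypothetical command: store the mean of `value` within each `key` group in `out`.

    It needs the data sorted by `key` to find the groups, so it sorts in place,
    but it records each observation's position first and restores that order
    before returning.
    """
    for pos, row in enumerate(data):
        row["_orig_pos"] = pos
    data.sort(key=lambda row: row[key])
    for _, group in groupby(data, key=lambda row: row[key]):
        group = list(group)
        group_mean = mean(row[value] for row in group)
        for row in group:
            row[out] = group_mean
    data.sort(key=lambda row: row["_orig_pos"])  # put observations back
    for row in data:
        del row["_orig_pos"]


data = [{"id": 2, "x": 4}, {"id": 1, "x": 1}, {"id": 2, "x": 6}, {"id": 1, "x": 3}]
add_group_mean(data, "id", "x", "xbar")
print([(row["id"], row["xbar"]) for row in data])
```

The caller sees the new variable filled in, while the observations come back in exactly the order they had before the call.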

Loop controls for iterating over lists and number sets

foreach is a new programming command for processing items in a list. It can be typed directly at the Stata prompt, not only used inside programs, and is a useful alternative to for and while. With foreach, you can type things such as

    foreach file in this.dta that.dta theother.dta {
        use `file', clear
        replace bp=. if bp==999
        save `file', replace
    }

forvalues performs the same kind of loop over lists of numbers, or numlists as they are called in Stata.
Programmers will appreciate further details from the extract of the What's new section of the new programmers manual.
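forvalues loops over a range written a(b)c, meaning a, a+b, a+2b, and so on, stopping at c; numlists use the same notation, and Stata 7 also accepts the spelling a[b]c for numlists. The Python sketch below models only that expansion. expand_range is a hypothetical name, and the small tolerance is an assumption added to cope with floating-point steps.

```python
import math
import re

# a(b)c or a[b]c, e.g. "1(2)9" or "10[-2]0"; spaces are ignored.
_RANGE = re.compile(r"(-?[\d.]+)(?:\((-?[\d.]+)\)|\[(-?[\d.]+)\])(-?[\d.]+)")


def expand_range(spec):
    """Expand a range a(b)c into the numbers a, a+b, a+2b, ... not passing c."""
    m = _RANGE.fullmatch(spec.replace(" ", ""))
    if m is None:
        raise ValueError(f"invalid range: {spec!r}")
    start = float(m.group(1))
    step = float(m.group(2) if m.group(2) is not None else m.group(3))
    stop = float(m.group(4))
    if step == 0:
        raise ValueError("step must be nonzero")
    # small tolerance so that, e.g., 0(0.1)1 still reaches 1 despite rounding
    count = math.floor((stop - start) / step + 1e-9) + 1
    return [start + k * step for k in range(max(count, 0))]


print(expand_range("1(2)9"))    # [1.0, 3.0, 5.0, 7.0, 9.0]
print(expand_range("10[-2]0"))  # [10.0, 8.0, 6.0, 4.0, 2.0, 0.0]
```

Computing each value as start + k*step, rather than repeatedly adding step, keeps rounding error from accumulating over long ranges.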

European decimal format

Stata now understands output formats such as %9,2f as well as %9.2f. In %9,2f, the number 500.5 is displayed as 500,50. In %9,2fc format, the number 1,000.5 is displayed as 1.000,50.
Even better, you can now type `set dp comma' to make all of Stata's output, including all statistical output, use the European format.
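The European formats above amount to swapping the roles of the period and the comma. As a sketch that reproduces the two examples given here (it is a Python model, not Stata's formatting code, and euro_format is a hypothetical name), one can format in the usual style and then exchange the two characters; the result is right-justified, as with Stata's default %f formats.

```python
def euro_format(x, width, decimals, commas=False):
    """Mimic %w,df (and %w,dfc when commas=True) for the examples above.

    Python formats with '.' as the decimal point and ',' for thousands; the
    two characters are then swapped to get the European convention.
    """
    grouping = "," if commas else ""
    us_style = f"{x:{width}{grouping}.{decimals}f}"
    return us_style.translate(str.maketrans(".,", ",."))


print(repr(euro_format(500.5, 9, 2)))                # '   500,50'
print(repr(euro_format(1000.5, 9, 2, commas=True)))  # ' 1.000,50'
```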

New egen functions

The following new egen functions have been added: any(), concat(), cut(), eqany(), ends(), kurt(), mad(), mdev(), mode(), neqany(), pc(), seq(), skew(), and tag(). In addition, group() and rank() have new options.

Programmability

We have made many other improvements on this score: 5.5 pages' worth, as measured by the What's New description in the new Stata Programming Manual. See the excerpts for a list of Stata's new programming features.

New string functions

There are four new string functions: match(), subinstr(), subinword(), and reverse().
match(s₁,s₂) returns 1 if string s₁ "matches" s₂. In the match, * in s₂ is understood to mean zero or more characters go here, and ? is understood to mean one character goes here. match("this","*hi*") is true. In s₂, \\, \?, and \* can be used if you really want a \, ?, or * character.
subinstr(s₁,s₂,s₃,n) and subinword(s₁,s₂,s₃,n) substitute the first n occurrences of s₂ in s₁ with s₃. subinword() restricts "occurrences" to be occurrences of words. In either, n may be coded as missing value, meaning to substitute all occurrences. For instance, subinword("measure me","me","you",.) returns "measure you", and subinstr("measure me","me","you",.) returns "youasure you".
reverse(s) returns s turned around. reverse("string") returns "gnirts".
A fifth new string function is really intended for programmers: abbrev(s,n) returns the n-character ~-abbreviation of the variable name s. abbrev(s,12) is the function used throughout Stata to make 32-character names fit into 12 spaces.
The new functions inrange() and inlist() make choosing the right observations easier.
inrange() handles missing values elegantly when selecting subsamples such as a <= x <= b. inrange(x,a,b) answers the question, "Is x known to be in the range a to b?" Obviously, inrange(.,1000,2000) is false. Either a or b may be missing: inrange(x,a,.) answers whether it is known that x >= a, and inrange(x,.,b) answers whether it is known that x <= b. inrange(.,.,.) returns 0, which, if you think about it, is inconsistent but is probably what you want.
inlist(x,a,b,...) selects observations if x=a or x=b or ....
Other functions have been added as well: _by(), _bylastcall(), and _byindex() help programs and ado-files support the by varlist: prefix.
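The behavior of match(), subinstr(), subinword(), reverse(), inrange(), and inlist() described above can be checked against a small Python model. It is a sketch, not Stata's implementation: Python's None stands in for Stata's missing value ".", and subinword() is assumed to treat a "word" as a run of characters delimited by blanks. Handling missing explicitly matters because in Stata a missing value compares as larger than any number, so a condition such as x >= a alone is true for missing x.

```python
import re

MISSING = None  # stands in for Stata's missing value "." in this sketch


def match(s1, s2):
    """1 if s1 matches pattern s2 (* = zero or more chars, ? = one char), else 0."""
    parts, i = [], 0
    while i < len(s2):
        ch = s2[i]
        if ch == "\\" and i + 1 < len(s2) and s2[i + 1] in "\\?*":
            parts.append(re.escape(s2[i + 1]))  # \\, \?, \* mean a literal character
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return int(re.fullmatch("".join(parts), s1, flags=re.DOTALL) is not None)


def subinstr(s1, s2, s3, n=MISSING):
    """Replace the first n occurrences of s2 in s1 with s3 (all if n is missing)."""
    return s1.replace(s2, s3, -1 if n is MISSING else n)


def subinword(s1, s2, s3, n=MISSING):
    """Like subinstr(), but only blank-delimited words equal to s2 count."""
    out, done = [], 0
    for word in s1.split(" "):
        if word == s2 and (n is MISSING or done < n):
            out.append(s3)
            done += 1
        else:
            out.append(word)
    return " ".join(out)


def reverse(s):
    return s[::-1]


def inrange(x, a, b):
    """1 if x is known to lie in [a, b]; a missing bound means no bound."""
    if x is MISSING:
        return 0
    if a is not MISSING and x < a:
        return 0
    if b is not MISSING and x > b:
        return 0
    return 1


def inlist(x, *values):
    """1 if x equals any of the listed values, else 0."""
    return int(x in values)


print(subinword("measure me", "me", "you", MISSING))  # measure you
print(subinstr("measure me", "me", "you", MISSING))   # youasure you
print(inrange(MISSING, 1000, 2000))                   # 0
```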

Other new commands and features

destring makes converting variables from string to numeric easier.
xi has been modified to exploit Stata's longer variable names to create more readable names for the interaction terms.
hexdump will give you a hexadecimal dump of a file. Even more useful is its analyze option, which will analyze the dump for you and report just the summary. This can be useful for diagnosing problems with raw datasets.
format now allows you to type the %fmt first or last, so you can equally well type `format mpg weight %9.2f' or `format %9.2f mpg weight'.
type has a new asis option. When the filename ends in .smcl, type now interprets the SMCL codes by default. Thus, if you previously created a session log by typing `log using mylog', you can type `type mylog.smcl' to display it as you probably want to see it. To see the raw SMCL codes instead, type `type mylog.smcl, asis'.
The stata.toc and *.pkg files read by net now allow the v directive. If you code `v 2' at the top of such a file, you may use SMCL directives in it.
version may now be used as a prefix command; you can type `version 6: ...' to mean that ... is to be run under version 6.
Numlists may now be specified as a[b]c as well as a(b)c.
list now has a doublespace option.
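hexdump's analyze report is not reproduced here, but the idea of summarizing a file's bytes rather than reading a raw dump can be sketched. The hypothetical Python function below counts the byte categories that commonly explain trouble reading raw data, such as mixed Windows (CR LF), Unix (LF), and old Macintosh (lone CR) line endings, embedded NULs, or characters outside 7-bit ASCII. The category names are assumptions for illustration, not hexdump's output labels.

```python
def summarize_bytes(data):
    """Count byte categories that often explain problems reading raw data."""
    counts = dict(bytes=len(data), lf=0, cr=0, crlf=0, tab=0, nul=0,
                  other_control=0, high=0)
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0D and i + 1 < len(data) and data[i + 1] == 0x0A:
            counts["crlf"] += 1  # Windows line ending
            i += 2
            continue
        if b == 0x0A:
            counts["lf"] += 1  # Unix line ending
        elif b == 0x0D:
            counts["cr"] += 1  # lone CR: old Macintosh line ending
        elif b == 0x09:
            counts["tab"] += 1
        elif b == 0x00:
            counts["nul"] += 1
        elif b < 0x20 or b == 0x7F:
            counts["other_control"] += 1
        elif b >= 0x80:
            counts["high"] += 1  # outside 7-bit ASCII
        i += 1
    return counts


# For a real file: with open(path, "rb") as f: summarize_bytes(f.read())
print(summarize_bytes(b"a\tb\r\nc\n\xe9\r"))
# {'bytes': 9, 'lf': 1, 'cr': 1, 'crlf': 1, 'tab': 1, 'nul': 0, 'other_control': 0, 'high': 1}
```

A file whose counts show both crlf and lf (or lone cr) has inconsistent line endings, which is the kind of problem a summary surfaces far faster than scanning a hexadecimal dump by eye.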

Faster

Stata 7 has more features, but continuing our long tradition, it is also faster; ado-files execute between 8.8 and 11.8 percent faster. Some programs, we have observed, execute 13 percent faster.
