What’s new in Stata programming
- The big news in programming concerns parsing varlists containing factor variables,
dealing with factor variables, and processing matrices whose row or
column names contain factor variables.
- syntax now allows varlists to contain factor variables
if the new specifier fv is among the specifiers in the description
of the varlist, for instance,
syntax varlist(fv) [if] [in] [, Detail]
Similarly, syntax allows a varlist option to
include factor variables if fv is included among its specifiers:
syntax varlist(fv) [if] [in] [, Detail EQ(varlist fv)]
- You can use the resulting macro `varlist' as the varlist
for any Stata command that allows factor varlists.
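A minimal sketch of the syntax change, assuming Stata 11 or later and the auto.dta example dataset that ships with Stata. The program name myreg is hypothetical; it only shows a parsed factor varlist being handed straight to regress.

```stata
* Hypothetical program myreg: the fv specifier lets callers pass factor variables
capture program drop myreg
program myreg
    version 11
    syntax varlist(fv) [if] [in] [, Detail]
    * the parsed varlist may contain terms such as i.foreign and can be
    * passed to any command that accepts factor varlists
    regress `varlist' `if' `in'
    if "`detail'" != "" {
        display as text "terms: `varlist'"
    }
end

sysuse auto, clear
myreg price mpg i.foreign, detail
```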
- Factor varlists come in two flavors, general and specific.
An example of a general factor varlist is mpg i.foreign.
The corresponding specific factor varlist might be
mpg i(0 1)b0.foreign
A specific factor varlist is specific with respect to a
given problem, which is to say, a given dataset and subsample. The
specific varlist identifies the values taken on by factor
variables and the base.
Users usually specify general factor varlists, although they
can specify specific ones. In the course of your program, a
general factor varlist will become specific. This is
usually automatic.
Existing commands _rmcoll and _rmdcoll now accept a
general or specific factor varlist and return a specific varlist
in r(varlist).
Existing command ml accepts a general or specific factor
varlist and returns a specific varlist, in this case in the row and
column names of the vectors and matrices it produces.
The same applies to Mata’s new moptimize() function,
which is equivalent to ml.
Similarly, all Stata estimation commands that allow factor
varlists return the specific varlist in the row and column names of
e(b) and e(V).
Factor varlist mpg i(0 1)b0.foreign is specific. The same
varlist could be written mpg i0b.foreign i1.foreign,
so that is specific, too. The first is specific and unexpanded. The
second is specific and expanded. New command fvexpand takes a
general or specific (expanded or unexpanded) factor varlist, with optional
if and in qualifiers, and returns a fully expanded, specific varlist in r(varlist).
New command fvunab takes a general or specific factor
varlist and returns it in the same form, but with variable
names unabbreviated.
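A short illustration of the three commands just described, again using auto.dta. The exact text each command returns is not reproduced here, so the sketch simply displays it; mp and fore are deliberately abbreviated names for fvunab to spell out, and terms is an illustrative macro name.

```stata
sysuse auto, clear

* fvexpand: general factor varlist in, fully expanded specific varlist out
fvexpand mpg i.foreign
display "`r(varlist)'"

* fvunab: same form back, with abbreviated variable names spelled out
fvunab terms : mp i.fore
display "`terms'"

* _rmcoll: general or specific factor varlist in, specific varlist
* returned in r(varlist)
_rmcoll mpg i.foreign
display "`r(varlist)'"
```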
- Matrix row and column names are now generalized to include factor
variables. The row or column names contain the elements from a
fully expanded, specific factor varlist. Because a fully expanded,
specific factor varlist is a factor varlist, the contents of the
row or column names can be used with other Stata commands as a
varlist. Unrelatedly, the equation portion of the row or column
name now has a maximum length of 127 rather than the previous 32.
- The treatment of variables that are omitted because of
collinearity has changed. Previously, such variables were dropped
from e(b) and e(V) except by regress, which
included the variables but set the corresponding element of
e(b) to zero and similarly set the corresponding row and column of
e(V) to zero. Now all Stata estimators that allow factor
variables work like regress.
Also, if you want to know why the variable was dropped, you can
look at the corresponding element of the row or column name. The
syntax of an expanded, specific varlist allows operators o
and b. Operator o indicates omitted either because the
user specified omitted or because of collinearity; b
indicates omitted due to being a base category. For instance,
o.mpg would indicate that mpg was omitted, whereas
i0b.foreign would indicate that foreign==0 was
omitted because it was the base category. Either way, the
corresponding element of e(b) will be zero, as will the
corresponding rows and columns of e(V).
This new treatment of omitted variables—previously called
dropped variables—can cause old user-written programs to
break. This is especially true of old postestimation commands not
designed to work with regress. If you set version to 10 or
earlier before estimation, however, then estimation results will be
stored in the old way and the old postestimation commands will
work. The solution is
. version 10
. estimation_command ...
. old_postestimation_command ...
. version 11
When running under version 10 or earlier, you may not use
factor variables with the estimation command.
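To see how omitted terms now appear in estimation results, the following sketch builds a deliberately collinear regressor (mpg2, a hypothetical name) and lists every column name of e(b) with its coefficient. Under the rules above, the collinear term should carry the o operator and the base level of foreign should carry b, both with zero coefficients.

```stata
sysuse auto, clear
* mpg2 is an exact multiple of mpg, so the two are perfectly collinear
generate double mpg2 = 2*mpg
regress price mpg mpg2 i.foreign

* the omitted term stays in e(b); print each column name and coefficient
matrix b = e(b)
local names : colnames b
local i = 0
foreach n of local names {
    local ++i
    display as text %-14s "`n'" as result %12.4f b[1, `i']
}
```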
- Because omitted variables are now part of estimation results,
constraints play a larger role in the implementation of estimators.
Omitted variables have coefficients constrained to be zero.
ml now handles such constraints automatically and posts in
e(k_autoCns) the number of such constraints, which can be
due to the variable being used as the base, being empty, or being
omitted. makecns similarly saves in r(k_autoCns) the
number of such constraints, and in r(clist), the constraints
used. The matrix of constraints is now posted with ereturn
post and saved, as usual, in e(Cns). ereturn
matrix no longer posts constraints. Old behavior is
preserved under version control.
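A sketch of ml picking up an automatic constraint, assuming a hand-written probit evaluator for ml's lf method; myprobit_lf and the expensive indicator are hypothetical names. Because i.foreign has a base level, the posted e(k_autoCns) should, per the description above, count at least that one constraint.

```stata
* Hypothetical probit log-likelihood evaluator for the lf method
capture program drop myprobit_lf
program myprobit_lf
    version 11
    args lnf xb
    quietly replace `lnf' = ln(normal( `xb')) if $ML_y1 == 1
    quietly replace `lnf' = ln(normal(-`xb')) if $ML_y1 == 0
end

sysuse auto, clear
generate byte expensive = price > 6000
ml model lf myprobit_lf (expensive = mpg i.foreign)
ml maximize
* number of automatic (base, empty, or omitted) constraints
display e(k_autoCns)
```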
- There are additional commands to assist in using and
manipulating factor varlists that are documented only online;
type help undocumented in Stata.
- Factor variables also allow interactions. Up to eight-way
interactions are allowed.
- Consider the interaction a#b. If each took on two levels,
the unexpanded, specific varlist would be i(1 2)b1.a#i(1 2)b1.b.
The expanded, specific varlist would be 1b.a#1b.b 1b.a#2.b 2.a#1b.b 2.a#2.b.
- Consider the interaction c.x#c.x, where x
is continuous. The unexpanded and expanded, specific varlists are
the same as the general varlist: c.x#c.x.
- Consider the interaction a#c.x. The unexpanded, specific
varlist is i(1 2).a#c.x, and the expanded, specific varlist is
1.a#c.x 2.a#c.x.
- All these varlists are handled in the same way as other factor
varlists, as outlined above.
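The interaction forms above can be inspected with fvexpand and used directly in estimation. A sketch with auto.dta, where foreign plays the role of a and mpg the role of x; the local macro sq is illustrative.

```stata
sysuse auto, clear

* factor-by-continuous: one mpg slope per level of foreign
fvexpand i.foreign#c.mpg
display "`r(varlist)'"

* continuous-by-continuous: expansion leaves the term unchanged
fvexpand c.mpg#c.mpg
local sq `r(varlist)'
display "`sq'"

* interactions can go straight into an estimation command
regress price i.foreign i.foreign#c.mpg c.mpg#c.mpg
```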
- New command fvrevar creates equivalent, temporary variables for
any factor variables, interactions, or time-series–operated variables
so that older commands can be easily converted to working with
factor variables. We hasten to add that, in general, Stata does not
follow the fvrevar approach. Think of fvrevar as a
generalization of tsrevar.
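A sketch of the fvrevar approach: the factor varlist is replaced by temporary variables whose names come back in r(varlist), which an older command with no factor-variable support, such as summarize used here, can then process. The local macro name plain is illustrative.

```stata
sysuse auto, clear
* fvrevar creates temporary variables for the factor-variable terms and
* returns the resulting plain varlist in r(varlist)
fvrevar mpg i.rep78
local plain `r(varlist)'
display "`plain'"
* a command that knows nothing about factor variables can use them
summarize `plain'
```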
- Factor variables lead to a number of additions to what is saved in
e() and sometimes r():
- Estimation commands that post e(V) now post the corresponding
rank of the matrix in scalar e(rank).
- Estimation commands that allow constraints now post the constraints
matrix in matrix e(Cns).
- In many estimation commands allowing constraints, scalar
e(k_autoCns) is now posted containing the sum of the number of base,
empty, and omitted constraints; programming command makecns saves
the same count in r(k_autoCns).
- Programming command makecns now saves the constraints used
in macro r(clist).
- Estimation commands that allow factor variables now post in macro
e(asbalanced) the name of each factor variable participating in
e(b) that was fvset design asbalanced and post in
macro e(asobserved) the name of each factor variable
participating in e(b) that was fvset design asobserved.
- Estimation commands now post in macros how new command
margins
is to treat their prediction statistics when the statistics require
special treatment. These macros are e(marginsok),
e(marginsnotok), and e(marginsprop).
e(marginsok) names prediction statistics that should be allowed
even though they appear to violate margins’ usual rules, such
as involving the dependent variable in the calculation.
e(marginsnotok) names statistics that do violate those rules, and so
should not be allowed, but that margins would fail to identify on its own.
e(marginsprop) provides special signals as to how statistics
for the estimator must be handled. Currently allowed are
combinations of addcons, noeb, and nochainrule.
addcons means that the estimated equations have no constant even if the
user did not specify noconstant at estimation time.
noeb means that the estimator does not store the covariate names on the
name stripe of e(b). nochainrule means that the chain rule
may not be used to calculate derivatives.
- Matrix e(V_modelbased), the model-based VCE, is now
posted by most estimation commands that allow robust variance
estimation by
bootstrap and
jackknife.
- Existing command sktest
now returns in matrix r(N) the matrix of observation counts
and in matrix r(Utest) the matrix of test results.
- Existing command estimates describe using now saves in
scalar r(nestresults) the number of sets of estimation results
saved in the .ster file.
- Existing command correlate
saves in matrix r(C) the correlation or covariance matrix.
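A brief sketch of reading two of the new saved results, the rank of e(V) and the matrix r(C) from correlate. With i.foreign in the model, the base level contributes a zero row and column to e(V), so e(rank) should be 4 here (mpg, weight, 1.foreign, and the constant).

```stata
sysuse auto, clear
regress price mpg weight i.foreign
* rank of e(V); the zeroed base level of foreign does not add to it
display e(rank)

correlate price mpg weight
* the correlation matrix is saved in r(C); copy it before it is overwritten
matrix C = r(C)
matrix list C
```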
- Existing command ml has been rewritten. It is now implemented
in terms of new Mata function and optimization engine moptimize().
The new ml handles automatic or implied constraints, posts some
additional information to e(), and allows evaluators written in
Mata as well as ado.
- Existing command estimates save now has option
append, which allows storing more than one set of estimation
results in the same file.
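A sketch of storing two sets of results in one file with append and then counting them with estimates describe using. The file name myresults is hypothetical, and the file is erased at the end.

```stata
sysuse auto, clear
regress price mpg
estimates save myresults, replace
regress price mpg weight
* append adds a second set of results to the same .ster file
estimates save myresults, append
estimates describe using myresults
local nsets = r(nestresults)
display "sets of results in myresults.ster: `nsets'"
erase myresults.ster
```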
- Existing commands ereturn post and ereturn repost
now work with more commands, including
logit, mlogit, ologit, oprobit, probit,
qreg, _qreg, regress, stcox, and tobit.
Also, ereturn post and ereturn repost
now allow weights to be specified and save them in e(wtype) and
e(wexp).
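A sketch of an e-class program that posts weighted results through ereturn post, assuming only aweights and fweights need support. The name mymean is hypothetical, and the variance of the mean it computes, a naive r(Var)/r(N), is for illustration only.

```stata
capture program drop mymean
program mymean, eclass
    version 11
    syntax varname [if] [in] [aweight fweight]
    marksample touse
    * build the weight clause only when a weight was given
    local wgt
    if "`weight'" != "" local wgt "[`weight'`exp']"
    quietly summarize `varlist' if `touse' `wgt'
    local N = r(N)
    matrix b = (r(mean))
    matrix V = (r(Var)/r(N))
    matrix colnames b = `varlist'
    matrix colnames V = `varlist'
    matrix rownames V = `varlist'
    * weights passed to ereturn post land in e(wtype) and e(wexp)
    ereturn post b V `wgt', esample(`touse') obs(`N')
    ereturn local cmd "mymean"
    ereturn display
end

sysuse auto, clear
mymean price [aweight=weight]
display "`e(wtype)' `e(wexp)'"
```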
- Existing command markout has new option sysmissok,
which excludes observations with variables equal to system missing
(.) but not to extended missing (.a, .b, ..., .z).
This supports Stata’s new emphasis on imputation of missing values.
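A small check of the sysmissok behavior described above, on a hypothetical four-observation dataset: the observation with x equal to . should be marked out, while those containing .a or .b should remain.

```stata
clear
input x y
1 2
. 3
.a 4
5 .b
end
* start with every observation marked as usable
mark touse
* sysmissok marks out only observations with system missing (.)
markout touse x y, sysmissok
list
count if touse
```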
- New commands novarabbrev and varabbrev make it easy
to temporarily reset whether Stata allows variable-name abbreviations.
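A sketch of novarabbrev and its companion varabbrev with auto.dta, assuming the default setting in which variable-name abbreviations are allowed; the local macro rc is illustrative.

```stata
sysuse auto, clear
* abbreviation is on by default, so mp resolves to mpg
summarize mp
* novarabbrev runs one command with abbreviation switched off, so the
* same abbreviation now fails; capture stores the error code in _rc
capture novarabbrev summarize mp
local rc = _rc
display "return code: `rc'"
* varabbrev runs one command with abbreviation switched on
varabbrev summarize mp
```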
- New programming function smallestdouble() returns the
smallest double-precision number greater than zero.
- creturn has new returned values:
- c(noisily) returns 0 when output is being suppressed and
1 otherwise. Thus programmers can avoid executing code whose only
purpose is to display output.
- c(smallestdouble) returns the smallest double-precision
value that is greater than 0.
- c(tmpdir) returns the temporary directory being used by Stata.
- c(eqlen) returns the maximum length that Stata allows for
equation names.
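A sketch of a hypothetical program, showinfo, that consults c(noisily) before doing display-only work and then reports three of the other new c() values:

```stata
capture program drop showinfo
program showinfo
    version 11
    * c(noisily) is 0 when output is suppressed: skip display-only work
    if c(noisily) == 0 exit
    display as text "smallest double:  " as result c(smallestdouble)
    display as text "temporary dir:    " as result "`c(tmpdir)'"
    display as text "max eq. name len: " as result c(eqlen)
end

showinfo            // prints the three values
quietly showinfo    // exits immediately without doing the work
```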
- Existing extended macro function :dir has new option
respectcase, which causes :dir to respect uppercase and
lowercase when performing filename matches. This option is relevant only
for Windows.
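A sketch of the respectcase option. The pattern *.dta and the current directory are arbitrary choices, and the list returned depends on the files that exist there.

```stata
* list .dta files in the current directory, matching the pattern's case
* exactly (the option changes behavior only on Windows)
local files : dir . files "*.dta", respectcase
display `"`files'"'
```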
- Stata has new string functions strtoname(), soundex(),
and soundex_nara().
- Stata has 17 new numerical functions:
sinh(), cosh(), asinh(), and acosh();
hypergeometric() and hypergeometricp();
nbinomial(), nbinomialp(), and nbinomialtail();
invnbinomial() and invnbinomialtail();
poisson(), poissonp(), and poissontail();
invpoisson() and invpoissontail(); and
binomialp().
- Stata has nine new random-variate functions for
beta, binomial, chi-squared, gamma, hypergeometric, negative binomial,
normal, Poisson, and Student’s t: rbeta(), rbinomial(),
rchi2(), rgamma(), rhypergeometric(),
rnbinomial(), rnormal(), rpoisson(), and rt(), respectively. Also,
old function uniform() is renamed runiform(). All
random-variate functions start with r.
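A sketch exercising a few of the new functions. binomialp(10, 5, 0.5) should equal 252/1024; the random draws depend on the seed, so only their ranges are meaningful.

```stata
* a few of the new numerical and string functions
display binomialp(10, 5, 0.5)     // P(X = 5) for X ~ binomial(10, 0.5)
display poissonp(3, 2)            // P(X = 2) for X ~ Poisson with mean 3
display asinh(sinh(1))
display strtoname("1 bad name")
display soundex("Robert")

* random variates; set the seed for reproducibility
clear
set obs 1000
set seed 12345
generate double u  = runiform()
generate double z  = rnormal(10, 2)
generate double k  = rpoisson(3)
generate double bn = rbinomial(10, 0.5)
summarize u z k bn
```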
- Existing command clear has new syntax
clear matrix, which clears (drops) all Stata matrices, as
distinguished from clear mata, which drops all Mata matrices
and functions.
- These days, commands intended for use by end-users are often being used
as subroutines by other end-user commands. Some of these commands
preserve the data simply so that, should something go wrong or the user
press Break, the original data can be restored. Sometimes, when
such commands are used as subroutines, the caller has already preserved
the data. Therefore, all programmers are requested to include option
nopreserve on commands that preserve the data for no other reason
than error recovery, and thus speed execution when commands are used as
subroutines.
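A sketch of the requested pattern, using a hypothetical command mystd that standardizes a variable into a new one. It preserves only for error recovery, cancels that preservation with restore, not on success, and skips preserve entirely when the caller specifies nopreserve. With the noPRESERVE specifier, syntax stores "nopreserve" in the local macro preserve when the option is given.

```stata
capture program drop mystd
program mystd
    version 11
    syntax varname, GENerate(name) [noPRESERVE]
    confirm new variable `generate'
    * preserve for error recovery unless the caller already did
    if "`preserve'" == "" preserve
    quietly summarize `varlist'
    quietly generate double `generate' = (`varlist' - r(mean)) / r(sd)
    * success: keep the modified data instead of restoring on exit
    if "`preserve'" == "" restore, not
end

sysuse auto, clear
* a caller that has already preserved the data passes nopreserve
preserve
mystd price, generate(zprice) nopreserve
summarize zprice
restore
* a standalone call lets mystd protect the data itself
mystd mpg, generate(zmpg)
```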