This page contains only historical information and is not about the current release of Stata. Please see our features page for information on the current version of Stata.

Stata News and Announcements

What's new in Stata 6.0


The following is an excerpt from The Stata User's Guide, a 384-page book, which accompanies the 4-volume Stata Reference Manual set and the Stata Graphics Manual.


New features
Highlights
Statistics
Existing
Functions
Data-management
Programming
Others


1.3.1 Highlights of the new release

What is important varies from user to user, but here are a few of the changes we would like to call to your attention:

Stata is web-aware

Stata for Windows 98/95/NT, Stata for Power Macintosh, and Stata for Unix are now web-aware. (Stata for Windows 3.1 and Stata for 680x0 Macintosh are not.) See [U] 32 Using the Internet to keep up to date.

You can use datasets over the web—try typing

    . use http://www.stata.com/manual/oddeven.dta, clear

You can update your Stata over the web. Try typing

    . update

or pull down Help and choose Official Updates.

You can obtain STB or other materials over the web. Try typing

    . net from http://www.stata.com

or pull down Help and choose STB and User-written Programs.

You can create your own site to deliver additions to Stata—be they help files, ado-files, or data—and the new checksum command will confirm that the materials are delivered uncorrupted; see [R] net and [R] checksum.

For the latest information on what's available from us, type

    . news

or pull down Help and choose News.

Scrolling Results window

Stata for Windows now has a scrolling results window, so you can look back at previous output, copy and paste output to other applications, etc.

New Do-file Editor

Stata for Windows 98/95/NT and Stata for Macintosh (both Power and 680x0) have a new do-file editor. Click on the Do-file Editor button or type doedit (one word) in the command window.

Long value labels

Stata's value labels can now be up to 80 characters long (as opposed to the previous maximum of 8), value labels may contain 65,536 mappings, and you may now label negative values. See [U] 1.3.5 New data-management features.
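
As a brief sketch, assuming a hypothetical variable named answer and a hypothetical label name resp, a long label and a label for a negative value could be defined by typing

    . * answer and resp are hypothetical names used only for illustration
    . label define resp -1 "Refused to answer" 1 "Strongly agrees with the statement as worded"
    . label values answer resp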

Time-series features

Stata has added time-series analysis, estimation, and data management facilities. Time-series operators for differencing and lagging can now be used in expressions and variable lists for many commands. New time-series date formats are provided. New commands estimate ARIMA models and ARCH family models (ARCH, GARCH, EGARCH, ARCH-in-mean, ...). Other new commands graph and tabulate autocorrelations, partial autocorrelations, and cross-correlations. Commands for periodograms, unit-root tests, and white-noise tests have also been added. See [U] 1.3.2.4 New time-series features for more information about these features and watch the Stata Technical Bulletin for additional features.

ANOVA: Repeated-measures, nested, and mixed designs

Up to four repeated-measure variables may now be specified with anova along with other categorical variables (providing repeated-measures ANOVA) and continuous variables (providing repeated-measures ANCOVA). Nested and mixed ANOVA and ANCOVA models are also now fully supported with an easier-to-use syntax. See [U] 1.3.2.1 New ANOVA features.

New st survival analysis additions

There are four new parametric survival estimators that estimate lognormal, log-logistic, Gompertz, and generalized log-gamma models. Graphical and statistical tests of the proportional-hazards assumption can now be computed after Cox regression (stcox or cox). There are many more residuals available after Cox regression, and all these residuals are available after parametric survival estimators as well. The st system (stset, etc.) has been completely rewritten, and it allows for much more flexibility in the way that your data were collected and recorded, including allowing for multiple failure events in the same dataset. See [U] 1.3.2.2 New st survival analysis features for a complete list of all the new st commands and features.

New xt panel estimators

There are 12 new xt estimators for use with panel data. For example, xtprobit now estimates random-effects probit models using Gauss–Hermite quadrature, in addition to estimating population-averaged models using GEE. In addition to random-effects probit, there is random-effects logit, tobit, interval regression, Poisson, negative binomial regression, and complementary log-log regression. There are also fixed-effects (conditional) Poisson and negative binomial estimators. See [U] 1.3.2.5 New xt panel estimators for a complete list of the new panel estimators.

New svy survey data additions

There is a new svytab command that produces two-way contingency tables with tests of independence for survey data. There are also six new svy estimation commands. See [U] 1.3.2.3 New svy survey commands for a complete list.

New ml command

The all-new ml command for maximizing user-defined likelihood functions is easier to use, faster, and more robust. Even if you do not program your own MLEs, this change will affect you. Many of Stata's MLEs are programmed using ml, and so now they converge faster and more robustly. See [R] ml. Those interested in programming their own estimators will also want to see the new book Maximum Likelihood Estimation with Stata (Gould and Sribney 1999).

New matrix language

You can now write long, complicated matrix expressions; no longer are you restricted to one matrix operation per command. You can write expressions such as

    matrix b = syminv(X'*X)*X'*y

This makes working with matrices in Stata much, much easier.
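
As a sketch of how such an expression might be put to use, assume hypothetical variables y, x1, and x2 are in memory. The cross products can be accumulated once and the least-squares coefficients obtained from a single expression that combines a function call and a multiplication:

    . * y, x1, and x2 are hypothetical variables
    . * A holds the cross products of (y, x1, x2, _cons)
    . matrix accum A = y x1 x2
    . matrix XX = A[2...,2...]
    . matrix Xy = A[2...,1]
    . matrix b = syminv(XX)*Xy
    . matrix list b

The column vector b should reproduce the coefficients reported by regress y x1 x2.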

Ado-files now behave just like internal commands

Quotes now work with ado-file implemented commands. Previously, you could not type, for instance,

    . logistic outcom x1 x2 if sex=="female"

because logistic was implemented as an ado-file and quoted strings confused ado-files. That is fixed.

Ado-files can now process datasets regardless of the number of variables in them. For instance, previously

    . codebook

would not work if the dataset had more than 600 variables because the ado-file could not hold all the variable names in a single macro. That is fixed.

The result of these two changes is that ado-files now behave just like internal commands from the user's point of view.

New function returns the estimation sample

After running any estimation command, the new function e(sample) returns true (1) if the observation was used in estimation and false (0) otherwise. You can type, for instance,

    summarize if e(sample)

to obtain summary statistics on the estimation sample. See [U] 23.4 Specifying the estimation subsample.
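
A minimal sketch, assuming hypothetical variables y, x1, and x2, some of which contain missing values:

    . * y, x1, and x2 are hypothetical variables
    . regress y x1 x2
    . summarize y x1 x2 if e(sample)
    . * keep a permanent marker of the estimation sample
    . generate byte insample = e(sample)

Saving the marker in a variable is one way to keep the sample definition after a later estimation command replaces e(sample).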

New way of saving results

Run summarize and then type return list. You will see r(N), r(mean), r(Var), and other r(name) items listed. r(N) contains the number of observations, r(mean) the mean, and r(Var) the variance. The new r(name) method of saving results replaces both _result(#) and $S_#. Hence, both ado-files and internal commands now save results in the same way.

Run regress and then type estimates list. You will see e(N) and other e(name) items. Some of the e(name) items are scalars, some are macros, some are matrices, and one is a function (e(sample)).

See [R] saved results for details on both r() and e().
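
For instance, assuming a hypothetical variable x1, the saved mean might be used to center the variable:

    . * x1 is a hypothetical variable
    . summarize x1
    . return list
    . display r(mean)
    . generate x1c = x1 - r(mean)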

Stata is 5% faster

We have sped up the rate at which Stata can evaluate expressions along with making other speed improvements.



1.3.2 New statistical features

1.3.2.1 New ANOVA features

Repeated-measures ANOVA and ANCOVA:
anova can now perform repeated-measures ANOVA and ANCOVA. Repeated-measure variables (up to four in one anova) are now fully supported. In addition to the regular ANOVA table, F tests based on the Box, Greenhouse–Geisser, and Huynh–Feldt corrections are also reported for terms involving a repeated-measures variable. See [R] anova.

Nested and mixed designs:
anova now handles nested and mixed designs. It is now easy within anova to specify the appropriate error(s) for testing nested and mixed terms. This means that for most analyses you can now get the appropriate F tests for all terms in one ANOVA table with one command. The test command provides a simple way to obtain any other F tests of interest. This is possible because of new syntax that has been added to anova to make specification of various nonresidual error terms easier, and that same syntax is understood by test as well. See [R] anova.

1.3.2.2 New st survival analysis features

There are substantial additions and many changes to the st system of commands for survival analysis.

New st survival estimators

    There are four new st survival estimators:

      Lognormal parametric survival regression
      Log-logistic parametric survival regression
      Gompertz parametric survival regression
      Generalized log-gamma parametric survival regression

    In the st system, these four new estimators are obtained using the new streg command. In addition to lognormal, log-logistic, Gompertz, and generalized log-gamma models, streg also estimates the Weibull and exponential models. See [R] st streg. The old stereg and stweib commands are undocumented but continue to work.

    For those not wanting to stset their data, the new stand-alone commands for estimating these parametric models are lnormal, llogistic, gompertz, and gamma, in addition to the previously existing ereg and weibull commands; all are documented under [R] weibull.

Other new st survival features

st has been rewritten:
You can now have different types of failure events in the same dataset, and there is now a careful distinction made between analysis time and time as you measure it (analysis time = 0 corresponds to start of risk). These changes are all incorporated into stset and many new features are added as well. When you stset the data, you can specify when subjects became at risk (either by time or by event), when they came under observation (either by time or by event), and when they failed or were censored (either by time or by event). The new streset command allows varying a previous definition. See [R] st stset.

Because there have been so many STB inserts based on the old st system, if you type version 5.0, you will be running the old system.
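
As a hedged sketch of the new stset, suppose a hypothetical dataset records each patient's identifier (patid), date of diagnosis (diagdate), date of enrollment in the study (enrdate), date of exit (exitdate), and an indicator for death (died). Taking diagnosis as the start of risk and enrollment as the start of observation, one might type

    . * all variable names here are hypothetical
    . * origin() marks when subjects became at risk
    . * enter() marks when they came under observation
    . stset exitdate, failure(died) origin(time diagdate) enter(time enrdate) id(patid)

See [R] st stset for the full syntax.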

Testing the proportional-hazards assumption:
stphtest, for use after stcox, presents a test of the assumption; see [R] st stcox. stphplot and stkmcox (based on Garrett 1997) provide a graphical interpretation of the proportional-hazards assumption; see [R] st stphplot.

Ties in Cox regression:
stcox and cox now provide three ways to handle ties in addition to the Breslow approximation: the exact partial likelihood method, the exact marginal likelihood method, and the Efron approximation. See [R] st stcox and [R] cox.

Cumulative baseline hazard:
stcox and cox can now calculate the cumulative baseline hazard. See [R] st stcox and [R] cox.

Residuals after Cox regression:
stcox and cox can now calculate Schoenfeld residuals, scaled Schoenfeld residuals, Cox–Snell residuals, cumulative Cox–Snell residuals, cumulative martingale residuals, and deviance residuals, in addition to the martingale and efficient score residuals that were previously available. See [R] st stcox and [R] cox.

Residuals after parametric survival regression:
After parametric survival model estimation with the streg command (exponential, Weibull, lognormal, log-logistic, Gompertz, or generalized log-gamma models), the same residuals are available as after stcox and cox; see [R] st streg.

Predicted survival and hazard functions:
stcurv, after streg, will plot the predicted survival, hazard, and cumulative hazard functions; see [R] st streg.

Rates and SMRs:
strate calculates and tabulates rates and SMRs by one or more categorical variables; see [R] st strate.

Stratified rate ratios:
The stmc command calculates and tests stratified rate ratios using Mantel–Cox methods. The stmh command calculates and tests stratified rate ratios using Mantel–Haenszel methods. See [R] st strate.

Nested case-control datasets:
The sttocc command creates a nested case–control study dataset from a cohort-study dataset; see [R] st sttocc.

Splitting and joining time records:
The features of lexis and stlexis (Clayton and Hills 1995a, 1997) have been incorporated into the new stsplit command. stsplit splits time records into two or more records at the time points specified. The new stjoin command (based on Weesie 1998) performs the inverse operation. See [R] st stsplit.

Snapshot data:
snapspan makes it easier to convert snapshot data into time-span data; see [R] snapspan.

Changed st commands

stgen now has new functions for calculating earliest and latest times and times corresponding to an event; see [R] st stgen.

sts now provides the Nelson–Aalen estimator of the cumulative (integrated) hazard function in addition to what was previously provided such as the Kaplan–Meier estimate of the survivor function; see [R] st sts.

The sts graph command now graphs the survival function from analysis time 0 rather than from the time of the first failure. An option restores the previous behavior; see [R] st sts graph.

1.3.2.3 New svy survey commands

Two-way contingency tables:
svytab produces two-way tabulations with tests of independence for complex survey data or other clustered data. The command can display estimated proportions with standard errors and confidence intervals. Tests of independence include the Rao-and-Scott second-order correction for the Pearson chi-squared and likelihood-ratio statistics. Wald tests, which historically have been used, can also be computed. See [R] svytab.

Censored and interval regression:
The new svyintrg command is the parallel of intreg for survey data; see [R] svy estimators.

Instrumental variables regression:
The new svyivreg command is the parallel of ivreg for survey data; see [R] svy estimators.

Multinomial logistic regression:
The new svymlog command is the parallel of mlogit for survey data; see [R] svy estimators.

Ordered logistic regression:
The new svyolog command is the parallel of ologit for survey data; see [R] svy estimators.

Ordered probit:
The new svyoprob command is the parallel of oprobit for survey data; see [R] svy estimators.

Poisson regression:
The new svypois command is the parallel of poisson for survey data; see [R] svy estimators.

1.3.2.4 New time-series features

Stata has added some new time-series estimators and developed several other features designed for time-series data including, importantly, changes to Stata's language to support time-series operators and additions of time-series date formats.

New time-series estimators

ARIMA:
The new arima command estimates via maximum likelihood ARIMA models and models with ARMA disturbance structures; see [R] arima. Estimates and predictions are based on optimal filtering using the Kalman filter. Variance estimates for the parameters can be computed using either the standard method for MLEs (i.e., the inverse of the negative Hessian) or the robust Huber/White/sandwich variance estimator.

ARCH, GARCH, ARCH-in-mean:
The new arch command estimates a family of models with autoregressive conditional heteroscedastic disturbances: ARCH, GARCH, EGARCH, APARCH, NARCH, AARCH, GJR, and others; see [R] arch. Estimation is by conditional maximum likelihood, and coefficient variances can be estimated using the standard method for MLEs (i.e., the inverse of the negative Hessian), the outer product of gradients (OPG), or the robust Huber/White/sandwich variance estimator. In addition to conditional heteroscedasticity, arch can model multiplicative deterministic heteroscedasticity and ARMA structure in the disturbances.

Other new time-series features

Time-series varlists and time-series operators:
Stata's new time-series features begin with the newly allowed time-series varlists and time-series operators in expressions—see [U] 14.4.3 Time-series varlists, [U] 16.8 Time-series functions, [U] Time-series operators, and [U] 27.3 Time-series dates. You can use time-series operators in varlists; e.g., L.gnp means gnp lagged once and L2.gnp means gnp lagged twice. To use these new features, you must first tsset your data; see [R] tsset.
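
A minimal sketch, assuming a hypothetical yearly dataset containing the variables year and gnp:

    . * year and gnp are hypothetical variables
    . tsset year
    . * L. lags once, L2. lags twice, and D. takes the first difference
    . generate gnp_lag1 = L.gnp
    . generate gnp_lag2 = L2.gnp
    . generate dgnp = D.gnp
    . list year gnp L.gnp D.gnp

Because list accepts a time-series varlist, the lagged and differenced values can be displayed without first being stored in new variables.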

Many commands work with time-series data:
Many new and existing commands now work with time-series data, meaning they allow you to specify a time-series varlist. These include, importantly, regress, reg3, ivreg, prais, and sureg, along with generate and replace, graph, list, summarize, and correlate (but you would probably prefer to use the new xcorr rather than correlate). You can tell when a command allows a time-series varlist because the note "varlist may contain time-series operators" appears at the end of the syntax diagram.

Graphs and tables of autocorrelations and partial autocorrelations:
The new corrgram, ac, and pac commands (based on Becketti 1992) graph and display tables of autocorrelations and partial autocorrelations; see [R] corrgram.

Graphs and tables of cross-correlations:
The new xcorr command graphs and displays tables of cross-correlations; see [R] xcorr.

Periodograms:
The new pergram command graphs periodograms; see [R] pergram.

Cumulative sample spectral density:
The new cumsp command graphs the cumulative sample spectral density and optionally saves the values; see [R] cumsp.

Portmanteau test for white noise:
The new wntestq command presents a portmanteau test for white noise, also known as the Box–Pierce test and the Box–Ljung test; see [R] wntestq.

Bartlett's periodogram test for white noise:
The new wntestb command (based on Newton 1996) presents a Bartlett's periodogram test for white noise; see [R] wntestb.

Dickey–Fuller test for unit roots:
The new dfuller command presents the augmented Dickey–Fuller test for unit roots; see [R] dfuller.

Phillips–Perron test for unit roots:
The new pperron command presents the Phillips–Perron test for unit roots; see [R] pperron.

Prais–Winsten regression:
prais now supports time-series operators, will produce heteroscedasticity robust variance estimates using White's method, provides the two-step method, and will calculate the autocorrelation coefficient in various ways. The existing corc and hlu commands are now undocumented and subsumed by new features of prais. See [R] prais.

Durbin–Watson statistic:
The existing regdw command (regression with Durbin–Watson statistic) is now undocumented and replaced by the post-estimation command dwstat for use after regress; see [R] regression diagnostics.

New date formats:
Stata has lots of new date formats for use with time-series data; see [U] 15.5.3 Time-series formats.

A note about the Becketti time-series library:
A side effect of the use of the new time-series operators (e.g., L2.gnp) is that periods are no longer allowed in variable names. Previously, a variable could be named abc.def, but now that would mean the result of applying the operator abc to the variable named def. This means the time-series commands in the Becketti library (Becketti 1995) will no longer work. They will not work even if you set version 5.0. Stata's new time-series features mostly replace the Becketti functions, but in case you need to use an existing do-file to replicate old work, we have updated the Becketti library to work with Stata 6. To obtain the updated library,

type

    . net from http://www.stata.com
    . net cd users
    . net cd becketti
    . net describe tslib

or pull down Help and select STB and User-written Programs, and then click in turn on http://www.stata.com, users, becketti, and tslib.

1.3.2.5 New xt panel estimators

Random-effects interval regression:
New xtintreg command; see [R] xtintreg.

Hildreth–Houck random-coefficients regression:
New xtrchh command; see [R] xtrchh.

Random-effects tobit:
New xttobit command; see [R] xttobit.

Random-effects probit:
See [R] xtprobit.

Random-effects logistic regression:
New xtlogit command; see [R] xtlogit.

Random-effects complementary log-log regression:
New xtclog command; see [R] xtclog.

Population-averaged complementary log-log regression:
New xtclog command; see [R] xtclog.

Gaussian random-effects Poisson:
See [R] xtpois.

Gamma random-effects Poisson:
See [R] xtpois.

Fixed-effects (conditional) Poisson:
See [R] xtpois.

Beta random-effects negative binomial regression:
New xtnbreg command; see [R] xtnbreg.

Fixed-effects (conditional) negative binomial regression:
New xtnbreg command; see [R] xtnbreg.

1.3.2.6 Other new estimators

3SLS:
The new reg3 command estimates systems of equations by 3SLS. It also estimates seemingly unrelated regression, multivariate regression, and 2SLS. It allows linear constraints within and across equations; allows iterated or two-step estimation; accepts new in-line equation syntax; and accepts time-series operated variable lists. See [R] reg3.

Instrumental-variable regression:
The new ivreg command estimates instrumental-variable or 2SLS models that were previously estimated by an extension of regress's syntax (which most users found confusing and which continues to work) and allows time-series operators. predict after ivreg now provides additional statistics. See [R] ivreg.

Interquantile regression:
See [R] qreg.

Simultaneous quantile regression:
See [R] qreg.

Bivariate probit:
The new biprobit command estimates bivariate probit models and partial-observability bivariate probit models, with possibly different independent variables for each of the dependent variables; see [R] biprobit.

Probit with sample selection:
The new heckprob command extends Heckman-style selection models to probit; see [R] heckprob.

Heteroscedastic probit:
New hetprob command; see [R] hetprob.

Skewed logit:
New scobit command; see [R] scobit.

Complementary log-log regression:
New cloglog command (based on Hilbe 1996, 1998); see [R] cloglog.

Zero-inflated Poisson:
New zip command; see [R] zip.

Zero-inflated negative binomial:
New zinb command; see [R] zip.

1.3.2.7 Other new statistical commands

Adjusted predictions:
The new adjust command makes tables of predictions (or predicted probabilities) after estimation. The predictions can be adjusted to set levels of regressors, covariates, or terms. See [R] adjust.

Hausman test:
The new hausman command provides a general implementation of the Hausman test. This includes the ability to test the independence of irrelevant alternatives (IIA) after multinomial logit or conditional logistic regression and to test exogeneity or over-identifying restrictions for 2SLS and 3SLS. See [R] hausman.

Tabulated odds and odds ratios:
The new tabodds command (based on Clayton and Hills 1995b) calculates and tabulates odds and odds ratios for case–control or prevalence studies and performs a score test for linear trend in the log odds; see [R] epitab.

Mantel–Haenszel odds ratios:
The new mhodds command (based on Clayton and Hills 1995b) calculates and reports Mantel–Haenszel odds ratios for case–control or prevalence studies; see [R] epitab.

Indirectly standardized rates:
The new istdize command produces indirectly standardized rates using a standard population; see [R] dstdize.

Symmetry tests for tables:
The new symmetry and symmi commands perform asymptotic symmetry and marginal homogeneity tests and exact symmetry tests on n * n tables where there is a one-to-one matching of cases and controls. They can also perform a test for linear trend in the log relative risks. See [R] symmetry.

Rootograms and histograms:
The new spikeplt command (Cox and Brady 1997a, 1997b) graphs rootograms and histograms for both categorical and continuous variables; see [R] spikeplt.

Orthogonalization of variables:
The new orthog command orthogonalizes a set of variables and creates a new set of orthogonal variables using a modified Gram–Schmidt procedure; see [R] orthog.

Checking quadrature results:
The new quadchk command can be used to assess the stability of results from models estimated using Gauss–Hermite quadrature; see [R] quadchk.



1.3.3 Changes to existing commands

alpha incorporates the extensions of Weesie (1997a) adding item-test and item-rest correlations, average interitem covariance/correlation for test scale excluding item, and allowing pairwise computation of covariances as well as computations based on casewise deletion; see [R] alpha.

brier incorporates the extensions of Goldstein (1996): it now computes the mean probability of the forecast, the correlation between judgments and outcomes, the ROC area, and Spiegelhalter's test of the ROC area being greater than .5; see [R] brier.

canon now has new syntax; see [R] canon.

cc will now optionally perform the Breslow–Day test for homogeneity; see [R] epitab.

collapse allows pweights and iweights and provides the interquartile range; see [R] collapse.

corc is now undocumented and subsumed by new features of prais. See [R] prais.

decode has the new option maxlength(#), default maxlength(80), which specifies how many characters of the value label are to be kept in the newly created string; see [R] encode.

dydx and integ have been improved and now fit a cubic spline to the data and use that to produce the derivative and integral. Both commands also newly include a by() option so that calculations can be made within groups. See [R] range.

egen has several new functions; see [R] egen.

eq is now undocumented, but continues to work, and in its place multiple-equation estimators now accept a new in-line equation syntax or obtain the second equation as an option. heckman, for instance, does the latter; see [R] heckman. sureg, for instance, does the former; see [R] sureg. In all cases, the old syntax continues to work, but only if you first set version 5.0.
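
As a sketch of the two styles, assuming hypothetical variables y1, y2, x1, x2, x3, and z1, the in-line equation syntax places each equation in parentheses, whereas heckman takes its selection equation as an option:

    . * all variable names here are hypothetical
    . * in-line equation syntax: one parenthesized equation per dependent variable
    . sureg (y1 x1 x2) (y2 x1 x3)
    . * second equation supplied as an option
    . heckman y1 x1 x2, select(x1 x2 z1)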

fit is undocumented because regress has been given all of its features; see [R] regression diagnostics.

for has been extensively updated. It is now more powerful, faster, and easier to use. The syntax is also different; for works the old way if you set version 5.0. See [R] for.

fracpoly, fracplot, and fracgen provide mean adjustment to variables, provide component+residual plots, and provide other new features as well. See [R] fracpoly.

heckman has been extensively updated and is 3 to 6 times faster. It has a new and more flexible syntax; allows robust, cluster(varname), and pweights; and will optionally estimate Heckman's two-step model. The new predict used after heckman can produce Mills' ratio, probability of selected/observed, expected value of y given both selection and model equations, and the expected value of y conditional on selection. See [R] heckman.

hlu is now undocumented and subsumed by new features of prais. See [R] prais.

infile, insheet, and input now automatically widen the display format of variables when the automatic option is specified. This has to do with the new, longer value labels and basically allows these commands to work as you would expect them to work. See [R] infile, [R] insheet, and [R] input.

kap and kappa: The kap command can now handle two or more raters (as kappa always could); kap and kappa have been modified to deal with a large number of ratings; both commands now display tables with missing rows and columns more prettily, and a new absolute option has been added to kap for dealing with unobserved outcomes. See [R] kappa.

lfit, lroc, lsens, and lstat now work after logit as well as after logistic.

lincom allows longer expressions and has a new hazard rate hr option; see [R] lincom.

linktest has a new syntax that makes it easier and less error-prone to use; see [R] linktest.

logit and logistic have an asis option to suppress dropping variables and observations due to one-way causation. logit now allows iweights. See [R] logit and [R] logistic.

loneway incorporates the corrections of Gleason (1997) to the intraclass correlation coefficient for unbalanced and/or weighted data and provides asymptotic and exact confidence intervals for the correlation coefficient; see [R] loneway.

mcc and mcci will report the exact McNemar test in addition to the asymptotic chi^2 result; see [R] epitab.

means now reports confidence intervals for the arithmetic, geometric, and harmonic means (these additions are based on Carlin, Vidmar, and Ramalheira 1998); see [R] means.

merge has a new _merge(newvarname) option to specify the name for the _merge variable; see [R] merge.

mlogit allows robust, cluster(varname), pweights, iweights, and score(). After mlogit, the matrix of coefficient estimates e(b) is now a row vector, just as the returned result would be after any other estimation command. Moreover, get(_b) after mlogit also returns a row vector unless you set version to 5.0 or before. See [R] mlogit.

mvreg now has new syntax; see [R] mvreg.

nbreg and gnbreg allow robust, cluster(varname), pweights, iweights, and score(); see [R] nbreg.

ologit allows robust, cluster(varname), pweights, iweights, and score(); see [R] ologit.

oprobit allows robust, cluster(varname), pweights, iweights, and score(); see [R] oprobit.

poisson allows robust, cluster(varname), pweights, iweights, and score(). The new command poisgof reports a goodness-of-fit test after poisson. See [R] poisson.

prais now supports time-series operators, will produce heteroscedasticity robust variance estimates using White's method, provides the two-step method, and will calculate the autocorrelation coefficient in various ways. The existing corc and hlu commands are now undocumented and subsumed by new features of prais. See [R] prais.

predict after estimation has been extensively reworked.

    predict is now more tightly coupled to the estimation command. The default statistic calculated is now related to the dependent variable. For instance, predict after weibull gives predicted times, exp(E(ln t|xj)), and not xjb.

    predict now calculates more statistics after an estimation command. For instance, after linear-regression style estimators, predict can calculate the probability a <= yj <= b, E(yj|a <= yj <= b), and E(yj*) where yj* = max(a, min(yj, b)). predict can do this, for instance, after tobit, after regress, and after a number of other estimators.

    What predict does after an estimation command is now documented with the estimation command so, if you wanted to know how predict works after regress, you would see [R] regress and if you wanted to know how predict works after clogit, you would see [R] clogit.

    Given predict's new capabilities, the old fpredict, lpredict, nlpred, ologitp, etc., commands are now undocumented (although they still work). predict replaces them all.
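
As an illustration of the new statistics, assume hypothetical variables y, x1, and x2, with y censored from below at 0, and take a = 0 and b = 10. After tobit, one might type

    . * y, x1, and x2 are hypothetical variables; the limits 0 and 10 are arbitrary
    . tobit y x1 x2, ll(0)
    . * probability that 0 <= yj <= 10
    . predict p, pr(0,10)
    . * E(yj | 0 <= yj <= 10)
    . predict ey, e(0,10)
    . * E(yj*), where yj* = max(0, min(yj, 10))
    . predict ys, ystar(0,10)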

probit and dprobit have an asis option to suppress dropping variables and observations due to one-way causation. They now allow iweights. See [R] probit.

regress accepts the new time-series operators. For example, you can now estimate

    . regress irate gnp L.gnp L2.gnp

to model irate with predictors gnp, one-lagged gnp, and twice-lagged gnp. See [R] regress and [U] 14.4.3 Time-series varlists.

regdw (regression with Durbin–Watson statistic) is now undocumented and replaced by the post-estimation command dwstat for use after regress; see [R] regression diagnostics.
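
A minimal sketch of the replacement, assuming a hypothetical yearly dataset with variables year, irate, and gnp:

    . * year, irate, and gnp are hypothetical variables
    . tsset year
    . regress irate gnp L.gnp
    . * report the Durbin–Watson statistic for the regression just estimated
    . dwstat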

reshape has an all-new, easier-to-use syntax (Gould 1997, Weesie 1997b); see [R] reshape.

sampsi includes the additions of Seed (1997, 1998) which allow for repeated measurements.

serrbar has been improved; see [R] serrbar.

st has lots of changes; see [U] 1.3.2.2 New st survival analysis features earlier in this chapter.

stack has a new group() option which provides an easier way to use stack. In addition, stack's into() option now understands variable ranges. See [R] stack.

sureg allows linear constraints and cross-equations constraints, accepts the new in-line equation syntax, optionally provides iterated maximum-likelihood estimates, and accepts time-series operators; see [R] sureg.

table and tabdisp now have a concise option which suppresses displaying rows with all missing values; see [R] table and tabdisp.

testnl has a new syntax; see [R] testnl.

ttest, ttesti, sdtest, and sdtesti now display standard deviations and provide a level() option to specify the level for the confidence interval. In addition, string variables are now allowed with the by() option. See [R] ttest and [R] sdtest.

weibull converges more rapidly. See [R] streg and [R] weibull.

while now works interactively and in do-files. Previously, you could only use while in a program. See [R] while.

xtgee has more families and links. It allows inverse Gaussian and negative binomial families and complementary log-log, negative binomial, power, and odds-power links. xtgee now allows iweights. See [R] xtgee.

xtgls will now calculate the autocorrelation coefficient in various ways; see [R] xtgls.

xtreg, ml is now internal and fast. xtreg now allows iweights. See [R] xtreg.


1.3.4 New functions and formats

1.3.4.1 Date functions and formats

Weekly, monthly, quarterly, half-yearly, and yearly time variables:
In addition to Stata's existing date format (0 = 1jan1960, 1 = 2jan1960, ...), new formats are provided for other time periods, such as 0 = first quarter of 1960, 1 = second quarter of 1960, etc. See [U] 27.3 Time-series dates. Also see [U] 15.5.3 Time-series formats for how the new display formats (called %t formats) work; for instance, 0 might be displayed as 1960q1 or 1960-1, etc.
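
The encodings described above can be made concrete with a short Python sketch (not Stata code). It assumes only what the text states: daily value 0 is 1jan1960, and quarterly value 0 is the first quarter of 1960, displayed as 1960q1. The function names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1960, 1, 1)  # daily value 0

def daily_to_date(n):
    """Map a daily value (0 = 1jan1960) to a calendar date."""
    return EPOCH + timedelta(days=n)

def quarterly_label(n):
    """Map a quarterly value (0 = first quarter of 1960) to a label such as 1960q1."""
    years, quarter = divmod(n, 4)
    return f"{1960 + years}q{quarter + 1}"
```

Negative values count backward from 1960, so quarterly_label(-1) gives 1959q4 in this sketch.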

Extensions to existing date format:
Going along with the above, the %d format has picked up new features; see [U] 27.2.3 Displaying dates.

String-to-date translation functions:
There are new string-to-date translation functions daily(), weekly(), monthly(), quarterly(), halfyearly(), and yearly(). (daily() is a synonym for the previously existing date() function.) See [U] 27.3.5 Extracting components of time.

New date-literal functions:
New date-literal functions d(), w(), m(), q(), h(), and y() have been added. The d() function, for instance, lets you type things like list if bdate>=d(15jul1982). See [U] 27.3.2 Specifying particular dates (date literals).

Y2K and the date function:
The existing date() function now takes two or, new in this release, three arguments. The third argument makes it easier to deal with two-digit years, since 15 Jan 02 could mean 15jan1902 or 15jan2002. It specifies the maximum year that is to be assigned: if a third argument of 2050 is specified, two-digit years are interpreted as being in the range 1951–2050. See [U] 16.3.3 Date functions.
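
The windowing rule for two-digit years can be sketched in Python (not Stata code). The function name expand_two_digit_year is hypothetical; the rule it implements is the one stated above, that the result is the latest matching year not exceeding the maximum year.

```python
def expand_two_digit_year(yy, topyear):
    """Return the four-digit year in (topyear - 99 .. topyear) that ends in yy."""
    if not 0 <= yy <= 99:
        raise ValueError("yy must be a two-digit year")
    century = topyear - topyear % 100
    year = century + yy
    if year > topyear:      # would exceed the maximum year, so fall back a century
        year -= 100
    return year
```

With a maximum year of 2050, 02 becomes 2002 and 51 becomes 1951; with a maximum year of 1999, 02 becomes 1902.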

Four-digit years by default:
The default %d format is now %dDlCY and not %dDlY. That means that, by default, the 7th of July, 2002, displays as 07jul2002 and not 07jul02. To obtain two-digit years, you must explicitly specify the previous %dDlY format. See [U] 27.2.3 Displaying dates.

1.3.4.2 Statistical and mathematical functions

New statistical and mathematical functions

digamma() returns the value of the digamma function Psi(x) = d lnGamma(x)/dx.

trigamma() returns the value of the trigamma function dPsi(x)/dx = d^2 lnGamma(x)/dx^2.

reldif(x,y) returns |x-y|/(|y|+1). Note that for very small values of y, reldif() is approximately the absolute difference |x-y| and for very large values, reldif() is approximately the relative difference |x-y|/|y|. Programmers will find reldif() useful in measuring convergence.

mreldif(X, Y), where X and Y are matrices, returns max over i,j of |x_ij - y_ij|/(|y_ij| + 1). This allows comparing matrices.

diag0cnt(X) returns a count of the number of 0's on the diagonal of square matrix X.

See [U] 16.3 Functions and [U] 17.8 Matrix functions for details about these functions.
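
The formulas for reldif() and mreldif() given above are easy to restate in Python (not Stata code), together with a small convergence loop of the kind the text alludes to. Matrices are assumed to be lists of rows of equal shape, and newton_sqrt is a hypothetical example, not anything shipped with Stata.

```python
def reldif(x, y):
    """|x - y| / (|y| + 1), as defined in the text."""
    return abs(x - y) / (abs(y) + 1)

def mreldif(X, Y):
    """Largest elementwise reldif() between two same-shaped matrices."""
    return max(reldif(x, y)
               for row_x, row_y in zip(X, Y)
               for x, y in zip(row_x, row_y))

def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Newton iteration for sqrt(a), a > 0, stopped when reldif() falls below tol."""
    x = a if a > 1 else 1.0
    for _ in range(max_iter):
        new = 0.5 * (x + a / x)
        if reldif(new, x) < tol:
            return new
        x = new
    raise RuntimeError("did not converge")
```

Because of the +1 in the denominator, the same tolerance behaves sensibly whether the quantity being iterated is near zero or very large.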

Changed statistical and mathematical functions

invnorm() is now faster and more accurate.

The mod() function now handles mod(-a,b), a > 0, properly; it returns a result that is greater than or equal to 0, not negative. mod(a,-b), b > 0, now returns missing.

normd() now allows one (as previously) or two arguments. normd(z) returns the height of the N(0, 1) density at z. normd(z,s) returns the height of an N(0, s^2) density at z.

See [U] 16.3 Functions for more information about these functions.
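
The changed behavior of mod() and the two-argument normd() can be sketched in Python (not Stata code). The names stata_mod and normd are stand-ins. None represents a missing value, and returning missing for a zero modulus is an assumption, since the text only covers a negative one. The sketch also assumes s > 0.

```python
import math

def stata_mod(x, b):
    """Nonnegative remainder of x modulo b for b > 0; None (missing) otherwise."""
    if b <= 0:                 # b == 0 is an assumption; the text covers b < 0 only
        return None
    return x - b * math.floor(x / b)

def normd(z, s=1.0):
    """Height of the N(0, s^2) density at z; s defaults to 1 for the N(0, 1) case."""
    if s <= 0:
        raise ValueError("s must be positive in this sketch")
    return math.exp(-0.5 * (z / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
```

For comparison, a truncating remainder such as math.fmod(-7, 3) gives -1.0, the kind of negative result the text says mod() no longer returns. The two forms of normd() are related by normd(z, s) = normd(z/s)/s.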

1.3.4.3 Other functions and formats

Other new functions and formats

Comma formats:
Stata's display formats now support commas. The number 1,002 would still be displayed as 1002 by %9.0g but will be displayed as 1,002 by %9.0gc—note the c on the end of the format. See [U] 15.5 Formats: controlling how data is displayed.

Left-justified string formats:
The string "this" can be displayed left justified by the format %-5s; see [U] 15.5 Formats: controlling how data is displayed.

Missing numeric or string expressions:
The new missing(numeric_or_string_exp) function for use in expressions returns 1 if the argument evaluates to missing and 0 otherwise. Missing here is defined as a numeric argument being equal to . and a string argument being equal to "".
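
For readers more familiar with other languages, the comma format and the missing() rule have close analogues in Python (not Stata code). This is only an analogy: Python's format specification is not Stata's %9.0gc, and None is used here as an assumed stand-in for Stata's numeric missing value.

```python
def with_commas(value, width=9):
    """Right-align an integer in `width` columns with thousands separators."""
    return f"{value:>{width},d}"

def missing(value):
    """Return 1 if value is missing (None or the empty string), else 0."""
    return 1 if value is None or value == "" else 0
```

Here with_commas(1002) yields the text 1,002 padded on the left to nine columns.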

Other changed functions

cond(z1,z2,z3) now allows z2 and z3 to both be string arguments or both be numeric arguments. z2 and z3 had to be numeric arguments previously. cond() returns z2 if z1 is true (not 0 and not missing) and z3 otherwise.

string() now takes one (as previously) or two (this is new) arguments. The optional second argument specifies the format under which the first argument is to be translated to a string. See [U] 16.3.5 String functions.


1.3.5 New data-management features

Longer value labels:
Value labels may now be up to 80 characters in length (the previous maximum was 8). Value labels may contain up to 65,536 mappings (the previous maximum was 500). You may now label negative as well as positive values.

The new nofix option on label values and label define is a consequence of the longer value labels. When you label a variable, one of the value labels might be wider than the variable's display format. In that case, the default action is to widen the display format to accommodate the longest value label; nofix prevents this. See [R] label.

Longer variable and dataset labels:
Variable labels and the dataset label may now be up to 80 characters long, up from the previous maximum of 31.

Creating separate variables:
The new separate command creates separate variables from a single variable for each category of another variable; see [R] separate.

contract opposite of expand:
The new contract varlist command (Cox 1998) replaces the data in memory with a new dataset consisting of all combinations of varlist that exist in the data together with a new variable that contains the frequency of combination. Think of contract as the opposite of expand; see [R] contract.
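
What contract produces can be sketched in Python (not Stata code). The data are represented as a list of dictionaries, the name _freq for the frequency variable and the sorted output order are assumptions of this sketch, and the sample rows are made up.

```python
from collections import Counter

def contract(rows, varlist, freq="_freq"):
    """Collapse rows to one record per observed combination of varlist, with a count."""
    counts = Counter(tuple(row[v] for v in varlist) for row in rows)
    return [dict(zip(varlist, key), **{freq: n})
            for key, n in sorted(counts.items())]

rows = [
    {"foreign": 0, "rep": 3},
    {"foreign": 1, "rep": 4},
    {"foreign": 0, "rep": 3},
    {"foreign": 0, "rep": 2},
]
contracted = contract(rows, ["foreign", "rep"])
```

The frequencies sum to the original number of observations, which is the sense in which contract undoes expand.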

New egen functions:
The following new egen() functions—see [R] egen—are provided:

    fill(# # ...) creates ascending, descending, or repeating patterns of numbers from the part of the pattern that is supplied.

    rmin(varlist) returns the row minimum.

    rmax(varlist) returns the row maximum.

    rfirst(varlist) returns the first nonmissing value in a row.

    rlast(varlist) returns the last nonmissing value in a row.
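
The four row functions above can be sketched in Python (not Stata code) for a single observation. None stands in for a missing value, and the sketch assumes that rmin and rmax skip missing values in the same way that rfirst and rlast are stated to; see [R] egen for the actual rules.

```python
def _nonmissing(row):
    """Values of the row that are not missing, in variable order."""
    return [v for v in row if v is not None]

def rmin(row):
    vals = _nonmissing(row)
    return min(vals) if vals else None

def rmax(row):
    vals = _nonmissing(row)
    return max(vals) if vals else None

def rfirst(row):
    vals = _nonmissing(row)
    return vals[0] if vals else None

def rlast(row):
    vals = _nonmissing(row)
    return vals[-1] if vals else None
```

For the row (missing, 4, 2, missing, 7), this gives a minimum of 2, a maximum of 7, a first nonmissing value of 4, and a last nonmissing value of 7.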

Concerning egen functions that previously existed:

    mtr(yearvar incvar) now has tax rates through 1997.

    pctile(exp) now calculates noninteger percentiles.

    rank(exp) now allows a by() option.

    rmiss(varlist) now works with strings as well as numeric variables.


1.3.6 New programming features

New matrix language

Full matrix parsing:
Stata now has complete matrix parsing, meaning complicated expressions involving matrices are now understood. For example, you can now write

    matrix b = syminv(X'*X)*X'*y

The matrix operations and functions in Stata remain for the most part unchanged, but now you can combine them in complicated expressions. This makes working with matrices in Stata much, much easier. See [U] 17 Matrix expressions.
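
To show what the one-line expression above computes, here is a small pure-Python sketch (not Stata code) of the same product for a constant plus one regressor. A plain 2 x 2 inverse stands in for syminv(), the helper names are hypothetical, and the data are made up so that y = 1 + 2x exactly.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2x2(M):
    """Inverse of a nonsingular 2 x 2 matrix (stand-in for syminv in this sketch)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # constant and x
y = [[1.0], [3.0], [5.0], [7.0]]                      # y = 1 + 2x

Xt = transpose(X)
b = matmul(matmul(inv2x2(matmul(Xt, X)), Xt), y)      # inverse(X'X) * X' * y
```

The result is a 2 x 1 column holding the intercept and slope, here 1 and 2 up to rounding.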

Other matrix changes:

    matrix substitute is gone and instead you perform matrix substitutions directly:
      matrix A[exp1,exp2] = exp3.

    You can no longer obtain the ith row by referring to A[i,.]; instead refer to A[i,1...].

    The new matrix rename command allows renaming matrices; see [R] matrix utility.

    matrix score now has an eq() option; see [R] matrix score.

    matrix accum, matrix vecaccum, and matrix glsaccum now allow iweights.

New features for saved results

Estimation sample saved in e(sample):
After running any estimation command, the new function e(sample) returns true (1) if the observation was used in estimation and false (0) otherwise. You can type, for instance, summarize if e(sample) to obtain summary statistics on the estimation sample. See [U] 23.4 Specifying the estimation subsample.

New way of saving results in r(name):
_result(#) and $S_# have been replaced by r(name). Type summarize and then type return list and you will get the idea. Rather than the mean being stored in _result(3) and the variance in _result(4), they are now stored in r(mean) and r(Var). You can use r(mean) and r(Var) in subsequent expressions. See [R] saved results.

Results are still saved in _result(#) and $S_# so old programs continue to work.

New way of saving estimation results in e(name):
Run any estimation command—regress, logistic, etc.—and then type estimates list. For instance, the number of observations will be found in e(N). This too is described in [R] saved results.

New way of getting the coefficient vector and variance–covariance matrix:
Results saved in e() now include the coefficient vector and the variance–covariance matrix of the estimators, which are e(b) and e(V). Instead of typing

    . matrix b = get(_b)
    . matrix V = get(VCE)

you now type

    . matrix b = e(b)
    . matrix V = e(V)

See [U] 17.5 Accessing matrices created by Stata commands and [U] 21.9 Accessing results calculated by estimation commands.

New way of posting estimation results:
All issues of posting and redisplaying estimation results are now handled by estimates, not matrix. For instance, you use estimates post to post results, not matrix post. You use estimates display to display estimation results, not matrix mlout. The new estimates repost command makes it possible to change posted results. In addition, new options are available for estimates display. See [R] estimates.

Programs now have classes:
Programs are now marked as being r, e, s, or n class, according to whether they save results in the new r(), e(), or s(), or do not save results at all. If a program is r, e, or s class, you must specify the rclass, eclass, or sclass option on the program define statement. You may then use the return, estimates, or sreturn commands in the program body. See [R] return.

Bigger macros and new style of quotes make ado-files just like internal commands

Bigger macros:
Macros can now be up to 18,632 characters long. Note that 2,047 × 9 = 18,423 < 18,632, where 2,047 is the maximum number of variables allowed in a dataset and 9 allows for an 8-character variable name plus a separating blank. A single macro can therefore hold the name of every variable in the dataset.

New double-quote characters:
Stata has a new pair of double-quote characters, `" and "', to go along with its standard double-quote character, ". The new-style double quotes, called compound double quotes, can be used anywhere you could use the standard double quotes. Syntax diagrams continue to show the old-style double quotes, but it is implied that compound double quotes may be used. It is not anticipated that end-users of Stata will use the compound double quotes, but it is expected that programmers will use them. The advantage of the compound double quotes is that they nest. What does "A"B"C" mean? It could mean `"A`"B"'C"' or it could mean `"A"'B`"C"'. See [U] 21.3.5 Double quotes.

Ado-files now like internal commands:
Taking the above two changes together, a command implemented as an ado-file can now be indistinguishable from an internally implemented command. The previous obstacles were (1) if there were too many variables, a single macro could not hold all of their names and (2) ado-files did not treat double-quoted strings correctly.

Longer limits elsewhere, too:
Most of the other limits have changed, too. For instance, the maximum number of characters in a command is now 18,648 (the previous maximum was 6,144). For the full list, enter Stata and type help limits.

New commands for parsing, etc.

New syntax command replaces parse:
The new syntax command replaces parse (which continues to work but is undocumented) for the parsing of standard Stata syntax. The new command is easier to use and more powerful. See [R] syntax.

New command for unloading arguments to a program:
The new args command is the right way to receive positional arguments rather than referring to `1', `2', .... That is, rather than using

    program define myprog
      local a `1'
      local b `2'
      local c `3' ...

you simply use

    program define myprog
      args a b c
      ...

See [R] syntax.

New command for nonstandard parsing:
The new tokenize command replaces parse used with its parse() option. For example, you would use tokenize to parse on blanks or other special characters. parse continues to work, but is now undocumented. See [R] tokenize.

Original command line that invoked program is accessible:
The macro `0' now contains the original line, as typed by the user, with quotes, multiple blanks, and all, when a program is invoked. See [U] 21.4 Program arguments.

Tokens can be unloaded one at a time:
The new gettoken command allows fetching tokens one at a time from the input stream (`0') or from any macro. See [R] gettoken.

New tools for programming commands

New command for marking sample:
The new marksample command is an easier-to-use and less error-prone alternative to mark and markout when you use the new syntax command to parse input. In addition, both the existing mark and new marksample commands provide a new zeroweight option to include observations with zero weights. See [R] mark.

Programmers can make predict work with new estimation commands:
Now predict is implemented as an ado-file and, under the new scheme, it calls `e(predict)', the name of the prediction command saved by the estimation command. This command, another ado-file, is typically implemented in terms of _predict, the old built-in predict command. (When version is 5.0 or before, all of this is turned off and _predict becomes predict.) See [R] _predict.

Replaying estimation results:
The new replay() function returns 1 if the first nonblank character in `0' (what the user typed, as the user typed it) is a comma or if `0' is "". This makes writing estimation commands easier. Early on you simply code 'if replay() {' and put the redisplay logic there. See [U] 16.3.6 Special functions.
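
The test that replay() performs can be restated in a few lines of Python (not Stata code). The argument cmdline plays the role of what the user typed after the command name. Treating a line made up only of blanks as empty is an assumption of this sketch.

```python
def replay(cmdline):
    """Return 1 if cmdline is empty or its first nonblank character is a comma, else 0."""
    stripped = cmdline.lstrip()
    return 1 if stripped == "" or stripped.startswith(",") else 0
```

Under this rule, typing the command alone or with only options, such as ", level(90)", counts as a request to redisplay results, whereas anything beginning with a variable name does not.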

Command for handling new time-series varlists:
The new tsrevar command assists in writing commands that use time-series varlists; see [R] tsrevar.

New command for unabbreviating varlists:
The new unab command replaces unabbrev (unabbrev continues to work), but both are now rarely needed because of the new features of syntax. There is a tsunab command for time-series varlists, too. See [R] unab and [R] syntax.

New command for parsing numlists:
The new numlist command helps parse numlists but, as with unab, it is rarely used because syntax can do that, too. See [R] numlist and [R] syntax.

Confirm numeric variable:
The new confirm numeric variable command is just confirm string variable turned on its head; see [R] confirm.

Determining the version of caller:
The new _caller() function returns the version number of the program or session that invoked the program currently being executed. This makes coding for backwards compatibility easier. For instance, we changed Stata's st commands significantly this release and yet wanted to still ensure that the old routines would work. Consider the nonexistent st command stxyz. We took the old stxyz.ado file and renamed it stxyz_5.ado. We coded our new stxyz.ado file as

    program define stxyz
      version 6
      if _caller() <= 5 {
        stxyz_5 `0'
        exit
      }
      ...
    end

New macro extended functions

data label returns the dataset label; see [U] 21.3.6 Extended macro functions.

label extended macro function now has syntax:

    local...: label { labelname|(varname) } {maxlength|#val [#len]}

First, you can indirectly refer to the value label associated with a variable by enclosing the variable name in parentheses. Second, specifying maxlength returns the length of the longest label in the value label. Third, specifying #len trims the result to no more than #len characters. See [U] 21.3.6 Extended macro functions.

piece returns the piece of string, given a specified maximum length, the piece being at a word break; see [U] 21.3.6 Extended macro functions.

rowfullnames and colfullnames return the "full" row and column names of a matrix. This has to do with the new time-series features. Row and column names now potentially consist of three parts: the equation name, the time-series operator, and the variable name, e.g., eq1:L.gnp. See [R] matrix define and the new [U] 17 Matrix expressions chapter.

subinstr changes all occurrences of one substring to another; see [U] 21.3.6 Extended macro functions.

sysdir returns the identities of various system directories; see [R] sysdir.

Other new programming features

Better way to get contents of characteristics:
It is now possible to obtain the contents of characteristics by quoting them rather than first unloading them into a macro. For example, you can refer to `mpg[note0]' and `_dta[tis]'. See [U] 21.3.11 Referencing characteristics.

New string formats for display:
display now allows %s formats so you can write code such as

    display %9s "`varname'"

to produce right-aligned strings. With the new %-#s format, you can produce left-aligned strings by coding

    display %-9s "`varname'"

In addition, display (and only display) understands ~ to mean center, so it can center strings:

    display %~80s "My Title"

See [R] display (for programmers).
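
The three alignments described for display correspond closely to Python's string formatting, which may help readers picture the output. This is Python, not Stata code. The function names are hypothetical, and the sketch says nothing about how Stata treats strings longer than the field width.

```python
def display_right(text, width):
    """Analogue of a %9s-style format: pad on the left."""
    return f"{text:>{width}}"

def display_left(text, width):
    """Analogue of a %-9s-style format: pad on the right."""
    return f"{text:<{width}}"

def display_center(text, width):
    """Analogue of a %~80s-style format: pad on both sides."""
    return f"{text:^{width}}"
```

For example, centering the 8-character string My Title in 80 columns leaves 36 blanks on each side.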

Different characteristics for st datasets:
Survival-time st datasets are now marked using different characteristics. Nevertheless, old st commands will continue to work as long as you set version 5.0. See [R] st st_is for a description of the new standard.

Underscore variables do not affect changed flag:
Changes to variables beginning with an underscore no longer count as a change to the data in terms of the "dataset has changed since last saved" flag.

Utilities for writing certification scripts:
Although not documented in the manuals, we have included the utility commands we use for writing certification scripts—do-files that prove Stata commands work as intended. If you write ado-files and want to write certification do-files to go along with them, start by seeing help cscript. cscript is a command to assist writing certification scripts, and in the help file we provide guidelines on how to do this along with links to the help files of other undocumented but useful commands such as rcof, bleeof, and old_ver.


1.3.7 Other new features

Specifying lists of numbers:
Stata now has the concept of a numlist. Throughout Stata, whenever you need to type a list of numbers, you can type that list using numlist syntax. numlist syntax includes 1/3 to mean 1, 2, 3; 2(2)6 to mean 2, 4, 6; 0 5 to 20 to mean 0, 5, 10, 15, 20; and so on. See [U] 14.1.8 numlist.
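
The three numlist forms quoted above can be expanded with a short Python sketch (not Stata code). It handles only those forms plus plain numbers, and only ascending ranges; the full numlist syntax in [U] 14.1.8 is richer. The function names are hypothetical.

```python
import re

def _num(token):
    """Parse a number, keeping whole values as integers."""
    x = float(token)
    return int(x) if x.is_integer() else x

def _span(start, step, stop):
    """Values start, start+step, ... not exceeding stop (ascending only)."""
    if step <= 0:
        raise ValueError("this sketch supports ascending ranges only")
    out, i = [], 0
    while start + i * step <= stop + 1e-12:
        out.append(start + i * step)
        i += 1
    return out

def expand_numlist(text):
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        # Form "a b to c": step is b - a.
        if i + 3 < len(tokens) and tokens[i + 2] == "to":
            a, b, c = _num(tok), _num(tokens[i + 1]), _num(tokens[i + 3])
            out += _span(a, b - a, c)
            i += 4
            continue
        slash = re.fullmatch(r"([^/()]+)/([^/()]+)", tok)        # form a/b
        paren = re.fullmatch(r"([^()]+)\(([^()]+)\)([^()]+)", tok)  # form a(s)b
        if slash:
            out += _span(_num(slash.group(1)), 1, _num(slash.group(2)))
        elif paren:
            out += _span(_num(paren.group(1)), _num(paren.group(2)), _num(paren.group(3)))
        else:
            out.append(_num(tok))
        i += 1
    return out
```

Forms can be mixed in one list, so "1/3 10 2(2)6" expands to 1, 2, 3, 10, 2, 4, 6 in this sketch.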

Offsets:
Many estimation commands now allow an offset(varname) option, specifying that the linear equation to be estimated is x_j b + varname_j.

New Help window:
Stata for Windows 98/95/NT and Stata for Power Macintosh users: The new net features carry over to the Help window. Pull down Help and you will discover that the Help window has the look of a browser. The similarity is more than just in appearance. For instance, if a help file refers to a FAQ on our website, clicking on the reference invokes your browser to display the referenced FAQ. Back in Stata's Help window, you can also download and install official updates, or STB ado-files, etc., by pointing and clicking.

Searching for information on commands, STB articles, FAQs, etc.:
lookup is now called search. It includes a new faq option for searching among the FAQs. See [R] search.

profile.do is automatically executed at start-up:
Stata on all operating systems now looks for and, if found, executes profile.do when Stata is invoked. You can put commands in profile.do to tailor Stata to your tastes. See [GSW] A.7 Executing commands every time Stata is started (Windows 98/95/NT), [GSW] B.7 Executing commands every time Stata is started (Windows 3.1), [GSM] A.6 Executing commands every time Stata is started (Macintosh), or [GSU] A.7 Executing commands every time Stata is started (Unix).

New arrangement of Stata directories (folders):
We have restructured the Stata directories (folders) and added a new command sysdir to make accessing them easier. See [R] sysdir for a description of the new scheme.

Copy files from inside Stata:
The new copy command allows copying files from inside Stata. Moreover, copy, like all of Stata's file manipulation commands, can read files over the Internet. See [R] copy.

Create directories (folders) from inside Stata:
The new mkdir command allows creating directories (folders) from inside Stata. See [R] mkdir.


© Copyright 1996–2025 StataCorp LLC