This page contains only historical information and is not about the current
release of Stata.
Please see our features page
for information on the current version of Stata.
Data-management features in Stata 8
New data-management features include
- ODBC support (Stata for Windows)
- 26 new missing-value codes (.a, .b, ..., .z)
- More convenient syntax for generate
- merge and append improved
- tsappend added
- more
ODBC support
New command
odbc allows Stata for Windows to act as an ODBC client,
meaning that you can fetch data directly from ODBC sources.
Stata 8 supports full SQL selection statements.
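As a rough sketch of what a session might look like, assume a data source named mydsn that contains a table named patients (both names, and the column names, are hypothetical). The subcommands and options shown are the ones we would expect to use, but check them against the odbc help file for your release:

```stata
* List the ODBC data sources registered with Windows
odbc list

* Show the tables available in one data source ("mydsn" is hypothetical)
odbc query "mydsn"

* Clear the data in memory, then fetch the result of an SQL SELECT statement
* ("patients", "id", "age", and "visit_date" are hypothetical names)
clear
odbc load, exec("SELECT id, age, visit_date FROM patients WHERE age >= 18") dsn("mydsn")
```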
26 new missing-value codes
Stata now has multiple missing values! In addition to the previously
existing ., there are now .a, .b, ..., .z, and you can attach value
labels to the new missing codes!
Matrices can now contain missing values, both standard (.) and
extended (.a, .b, ..., .z).
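A minimal sketch of how the extended codes might be used, assuming a hypothetical survey variable income in which -98 and -99 were entered as placeholders for "don't know" and "refused" (the variable, the placeholder values, and the label name incmiss are all illustrative):

```stata
* Recode placeholder values to distinct extended missing codes
replace income = .a if income == -98
replace income = .b if income == -99

* Attach value labels to the extended missing codes
label define incmiss .a "don't know" .b "refused"
label values income incmiss

* All missing codes sort above every number (numbers < . < .a < ... < .z),
* so "income < ." selects only the nonmissing observations
count if income < .

* Count one specific kind of missing value
count if income == .b
```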
More convenient syntax for generate
Existing command
generate has a new,
more convenient syntax. Now you can type
. generate a = 2 + 3
or
. generate b = "this" + "that"
without specifying whether new variable
b is numeric or a string of a particular
length. If you wish, you can also type
. generate str b = "this" + "that"
which asserts that b is a string but
leaves it to generate to determine the
length of the string. This is useful in programming situations because it
helps to prevent bugs. Of course, you can continue to type
. generate double a = _pi/2
and
. generate str8 b = "this" + "that"
merge and append improved
Existing command merge has been improved:
- New options unique, uniqmaster, and uniqusing ensure that the merge goes as
you intend. These options amount to assertions that, if false, cause
merge to stop.
unique specifies that there be no repeated observations within the match
variables. In "merge id using myfile", for example, unique specifies that
there be one observation per id value in the master data (the data in
memory) and one observation per id value in the using data. If observations
are not unique, merge stops with an error.
Options uniqmaster and uniqusing make the same assertion for one or the
other half of the merge; unique is equivalent to specifying both uniqmaster
and uniqusing.
- merge no longer limits the number of match (key) variables.
- merge has new option
keep(varlist) that
specifies the variables to be kept from the using data.
Similarly, keep(varlist)
has been added to append.
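The options described in this section can be sketched as follows, using the Stata 8 merge syntax shown above. The file names master, myfile, and otherfile and the variables id, age, and income are hypothetical. Both datasets are sorted by the match variable first, on the assumption that match-merging requires it:

```stata
* Sort the using data by the match variable and save it
use myfile, clear
sort id
save myfile, replace

* Load and sort the master data
use master, clear
sort id

* Assert one observation per id in both datasets; merge stops if that is false.
* keep() restricts which using-data variables are brought in.
merge id using myfile, unique keep(age income)

* _merge records where each observation came from
tabulate _merge

* keep() works the same way with append
append using otherfile, keep(id income)
```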
tsappend added
New command tsappend appends
observations in a time-series context.
tsappend uses the information set by
tsset, automatically fills in the time
variable, and fills in the panel variable if the panel variable was set.
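As an illustration, the sketch below builds a small yearly series and extends it by three periods. The variable names and values are made up, and the add() option is our assumption about how the number of new observations is given; see the tsappend help file for the exact syntax:

```stata
* Build a small yearly series covering 1991-2000 (values are illustrative only)
clear
set obs 10
generate year = 1990 + _n
generate sales = 100 + 5*_n

* Declare the time variable
tsset year

* Add 3 observations after the last period; year is filled in automatically
* and the other variables are missing in the new observations
tsappend, add(3)
list year sales in -3/l
```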
More
Other improvements include the following:
- Existing command list has been
completely redone. Output is far more readable, even pretty, and
programmers will want to use list to format tables.
- New command isid verifies that a
variable or set of variables uniquely identifies the observations and so is
suitable for use with merge.
- Existing command describe using now allows you to specify a
varlist, so you can check whether a
variable exists in a dataset before merging or appending. Programmers will be
interested in the new varlist option,
which leaves in r() the names of the
variables in the dataset.
- Existing command codebook has new
option problems to report potential
problems in the data.
- New command labelbook is like
codebook, but is for value labels. In
addition to providing documentation, the output includes a list of potential
problems.
- New command numlabel adds numeric prefixes to value labels or removes
them. For example, the label "Catholic" for the value 2 becomes
"2. Catholic", and vice versa.
- New command duplicates reports, gives examples of, lists, browses,
tags, and/or drops duplicate observations.
- Existing command recode now allows a
varlist rather than a
varname, so several variables can be
recoded at once.
- Existing command recode has new option
generate() to specify that the transformed
variables be stored under different names than the originals.
- Existing command recode has a new option
prefix(), which is an alternative to
generate, to specify that the transformed
variables be given their original names with a prefix.
- Existing command sort has new option stable indicating that
within equal values of the sort keys, the observations appear in the same
order as they did originally.
- New command webuse loads the specified
dataset, obtaining it over the web. By default, datasets are obtained from
http://www.stata-press.com/data/r8/, but you can reset that.
- New command sysuse loads the specified
dataset that was shipped with Stata, plus any other datasets stored along the
ado-path.
- Existing command insheet has a new
delimiter(char) option that
allows you to specify an arbitrary character as the value separator.
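Several of the commands in this list can be tried together in a short session. The sketch below assumes that the auto dataset is among those shipped with Stata and that it contains the variables make, foreign, and rep78; the new variable name rep3 and the prefix r_ are arbitrary choices:

```stata
* Load a dataset shipped with Stata
sysuse auto, clear

* Confirm that make uniquely identifies the observations
isid make

* Report duplicate observations in terms of make
duplicates report make

* Sort by foreign, keeping the original order within each group
sort foreign, stable

* Recode into a new variable rather than overwriting rep78
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Same recoding, but name the result by prefixing the original name (r_rep78)
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), prefix(r_)

* Report potential problems in the data
codebook, problems
```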