Stata’s data-management features are now documented in a single volume for
easy reference. This includes match-merges, file and variable management, and
sorting, as well as more advanced features, such as collecting statistics from
any command over groups and reshaping datasets from wide to long and vice
versa.
New features include the ability to read and write datasets in the format
required for FDA NDAs, support for reading and writing XML, support for
simultaneous multiple language labels, filtering files, and additional support
for ODBC.
Here are the details.
- There is a new manual [D] Data Management, and the
data-management commands have been moved from [R] to [D].
See [D] intro for an expanded what’s new for data-management
capabilities.
- Existing command set type now has a permanently
option. You can now permanently set the default
datatype to either
float (the factory default) or double.
- New commands xmlsave and xmluse save and restore datasets
in Extensible Markup Language (XML) format. Data may be saved or used in
either Stata dta XML format or Microsoft Excel’s SpreadsheetML
format. See [D] xmlsave.
- New commands fdasave, fdause, and fdadescribe
save, use, and describe files in the format required by the U.S. Food and
Drug Administration (FDA) for new drug and device applications (NDAs).
These commands are designed to assist people making submissions to the
FDA, but the commands are general enough for use in transferring data
between SAS and Stata. The FDA format is identical to the SAS XPORT
Transport format. See [D] fdasave.
- Value labels may now be up to 32,000 characters long.
- Existing command label has a new subcommand language that
lets you create and use datasets containing different
variable, value, and data labels, which might be in different languages.
See [D] label language.
- Datasets from the examples in the Stata manuals can now be
browsed, described, and used. Type help dta_contents,
or select File > Example Datasets...
from the Stata menu.
- statsby is now a prefix command; see
[U] 11.1.10 Prefix commands.
For information on its new syntax, see [D] statsby.
Enhancements to statsby include
- Rather than requiring a list of expressions for the
statistics to collect, statsby now collects a default set.
- Expressions to be computed and saved can now be grouped together
as equations; see
exp_list.
- String variables are now allowed.
- Weights are now allowed.
- New option force forces statsby to work with
survey estimators. By default, this is prevented
because the method statsby uses to select subsamples will
generally not produce appropriate standard error estimates with
survey data (the subpop option must be used with survey
data).
- Dots showing the progress of computations are now shown by default.
- New option nolegend suppresses the table reporting
on what statsby is running.
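The prefix form of statsby described above can be sketched as follows. This is a minimal illustration, assuming the auto example dataset that ships with Stata; the variables mpg, weight, and foreign come from that dataset, not from these notes.

```stata
* Load the example dataset (assumed to be available through sysuse).
sysuse auto, clear

* No expression list is given, so statsby collects its default set of
* statistics for the prefixed command, once per level of foreign.
* The clear option replaces the data in memory with the collected results.
statsby, by(foreign) clear: regress mpg weight

* One observation per group, holding the collected coefficients.
list
```

Adding the nolegend option to the statsby options would suppress the table that reports what statsby is running.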
- New command filefilter copies an input file to an output file while
converting a specified ASCII or binary pattern to another pattern; see
[D] filefilter.
- New command expandcl replicates clusters of unique observations,
much like expand, but for clustered data; see
[D] expandcl.
- New command tostring converts numeric variables to string;
see [D] tostring.
- Existing command codebook now allows if and in
qualifiers; see [D] codebook.
- New command rmdir removes an existing directory (folder);
see [D] rmdir.
- New command clonevar makes an identical copy of an
existing variable; see [D] clonevar.
- Existing commands
icd9 and icd9p have been updated to use the V21 codes;
see [D] icd9 and
[D] icd9p.
- Existing command
encode has new option noextend that prevents adding
new value label mappings; see
[D] encode.
- Existing command odbc for accessing Open DataBase Connectivity
(ODBC) data sources has the following enhancements:
- ODBC is now supported under
Mac OS X and Linux systems that use the iODBC Driver Manager.
For more information on configuring ODBC for Mac and Linux,
see the FAQ at
http://www.stata.com/support/faqs/data-management/configuring-odbc/.
- odbc has new subcommands
odbc insert and odbc exec for writing data to an ODBC
data source. Positioned updates can be performed using the
odbc exec command.
- odbc has a new subcommand sqlfile for batch
processing SQL instructions.
- odbc load has a new option sqlshow for debugging
SQL communication with ODBC drivers.
- odbc load has new options allstring and
datestring, which import either all data or just dates as
strings.
See [D] odbc.
- Existing command merge has the following new features:
- It now accepts multiple using files.
- New option nosummary suppresses creating
variables that summarize how the records were merged.
- New option sort sorts the master and
using datasets if they are not already sorted.
- Existing options unique, uniqmaster, and
uniqusing now require you to specify matching variables.
- Warning messages are now given when matching variables do
not uniquely identify observations.
See [D] merge.
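As a rough sketch of the merge features listed above, the following do-file fragment match-merges two small made-up datasets. The variable names (id, x, y) and the temporary file name are hypothetical. The syntax is the merge varlist using filename form that these notes describe; later Stata releases introduced a different syntax (merge 1:1 and so on), so treat this as an illustration of the options rather than a recommended idiom.

```stata
* Build a small using dataset, deliberately left unsorted, and save it.
clear
input id y
3 300
1 100
4 400
end
tempfile extra
save `extra'

* Build the master dataset in memory.
clear
input id x
1 10
2 20
3 30
end

* unique asserts that id identifies observations in both datasets;
* sort sorts the master and using data by id if they are not already sorted.
merge id using `extra', unique sort

* _merge records where each observation came from.
tabulate _merge
```

Specifying nosummary instead would skip creating the _merge summary variable.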
- Existing commands
merge and append now incorporate all notes from the using
dataset that do not already appear in the master dataset,
unless new option nonotes is specified;
see [D] merge and [D] append.
- Existing command contract has new options cfreq(),
percent(), cpercent(), float, and format() to
create frequency and percentage variables; see
[D] contract.
- Existing commands
corr2data and drawnorm now support triangular specification
of the correlation or covariance matrix; see
[D] corr2data and [D] drawnorm.
- Existing command
separate has new option shortlabel
to specify that shorter variable labels be created; see
[D] separate.
- Existing command
outfile has new option missing that preserves both standard
and extended missing values when the comma option is also
specified; see [D] outfile.
- Existing command clear now performs mata: mata
clear in addition to everything else; see
[D] clear.
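A few of the smaller additions in this list can be tried together. The sketch below assumes the auto example dataset that ships with Stata; the new variable names (foreign2, rep78_str, n, cum_n, pct, cum_pct) are made up for illustration.

```stata
sysuse auto, clear

* clonevar makes an identical copy of an existing variable.
clonevar foreign2 = foreign

* tostring converts a numeric variable to string; generate() keeps the
* original variable and stores the result in a new one.
tostring rep78, generate(rep78_str)

* codebook now accepts if and in qualifiers.
codebook mpg if foreign == 1

* contract with the new cumulative-frequency and percentage options.
contract foreign, freq(n) cfreq(cum_n) percent(pct) cpercent(cum_pct)
list
```

Because contract replaces the data in memory, it is run last here.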
Functions and expressions
- The limit for the number of dyadic operators has been
increased from 200 to 500; see limits.
- The default matrix size (matsize) for Intercooled Stata is now 200,
rather than 40. The default for Stata/SE remains 400, and for Small
Stata, 40.
- The following new functions have been added in the context of
expressions, such as generate newvar = exp or
if exp:
name | purpose
binormal() | bivariate normal cumulative
atan2() | two-argument arc tangent
regexm() | regular expression matching
regexr() | regular expression replacement
regexs() | regular expression subexpressions
indexnot() | position of first character of s1 not in s2
See [D] functions or type help followed by the function name, such as help binormal().
In addition, a host of new functions are available through Mata;
see [M-4] intro — Index and guide to functions.
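The new functions can be explored with a short sketch such as the one below. The input strings and variable names (s, hasnum, word1, masked) are made up for illustration.

```stata
clear
input str20 s
"Stata 9"
"no digits here"
end

* regexm() returns 1 if the regular expression matches, 0 otherwise.
generate byte hasnum = regexm(s, "[0-9]+")

* regexs(1) returns the first parenthesized subexpression of the
* match made by regexm() in the if qualifier.
generate str20 word1 = regexs(1) if regexm(s, "^([A-Za-z]+)")

* regexr() replaces the first match with the given string.
generate str20 masked = regexr(s, "[0-9]+", "#")
list

* Scalar examples of the remaining new functions.
display atan2(1, 1)               // arc tangent of y/x with y = 1, x = 1
display indexnot("abc123", "abc") // position of first character not in "abc"
display binormal(0, 0, 0)         // bivariate normal cumulative with rho = 0
```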
- The following existing functions have been renamed:
old name | new name
index() | strpos()
binorm() | binormal()
match() | strmatch()
norm() | normal()
invnorm() | invnormal()
normd() | normalden()
lnfact() | lnfactorial()
issym() | issymmetric()
syminv() | invsym()
Old names continue to work. The functions were renamed because the new names
are clearer and because Mata uses them, so the same names work in both
environments.
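For orientation, here is a brief sketch that calls the functions by their new names, with the old name noted in a comment beside each call. The argument values and the matrix A are arbitrary.

```stata
display normal(1.96)                    // formerly norm()
display invnormal(0.975)                // formerly invnorm()
display normalden(0)                    // formerly normd()
display lnfactorial(5)                  // formerly lnfact()
display strpos("Stata", "at")           // formerly index()
display strmatch("clonevar", "clone*")  // formerly match()

* Matrix functions under their new names.
matrix A = (2, 1 \ 1, 2)
display issymmetric(A)                  // formerly issym()
matrix B = invsym(A)                    // formerly syminv()
matrix list B
```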
- The following existing functions now have two names, and you can use
either:
Name 1 | Name 2
lower() | strlower()
upper() | strupper()
proper() | strproper()
ltrim() | strltrim()
rtrim() | strrtrim()
trim() | strtrim()
reverse() | strreverse()
string() | strofreal()
int() | trunc()
length() | strlen()
In this case, throughout the Stata documentation, we use name 1,
but you can use name 1 or name 2 in your Stata expressions. Name 2 matches the
name of the Mata function that does the same thing, so you may want to
standardize on name 2.
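One quick way to confirm that both names behave identically is a set of assert statements, sketched below with arbitrary arguments. Each line exits silently when the two names return the same result.

```stata
assert lower("Stata")   == strlower("Stata")
assert upper("Stata")   == strupper("Stata")
assert proper("data management") == strproper("data management")
assert ltrim("  x")     == strltrim("  x")
assert rtrim("x  ")     == strrtrim("x  ")
assert trim("  x  ")    == strtrim("  x  ")
assert reverse("abc")   == strreverse("abc")
assert string(3.5)      == strofreal(3.5)
assert int(-2.7)        == trunc(-2.7)
assert length("Stata")  == strlen("Stata")
```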
- The following egen functions have been renamed:
old name | new name
any() | anyvalue()
eqany() | anymatch()
neqany() | anycount()
rfirst() | rowfirst()
rlast() | rowlast()
rmean() | rowmean()
rmin() | rowmin()
rmiss() | rowmiss()
robs() | rownonmiss()
rsd() | rowsd()
rsum() | rowtotal()
sum() | total()
The new names are more consistent.
Old names continue to work but are not documented.
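The renamed egen functions can be sketched on a small made-up dataset. The variables a, b, and c and the result names (avg, tot, nmiss, nobs, grand) are hypothetical.

```stata
clear
input a b c
1 2 3
4 . 6
. . .
end

egen avg   = rowmean(a b c)     // formerly rmean()
egen tot   = rowtotal(a b c)    // formerly rsum(); missing values count as 0
egen nmiss = rowmiss(a b c)     // formerly rmiss()
egen nobs  = rownonmiss(a b c)  // formerly robs()
egen grand = total(a)           // formerly sum(); column total, same in every row
list
```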