What’s new in data management
- Existing command merge has an all-new syntax. It is easier to use,
easier to read, and makes it less likely that you will make a mistake.
Merges are classified as 1:1, 1:m, m:1, and m:m.
When you type merge 1:1, you are saying that you expect the
observations to match one-to-one. merge 1:m specifies a
1-to-many merge; m:1, a many-to-1 merge; and m:m, a
many-to-many merge. New options assert() and keep() allow
you to specify what you expect the outcome to be and what you want to
keep from it. For instance,
. merge 1:1 subjid using filename, assert(match)
means that you expect all the observations in both datasets to match each
other, whereas
. merge 1:1 subjid using filename, assert(match using) keep(match)
specifies that you expect each observation to either match or be solely
from the using data and, assuming that is true, you want to keep only
the matches.
Sorting of both the master and the using datasets is now automatic.
The new merge does not support merging multiple files in one step.
Merge the first two datasets, then merge that result with the next dataset,
and so on.
merge now aborts with error if variables are string in one
dataset and numeric in the other unless new option force
is specified.
The old merge syntax continues to work.
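
As an illustration of the new syntax, the following sketch builds two small hypothetical datasets and merges them with assert() and keep(). The variables age and weight and the temporary file names are invented for the example. Run it as a single do-file so that the temporary files stay defined.

```stata
* Hypothetical master data, keyed on subjid
clear
input subjid age
1 34
2 51
3 28
end
tempfile demog
save `demog'

* Hypothetical using data; subjid 4 has no counterpart in the master
clear
input subjid weight
1 70
2 82
3 65
4 90
end
tempfile vitals
save `vitals'

use `demog', clear
* Neither dataset needs to be sorted first.
* assert(match using): stop with an error if any observation is master-only.
* keep(match): keep only the matched observations.
merge 1:1 subjid using `vitals', assert(match using) keep(match)
tabulate _merge
```

To bring in a third dataset, drop _merge (or add option nogenerate to the first merge) and issue a second merge against the result.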
- Existing command append has several new features: 1) it will work
even if there are no data in memory; 2) multiple files can be appended
in one step; and 3) new option generate(newvar)
creates a variable indicating the source of the observations, numbered
0, 1, and so on. append now aborts with error if variables are string in
one dataset and numeric in the other unless new option force is
specified.
Old behavior is preserved under version control.
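
A minimal sketch of the three new append features together, assuming two hypothetical files (part1 and part2 are names invented here) saved as temporary datasets within one do-file:

```stata
* Two hypothetical files saved as temporary datasets
clear
input id x
1 10
2 20
end
tempfile part1
save `part1'

clear
input id x
3 30
end
tempfile part2
save `part2'

* Start with no data in memory and append both files in one step
clear
append using `part1' `part2', generate(source)
tabulate source
```

The tabulate output shows which code in source each file received.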
- Stata’s default memory allocations have changed:
- Stata/SE and Stata/MP now default to allocating
50M of memory rather than 10M. Stata/IC now defaults to 10M rather
than 1M. Stata’s required footprint has not grown; we reset
these defaults because users were resetting to larger numbers
anyway.
- Stata/IC now defaults matsize
to 400 rather than 200; the default for Stata/SE and Stata/MP
remains 400. The default for Small Stata is now 100 rather than
40.
- Existing command order now does what order, move, and
aorder did.
Old commands aorder and move continue to work but are no
longer documented.
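
For instance, the following sketch uses the auto dataset shipped with Stata to show how the tasks formerly split across order, move, and aorder might be written with order alone:

```stata
sysuse auto, clear
* What move used to do: reposition one variable relative to another
order weight, before(price)
* Move a group of variables to the end of the dataset
order rep78 headroom, last
* What aorder used to do: put the variables in alphabetical order
order _all, alphabetic
describe, simple
```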
- New commands zipfile and unzipfile compress and uncompress
files and directories in zip archive format.
- New command changeeol converts text from one
end-of-line format to another. Stata does not care about end-of-line
format, but some editors and other programs do.
- New command snapshot saves to disk and restores from disk copies of
the data in memory. snapshot is used by the new Data Editor. An
important feature of the Data Editor is that it can log all the changes
you make interactively. snapshot will show up in those logs.
snapshot really is a command of Stata, so you can replay logs to
duplicate past efforts. For your own use, however, it is better if you
continue using
preserve and
restore.
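
To make the comparison concrete, here is a small sketch using the auto dataset shipped with Stata. The label text is arbitrary, and the snapshot number is read from r(snapshot) rather than assumed.

```stata
sysuse auto, clear
* Save a snapshot to disk; the snapshot number is returned in r(snapshot)
snapshot save, label("before dropping foreign cars")
local snap = r(snapshot)
drop if foreign
count
* Bring back the data as they were when the snapshot was taken
snapshot restore `snap'
count
* Remove the snapshot from disk once it is no longer needed
snapshot erase `snap'

* The same round trip with preserve and restore
preserve
drop if foreign
restore
```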
- You can now copy-and-paste commands from logs and execute them
without editing out the period (the dot prompt) in front! Stata 11 ignores
leading periods.
- Existing command notes has new options search, replace,
and renumber.
- Concerning value labels:
- Existing command label define has new option replace so
that you do not have to drop the value label before redefining it.
- New command label copy copies value labels.
- Existing command label values now allows a varlist, so you
can label (or unlabel) a group of variables at the same time.
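
The three value-label changes above can be sketched as follows. The variables q1 and q2 and the label names yesno and agree are hypothetical.

```stata
clear
input q1 q2
0 1
1 1
end
label define yesno 0 "no" 1 "yes"
* Redefine the label in place; no label drop is needed first
label define yesno 0 "No" 1 "Yes", replace
* Make an independent copy under a new name
label copy yesno agree
* Attach one label to several variables at once
label values q1 q2 yesno
* A period in place of the label name detaches the label
label values q1 q2 .
label list
```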
- Existing command expand has new option
generate(newvar) that makes it easier to
distinguish original from duplicated observations.
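
A short sketch with hypothetical data, in which variable n holds the number of copies wanted and dup is the name chosen here for the new indicator:

```stata
clear
input id n
1 3
2 1
end
* dup is 0 for the original observations and 1 for the added copies
expand n, generate(dup)
sort id dup
list
```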
- Concerning egen:
- New function rowmedian(varlist) returns,
observation by observation, the median of the values in varlist.
- New function rowpctile(varlist), with option p(#),
returns, observation by observation, the #th percentile
of the values in varlist.
- Existing function mode(varname) with option
missing now treats missing values as a category. When version
is set to 10 or less, missing does not treat missing values as a
category.
- Existing functions total(exp) and
rowtotal(varlist) have new option
missing. If all values of exp or varlist for
an observation are missing, then that observation in newvar
will be set to missing.
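
The egen additions above can be sketched on a small hypothetical dataset. The variable names are invented, and the third observation is missing everywhere so that the effect of option missing is visible.

```stata
clear
input x1 x2 x3
1 2 6
4 . 8
. . .
end
* Row-wise median and 25th percentile
egen med = rowmedian(x1 x2 x3)
egen p25 = rowpctile(x1 x2 x3), p(25)
* With missing, a row whose values are all missing gets a missing total
egen rt = rowtotal(x1 x2 x3), missing
* Without missing, the same row gets a total of 0
egen rt0 = rowtotal(x1 x2 x3)
list
```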
- Existing command copy now allows copying a file to a directory
without having to type the filename twice.
- Existing command clear now allows clear matrix to clear all
Stata matrices (as distinguished from Mata matrices) from memory.
- Existing command outfile now exports date variables as strings
rather than their underlying numeric values. Under version control, old
behavior is restored.
- Existing command reshape now preserves variable and value labels
when converting from long to wide and restores variable and value labels
when converting from wide to long. Thus the value and variable labels
for the i variable, which exists in long form and not in wide
form, are restored when converting back from wide to long. The value
labels of the xij variables are similarly restored. Prior
behavior is preserved when version is 10 or earlier.
- Existing command collapse now allows new statistics semean,
sebinomial, and sepoisson for obtaining the standard error
of the mean.
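
As a sketch, the following collapses the auto dataset shipped with Stata to one observation per group, storing the mean of price and its standard error. The result names mean_price and se_price are chosen for illustration.

```stata
sysuse auto, clear
collapse (mean) mean_price=price (semean) se_price=price, by(foreign)
list
```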
- Existing command destring has new option dpcomma, which converts
to numeric form strings that represent numbers using a comma as the
decimal point.
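
A minimal sketch, assuming a hypothetical string variable price that was entered with decimal commas:

```stata
clear
input str8 price
"1,5"
"22,75"
end
* Treat the comma as the decimal point when converting to numeric
destring price, replace dpcomma
list
```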
- Concerning existing command odbc:
- odbc insert now uses parameterized inserts, which
are faster.
- The dialogs for odbc load and odbc insert
can now store a data source user ID and password for a Stata
session.
- odbc query has new options verbose and
schema. verbose lists any data source alias,
nickname, typed table, typed view, and view along with tables so
that data from these table types can be loaded. schema lists
schema names with the table names if the data source returns schema
information.
- odbc insert has a new dialog.
- Existing option dsn() now allows the data source name to be
up to 499 characters long.
- odbc now reports driver errors directly. Previously,
odbc would issue the error “ODBC error; type
set debug on and rerun command to see extended error
information” when an ODBC driver issued an error.
- odbc, with set debug on,
for security reasons no longer displays the data source name, user
ID, and password used for connecting to your data source.
- New function strtoname() converts a general string to a
string meeting Stata’s naming conventions. Also, existing functions
lower(), ltrim(), proper(), reverse(),
rtrim(), and upper() now have synonyms strlower(),
strltrim(), ..., and strupper(). Both sets of names work
equally well.
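
A few illustrative calls; the input strings are arbitrary:

```stata
* Characters not allowed in a Stata name become underscores
display strtoname("a name")
* A leading digit is not allowed in a name, so an underscore is prefixed
display strtoname("5:30")
* The old and new function names give the same result (displays 1)
display strupper("stata") == upper("stata")
```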
- New function soundex() returns the soundex code for a name,
consisting of a letter followed by three digits. New function
soundex_nara() returns the U.S. Census soundex for a name, also
consisting of a letter followed by three digits, but produced by a
different algorithm.
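
The two algorithms agree on many names but not all. A brief sketch with commonly used test names:

```stata
* Similar-sounding names share a code
display soundex("Robert")
display soundex("Rupert")
* A name on which the two algorithms differ
display soundex("Ashcraft")
display soundex_nara("Ashcraft")
```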
- New functions sinh(), cosh(), asinh(), and
acosh() join existing functions tanh() and atanh()
to provide the hyperbolic functions.
- New functions binomialp(); hypergeometric()
and hypergeometricp(); nbinomial(), nbinomialp(), and
nbinomialtail(); and poisson(), poissonp(), and
poissontail() provide cumulative distribution, probability mass (names
ending in p), and reverse cumulative distribution (names ending in tail)
functions for the binomial, hypergeometric, negative binomial, and Poisson
distributions.
- New functions invnbinomial() and invnbinomialtail(), and
invpoisson() and invpoissontail() provide inverses for the
negative binomial and Poisson distributions.
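
The relationship among the three kinds of functions can be checked numerically. The sketch below assumes the argument order poisson(m, k) and binomialp(n, k, p), with m the mean, k the count, n the number of trials, and p the success probability.

```stata
* P(X <= 1) for a Poisson with mean 2, built from the mass function
display poissonp(2, 0) + poissonp(2, 1)
display poisson(2, 1)
* The tail function gives P(X >= k), so these two pieces sum to 1
display poisson(2, 1) + poissontail(2, 2)
* Probability of exactly 3 successes in 10 trials with p = 0.5 (120/1024)
display binomialp(10, 3, 0.5)
```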
- Algorithms for the existing functions normal() and
lnnormal() have been improved to run in 60% and 75% of the
time previously required, respectively, while giving equivalent
double-precision results.
- New functions rbeta(), rbinomial(), rchi2(), rgamma(),
rhypergeometric(), rnbinomial(), rnormal(),
rpoisson(), and rt() produce random variates for the
β, binomial, χ², γ, hypergeometric, negative
binomial, normal, Poisson, and Student’s t distributions, respectively.
Old function uniform() has been renamed to runiform(),
but uniform() continues to work.
All random-variate functions start with r.
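
As a sketch, the following generates 1,000 draws from several of these functions. The seed, the variable names, and the distribution parameters are arbitrary choices for the example.

```stata
clear
set obs 1000
set seed 20090727
generate u = runiform()          // uniform on [0,1)
generate z = rnormal(0, 1)       // normal with mean 0, sd 1
generate k = rpoisson(3)         // Poisson with mean 3
generate b = rbinomial(10, 0.3)  // binomial, 10 trials, p = 0.3
generate t = rt(5)               // Student's t with 5 degrees of freedom
summarize u z k b t
```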
- Existing command drawnorm now uses new function rnormal() to
generate random variates. When version is set to 10 or earlier,
drawnorm reverts to using invnormal(uniform()).
- Existing command describe now respects the width of the Results
window when formatting output.
- Existing command renpfix now returns the list of variables
changed in r(varlist).
- Previously existing command impute still works but is now
undocumented. It is replaced by the new multiple-imputation command
mi.