Last updated: 23 August 2007
2007 North American Stata Users Group meeting
13–14 August 2007
Longwood Galleria Conference Center
342 Longwood Avenue
Boston, Massachusetts
Proceedings
Quantiles, L-moments and modes: Bringing order to descriptive statistics
Nicholas J. Cox
Durham University
Describing batches of data in terms of their order statistics or quantiles
has long roots but remains underrated in graphically based exploration,
data reduction, and data reporting. Hosking in 1990 proposed L-moments
based on quantiles as a unifying framework for summarizing distribution
properties, but despite several advantages they still appear to be
little known outside their main application areas of hydrology and
climatology. Similarly, the mode can be traced to the prehistory of
statistics, but it is often neglected or disparaged despite its value as a
simple descriptor and even as a robust estimator of location. This
presentation reviews and exemplifies these approaches with detailed
reference to Stata implementations. Several graphical displays are
discussed, some novel. Specific attention is given to the use of Mata for
programming core calculations directly and rapidly.
Additional information
njctalkNASUG2007.zip
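The abstract's closing point about Mata can be made concrete with a small
sketch (illustrative only, not the speaker's code): the first two sample
L-moments follow directly from the order statistics, with lambda_1 = b0
(the mean) and lambda_2 = 2*b1 - b0.

    mata:
    // first two sample L-moments of a data column (illustrative sketch)
    real rowvector lmom12(real colvector x)
    {
        real colvector xs
        real scalar    n, b0, b1

        xs = sort(x, 1)                      // order statistics x(1) <= ... <= x(n)
        n  = rows(xs)
        b0 = mean(xs)                        // b0 = sample mean = lambda_1
        b1 = ((0::(n-1))' * xs) / (n*(n-1))  // b1 = (1/n) sum_i [(i-1)/(n-1)] x(i)
        return((b0, 2*b1 - b0))              // (lambda_1, lambda_2)
    }
    lmom12(st_data(., "price"))              // e.g., after sysuse auto
    end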
Extensions to var and svar estimation
Michael Hanson
Yingzhe Zhao
Wesleyan University
We develop packages to support computation of historical decompositions in
(S)VAR models in Stata, and to extend the estimation of impulse–response
functions. Specifically, we compute cumulative structural impulse
responses, which are useful for SVAR models that rely on long-run
restrictions. While such models typically are estimated in differences,
the responses of the levels of the endogenous variables to the identified
structural innovations (that is, the cumulative structural impulse
responses) are most often of theoretical interest. We also allow an option
to relax the default assumption of symmetry when computing bootstrapped
error bands for the impulse–response functions. In addition, we develop a
package to compute the historical decompositions of the variables in an
SVAR as a function of the estimated structural shocks. Used in
conjunction with the previous package, one can compute historical
decompositions for the levels of variables from a long-run SVAR model
estimated in first differences. An application to the determination of the
equilibrium Chinese real exchange rate will be shown.
Additional information
hanson_nasug07.pdf
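The building blocks in official Stata can be sketched as follows; this is
only an illustrative long-run SVAR workflow on a shipped example dataset,
not the authors' new routines.

    * long-run SVAR estimated in first differences (illustrative)
    webuse lutkepohl2, clear
    matrix C = (., 0 \ ., .)                  // long-run restriction: C[1,2] = 0
    svar dln_inv dln_inc, lags(1/2) lreq(C)
    irf create longrun, set(svarirf, replace)
    irf table sirf                            // structural IRFs; cumulate for level responses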
Meta-analytical integration of diagnostic accuracy studies in Stata
Ben Dwamena
University of Michigan Health System, Ann Arbor
This presentation will demonstrate how to perform diagnostic meta-analysis
using midas, a user-written command.
midas is a comprehensive program of statistical
and graphical routines for undertaking meta-analysis of diagnostic test
performance in Stata. Primary data synthesis is performed within the
bivariate generalized linear mixed modeling framework. Model specification,
estimation, and prediction are carried out with
gllamm
(Rabe-Hesketh et al.; spherical adaptive quadrature). Using the estimated
coefficients and variance–covariance matrices,
midas calculates the summary operating sensitivity
and specificity (with confidence and prediction ellipses) in SROC space.
Summary likelihood and odds ratios with relevant heterogeneity statistics
are provided. midas facilitates extensive
statistical and graphical data synthesis and exploratory analyses of
unobserved heterogeneity, covariate effects, publication bias, and subgroup
analyses. Bayes’ nomograms, likelihood-ratio matrices, and
conditional probability plots may be obtained and used to guide clinical
decision making.
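As context for the underlying machinery (not the midas internals), a
random-intercept logistic model with adaptive quadrature is fit in gllamm
with a call of the following form; the variable names here are hypothetical.

    * random-intercept logit via gllamm, adaptive quadrature (illustrative)
    gllamm pos case, i(study) family(binomial) link(logit) adapt nip(9)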
Agony and ecstasy: Teaching a computationally intensive
introductory statistics course using Stata
Nicholas Jon Horton
Smith College
In the last decade, a sea change has occurred in the organization of
introductory statistics courses. The mantra of “more data, less
lecture” is widely repeated while active learning opportunities
receive increasing focus. At Smith College, a small liberal arts college,
several introductory statistics courses are offered, with various
mathematical prerequisites. Stata is used as the computing environment for
many of these courses. In all courses, students engage in the analysis of
real-world example datasets, often taught in the form of mini-case studies
(using a set of lab materials developed at UCLA). For the more
mathematically savvy students, introductory statistics concepts are
introduced through simulation and other activities. While Stata serves as
an easy-to-use environment for statistical analysis, there are areas where
additional functionality would improve its use as a testbed for statistical
investigation. In this presentation, I will review the use of Stata for
both of these purposes and detail areas of strengths and potential
improvements.
Additional information
hortonnasug2007.pdf
Powerful new tools for time series analysis
Christopher Baum
Boston College, DIW Berlin, and RePEc
Elliott and Jansson developed a powerful test for unit roots, published in
Journal of Econometrics (2003), extending the
Elliott–Rothenberg–Stock test (dfgls)
by adding stationary covariates. I will discuss and demonstrate a Stata
implementation of the test. Elliott and Müller's
Review of Economic Studies paper (2006)
illustrates how tests for parameter constancy and tests for an unknown break
process can be unified to produce a single efficient test for stability of
the regression function. I will discuss and demonstrate a Stata
implementation of the test.
Additional information
StataTS07.beamer.7727.pdf
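For orientation, the baseline Elliott–Rothenberg–Stock test that the
covariate-augmented test extends is already available as dfgls; a minimal
call on a shipped example dataset:

    * baseline DF-GLS unit-root test (the new test adds stationary covariates)
    webuse lutkepohl2, clear
    dfgls dln_inv, maxlag(4)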
Record linkage in Stata
Michael Blasnik
M. Blasnik & Associates
Record linkage involves attempting to match records from two different data
files that do not share a unique and reliable key field. It can be a
tedious and challenging task when working with multiple administrative
databases where one wants to match subjects by using names, addresses, and
other identifiers that may have spelling and formatting variations. Formal
record linkage methods often use a combination of approximate string
comparators and probabilistic matching algorithms to identify the best
matches and assess their reliability. Some standalone software is
available for this task. This presentation will introduce
reclink, a rudimentary probabilistic record
matching program for Stata. reclink uses a
modified bigram string comparator and allows user-specified match and
nonmatch weights. The algorithm also provides for blocking (both
“or” and “and”) to help improve speed for this
otherwise slow procedure.
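An illustrative call pattern, with hypothetical variable and file names
(consult help reclink for the exact options and defaults):

    * fuzzy match on name and address fields (hypothetical names; see -help reclink-)
    use master_file, clear
    reclink name address city using using_file.dta, ///
        idmaster(mid) idusing(uid) gen(matchscore)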
Ado-lists: A new concept for Stata
Ben Jann
ETH Zürich
A new command called
adolist is presented.
adolist is a tool to create, install, and uninstall
lists of user ado-packages (“adolists”). For example,
adolist can create a list of all user packages
installed on a system and then install the same packages on another system.
Moreover,
adolist can be used to put together
thematic lists of packages such as, say, a list on income inequality
analysis or time-series add-ons, or the list of “41 user ados
everyone should know”. Such lists can then be shared with others,
who can easily install and uninstall the listed packages using the
adolist command.
Additional information
jann_nasug07_adolist.pdf
Constructing Krinsky and Robb confidence interval
for mean and median WTP using Stata
P. Wilner Jeanty
Ohio State University
The ultimate goal of most nonmarket valuation studies is to obtain welfare
measures, i.e., mean and/or median willingness to pay (WTP) and confidence
intervals. While the delta method (nlcom) and the bootstrap (bs) can be
used to construct such confidence intervals in Stata,
they are not recommended because WTP measures are nonlinear functions of
random parameters (Creel and Loomis 1991). The best and most widely used
approach, which is not available in Stata, consists of simulating the
confidence intervals by using the Krinsky and Robb procedure (Haab and
McConnell 2002). Hole (2007) has recently introduced a useful command,
wtp, that implements the Krinsky and Robb
procedure in Stata but does not feature mean and median WTP estimates and
their confidence intervals. I present a Stata command,
wtpcikr, that computes mean and median WTP, confidence
intervals using the Krinsky and Robb procedure, achieved significance level
(ASL) for testing the null hypothesis that WTP equals zero, and a relative
efficiency measure (Loomis and Ekstrand 1998). The command supports both
linear and exponential contingent valuation models estimated with or
without covariates using the Stata commands
probit,
logit,
biprobit, and
xtprobit.
I will illustrate the use of
wtpcikr by
replicating empirical results in Haab and McConnell (2002).
Additional information
wtpcikr.zip
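The Krinsky and Robb procedure itself is easy to sketch: draw many
coefficient vectors from the estimated sampling distribution N(b, V),
compute WTP for each draw, and read off percentiles. A hand-rolled version
for a linear model with hypothetical variables yes and bid (not the
wtpcikr code):

    * Krinsky-Robb simulation of mean WTP = -_b[_cons]/_b[bid] (illustrative)
    probit yes bid
    mata:
    b   = st_matrix("e(b)")                         // row vector (bid, _cons)
    V   = st_matrix("e(V)")
    B   = b :+ rnormal(5000, cols(b), 0, 1) * cholesky(V)'  // 5,000 draws from N(b, V)
    wtp = sort(-B[., 2] :/ B[., 1], 1)              // mean WTP for each draw, sorted
    (mean(wtp), wtp[126], wtp[4875])                // point estimate and ~95% interval
    end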
Resampling inference through quasi–Monte Carlo
Stanislav Kolenikov
University of Missouri, Columbia
This presentation will review quasi–Monte Carlo methods (Halton
sequences) and their applications in resampling inference. The two major
applications are bootstrap procedures, where quasi–Monte Carlo methods
allow one to achieve stability close to that of the balanced bootstrap, and
complex survey variance estimation, where quasi–Monte Carlo methods allow
one to create approximately balanced resampling designs, providing a
compromise between fully balanced resampling designs and the regular
bootstrap.
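Mata ships a generator for such sequences; as a quick illustration (not
from the talk), the first few points of a two-dimensional Halton sequence,
which cover the unit square far more evenly than pseudorandom draws, can be
listed with:

    * first 8 points of the 2-dimensional Halton sequence (bases 2 and 3)
    mata: halton(8, 2)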
Causal inference with observational data: Regression
discontinuity and related methods in Stata
Austin Nichols
Urban Institute
This overview of implementing quasiexperimental methods of estimating
causal impacts (panel methods, matching estimators, instrumental variables,
and regression discontinuity) emphasizes practical considerations and
Stata-specific approaches, with examples using real data and comparisons
across methods. Particular attention is paid to the regression
discontinuity method, which seems to be less well known in the larger
community of Stata users but is the best regarded of the quasiexperimental
methods in those circumstances where it is appropriate.
Additional information
causal.pdf
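As a flavor of the regression discontinuity idea (a sketch with
hypothetical variable names, not the presenter's code): keep observations
within a bandwidth of the cutoff and fit separate linear trends on each
side, so the treatment coefficient estimates the jump at the cutoff.

    * sharp RD sketch: outcome y, assignment score centered at the cutoff 0
    scalar h = 5                                    // hypothetical bandwidth
    gen byte treat  = (score >= 0) if !missing(score)
    gen treatXscore = treat*score
    regress y treat score treatXscore if abs(score) < h, robust
    * _b[treat] estimates the discontinuity in y at the cutoff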
Recent developments in multilevel modeling, including models for binary and count responses
Roberto G. Gutierrez
StataCorp
Mixed-effects models contain both fixed and random effects. The fixed
effects are analogous to standard regression coefficients and are estimated
directly. The random effects are not directly estimated but instead are
summarized according to their estimated variances and covariances, known as
variance components. Random effects take the form of either random
intercepts or random coefficients, and the grouping structure of the data
may consist of multiple levels of nested groups. In Stata, one can fit
mixed models with continuous (Gaussian) responses by using xtmixed; in
Stata 10, one can also fit mixed models with binary and count responses by
using xtmelogit and xtmepoisson, respectively. All three
commands have a common multiequation syntax and output, and
postestimation tasks such as the prediction of random effects and
likelihood-ratio comparisons of nested models also take a common
form. This presentation will cover many models that one can fit
using these three commands. Among these are simple random intercept
models, random-coefficient models, growth curve models, and
crossed-effects models.
Additional information
gutierrez_boston07.pdf
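A minimal example of the common syntax, using a dataset shipped with
Stata, fits a growth-curve model with a random intercept and a random
slope on week:

    * random-intercept, random-slope growth model for pig weights
    webuse pig, clear
    xtmixed weight week || id: week, covariance(unstructured)
    * Stata 10 binary-response analog (hypothetical variables):
    * xtmelogit y x || group: , intpoints(7)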
From estimation output to document tables: A long way made short
Ben Jann
ETH Zürich
Postestimation processing and formatting of statistical results for input
into document tables are tasks that most of us have to do. However,
processing results by hand can be tedious and is prone to error. There are
therefore many benefits to automating these tasks while at the same time
retaining user flexibility in terms of output format and accessibility.
This talk is concerned with such automation processes, focusing primarily
on tabulating results from estimation commands. In the first part of the
talk, I briefly review existing approaches and user-written programs and
then provide an extensive tutorial on the
estout
package. Compiling estimation tables for display on screen and for
inclusion in, e.g., LaTeX, Word, or Excel documents is illustrated using a
range of examples, from relatively basic applications to complex ones. In
the second part of the talk, I draw on material from J. Scott Long’s
presentation last year and introduce some new utilities to tabulate results
from Long and Freese’s SPost commands for categorical outcomes
models.
Additional information
jann_nasug07_estout.pdf
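A typical minimal workflow with the package (eststo and esttab ship as
part of estout) looks like this:

    * store two models and write a LaTeX table
    sysuse auto, clear
    eststo m1: regress price weight
    eststo m2: regress price weight foreign
    esttab m1 m2 using mytable.tex, tex se r2 label replace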
Power analysis and sample-size determination in survival
models with the new stpower command
Yulia Marchenko
StataCorp
Power analysis and sample-size determination are important components of a
study design. In survival analysis, the power is directly related to the
number of events observed in the study. The required sample size is
therefore determined by the observed number of events. Survival data are
commonly analyzed using the log-rank test or the Cox proportional hazards
model. Stata 10’s new
stpower
command provides sample-size and power calculations for survival studies
that use the log-rank test, the Cox proportional hazards model, and the
parametric test comparing exponential hazard rates. It reports the number of
events that must be observed in the study and accommodates unequal subject
allocation between groups, nonuniform subject entry, and exponential losses
to follow-up. This talk will demonstrate power, sample-size, and effect-size
computations for different methods used to analyze survival data and for
designs with recruitment periods and random censoring (administrative and
loss to follow-up). It will also discuss building customized tables and
producing graphs of power curves.
Additional information
marchenko_boston07.pdf
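Two minimal calls with illustrative design values (not from the talk):

    * log-rank design: survival probabilities 0.5 vs 0.6, 90% power
    stpower logrank 0.5 0.6, power(0.9) alpha(0.05)
    * Cox-model design: hazard ratio 1.4 for a covariate with standard deviation 0.3
    stpower cox, hratio(1.4) sd(0.3) power(0.8)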
Scientific organizers
Kit Baum, Boston College
[email protected]
Marcello Pagano, Harvard School of Public Health
[email protected]
Logistics organizers
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp