Last updated: 7 August 2015
2015 Stata Conference Columbus
30–31 July 2015
Hyatt Regency Columbus
350 North High Street
Columbus, Ohio
(614) 463-1234
Proceedings
midasinla: midas goes Bayesian via R-INLA
Ben Adarkwa Dwamena
University of Michigan Medical School
Integrated nested Laplace approximation (INLA) has been
developed as a computationally fast, deterministic alternative to Markov
chain Monte Carlo (MCMC)-based Bayesian modeling. An R interface to the
C-based INLA (R-INLA) program is available with extensive and diverse
applications, including diagnostic test accuracy meta-analysis. In this
presentation, I discuss the INLA methodology briefly and, in more
detail, an illustrated application of the user-written ado-file
midasinla, a deterministic Bayesian version of
midas
(a comprehensive and medically popular module for diagnostic test accuracy
meta-analysis). This Stata routine provides R-INLA estimation of the
bivariate random-effects model for diagnostic accuracy meta-analysis
with data pre- and post-processing within Stata. A dataset of studies
evaluating axillary staging performance of positron emission tomography
in breast cancer patients is provided for illustration of the omnibus
capabilities of
midasinla.
Additional information
columbus15_dwamena.pdf
Estimating treatment effects for ordered outcomes using maximum simulated likelihood
Christian Gregory
Economic Research Service, USDA
In this presentation, I introduce four new modules:
treatoprobit,
switchoprobit,
treatoprobitsim, and
switchoprobitsim. Each of these routines estimates a model in which a
binary endogenous variable affects an ordered outcome.
treatoprobit
and
switchoprobit estimate treatment and outcome under the assumption
that the error terms in the selection and outcome process are
distributed as bivariate normal.
treatoprobitsim and
switchoprobitsim allow researchers to relax this assumption by
estimating models in which a latent factor with a potentially nonnormal
distribution accounts for the correlation between treatment and outcome.
treatoprobit and
treatoprobitsim operate under the assumption of a
single outcome regime for treated and untreated groups;
switchoprobit
and
switchoprobitsim work under (and test) the assumption that outcome
processes for treated and untreated ought to be handled as distinct. The
presentation will introduce the modules, show Monte Carlo evidence
regarding their performance, and offer an example of their use. This
presentation is based on an article that is currently under review at
the
Stata Journal.
Additional information
columbus15_gregory.pdf
Linear dynamic panel-data estimation using maximum likelihood and structural equation modeling
Richard Williams
Department of Sociology, University of Notre Dame
Paul Allison
Department of Sociology, University of Pennsylvania
Enrique Moral Benito
Banco de España, Madrid
Panel data make it possible both to control for unobserved
confounders and to include lagged, endogenous regressors. Trying to do
both at the same time, however, leads to serious estimation
difficulties. In the econometric literature, these problems have been
solved by using lagged instrumental variables together with the
generalized method of moments (GMM). In Stata, commands such as
xtabond
and
xtdpdsys have been used for these models. Here we show that the same
problems can be addressed via maximum likelihood estimation implemented
with Stata's structural equation modeling (sem) command. We show that
the ML (sem) method is substantially more efficient than the GMM method
when the normality assumption is met and suffers less from finite sample
biases. We introduce a command named
xtdpdml with syntax similar to
other Stata commands for linear dynamic panel-data estimation.
xtdpdml
simplifies the SEM model-specification process, makes it possible to
test and relax many of the constraints that are typically embodied in
dynamic panel models, and takes advantage of Stata's ability to use full
information maximum likelihood (FIML) for dealing with missing data.
Additional information
columbus15_rwilliams.pdf
15 years a consultant
Phil Ender
UCLA Statistical Consulting Group (Ret)
I present the origins and evolution of the UCLA Statistical
Consulting Group. The presentation will cover the history of the UCLA
Statistical Consulting Group as well as one approach to the practice of
statistical consulting in an academic environment. UCLA Statistical
Consulting provides services to faculty, graduate students, and campus
researchers. Additionally, the group maintains a website popular not
only with Stata users but also with users of other statistical packages.
Additional information
columbus15_ender.pdf
Robust inference in regression-discontinuity designs
Matias Cattaneo
University of Michigan
Sebastian Calonico
University of Miami
Rocio Titiunik
University of Michigan
In this presentation, I will review the main methodological
results from the regression-discontinuity (RD) design literature and
illustrate them using the Stata
rdrobust package provided by the
authors. More information about the Stata package and background
methodological and theoretical papers may be obtained here:
https://sites.google.com/site/rdpackages/rdrobust.
If time permits, I will also discuss two ongoing research projects on
RD methods and their corresponding Stata implementations. The first
project focuses on RD inference under a local randomization assumption,
while the second project discusses a new manipulation test for RD
designs.
Additional information
columbus15_cattaneo.pdf
Estimation in panel data with individual effects and AR(p) remainder disturbances
Long Liu
Department of Economics, The University of Texas at San Antonio
In this presentation, I introduce a new user-written Stata
command,
xtregarp. This command considers the problem of estimation in a
panel-data model with both individual effects and AR(p) remainder
disturbances. It utilizes a simple exact transformation for the AR(p)
time-series process derived by Baltagi and Li (1994) and obtains the
generalized least-squares estimator for this panel model as a
least-squares regression. This command allows the individual effects to
be either random effects or fixed effects. The performance of this
estimator is illustrated using an empirical example.
Additional information
columbus15_liu.pdf
Item response theory models in Stata
Rebecca Pope
Health Econometrician, StataCorp
Stata 14 provides several new commands for fitting item
response theory (IRT) models. IRT has a long history in test development
and psychometrics and is now being adopted more broadly in fields such
as health services research. In this presentation, I will provide an
overview of IRT, demonstrate fitting models with binary and categorical items,
and discuss postestimation tools such as plotting characteristic curves and
information functions.
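
As a rough illustration of the characteristic curves and information functions mentioned above, the following minimal sketch evaluates both for a single binary item. It assumes a two-parameter logistic (2PL) model, chosen here only as an example; the function names and the parameter values a (discrimination) and b (difficulty) are hypothetical and are not taken from the presentation.

```python
from math import exp


def icc_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Item information for the 2PL model: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item: discrimination 1.7, difficulty 0.5.
for theta in (-2.0, 0.5, 3.0):
    print(theta, icc_2pl(theta, 1.7, 0.5), info_2pl(theta, 1.7, 0.5))
```

Under this model, the information function peaks where the ability equals the item difficulty, which is why the two curves are usually read together.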
Additional information
columbus15_pope.pdf
Meta-analysis on the effects of interviewer supportiveness on the accuracy of children's reports
Christine Wells
Statistical Consulting Group, UCLA
Karen Saywitz, PhD
UCLA
Rakel Larson, MA
University of California, Riverside
Sue Hobbs, PhD
University of California, Davis
Increasingly, children are called upon to participate in
decisions that affect their welfare, from providing testimony in court
to providing input to public policies. However, many questions remain
regarding how to elicit accurate, reliable information from children. A
meta-analysis was conducted to investigate the effect of a supportive
interviewer on the accuracy of information provided by children (ages 4
to 12). The interviewers asked both neutral and misleading questions in
both supportive and nonsupportive conditions. Our results suggest that
interviewer supportiveness, when provided in a nonsuggestive manner,
bolsters the reliability of children's reports, and that supportiveness
lowers children's errors on misleading questions. Despite the importance
of this topic, only eight randomized control studies were identified to
be included in the meta-analysis. These studies hail from the psychology
literature and were published over 18 years. These two facts introduced
some interesting challenges in preparing the data for the meta-analysis.
The analyses included the meta-analysis, investigation into possible
nonindependence, a search for outliers, and cumulative meta-analyses.
The current guidelines for publishing a meta-analysis in the
psychological literature, specifically the MARS guidelines, will be
discussed as well as the user-written commands and their options used
to perform these analyses.
Additional information
columbus15_wells.pdf
tetrad: A program for confirmatory tetrad analysis
Shawn Bauldry
University of Alabama at Birmingham
Kenneth Bollen
University of North Carolina at Chapel Hill
Confirmatory tetrad analysis (CTA) is a method of testing and
comparing the fit of structural equation models (SEMs) based on tetrads
(differences in the products of pairs of covariances of observed
variables). CTA has a few benefits over alternative methods of testing
SEM model fit, including (1) some underidentified SEMs are still
testable using their vanishing tetrads, (2) some SEMs are nested in
their vanishing tetrads and can be directly compared while they are not
nested using alternative estimators, and (3) researchers can perform
tests on parts of the model as well as the whole model. We have
developed a Stata command that conducts CTA based on the approach
outlined in Bollen (1990) and Bollen and Ting (1993). The approach
involves 4 steps: (1) identify vanishing tetrads (tetrads that equal 0)
for a given model, (2) compute the asymptotic covariance matrix for the
vanishing tetrads, (3) identify nonredundant vanishing tetrads, and (4)
compute the tetrad test statistic. The Stata command takes as input the
set of observed variables and an implied covariance matrix from a
hypothesized model (or two implied covariance matrices if a nested test)
that can be obtained following the
sem command and then returns the
tetrad test statistic.
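
The first step above (identifying vanishing tetrads) can be sketched numerically. This is a minimal illustration in Python, not the Stata command itself: it computes the three tetrads for each set of four variables from a model-implied covariance matrix and keeps those that are zero. The one-factor model, its loadings, and its unique variances are made-up values, and the factor variance is assumed to be 1. Steps 2 to 4 (the asymptotic covariance matrix, redundancy checks, and the test statistic) are not covered.

```python
from itertools import combinations


def tetrad(cov, g, h, i, j):
    """tau_ghij = s_gh * s_ij - s_gi * s_hj."""
    return cov[g][h] * cov[i][j] - cov[g][i] * cov[h][j]


def all_tetrads(cov):
    """Map each index tuple (g, h, i, j) to its tetrad, three per foursome."""
    out = {}
    for a, b, c, d in combinations(range(len(cov)), 4):
        for idx in ((a, b, c, d), (a, c, d, b), (a, d, b, c)):
            out[idx] = tetrad(cov, *idx)
    return out


# Hypothetical one-factor model with four indicators (factor variance 1).
loadings = [1.0, 0.8, 0.6, 0.7]
unique = [0.5, 0.4, 0.6, 0.3]
implied = [[li * lj + (unique[r] if r == c else 0.0)
            for c, lj in enumerate(loadings)]
           for r, li in enumerate(loadings)]

# Tetrads that are zero up to floating-point rounding.
vanishing = {k: v for k, v in all_tetrads(implied).items() if abs(v) < 1e-12}
print(sorted(vanishing))
```

For a single factor with four indicators, all three tetrads vanish, because every off-diagonal covariance is a product of two loadings.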
Additional information
columbus15_bauldry.pdf
Postestimation parameter recentering and rescaling
Douglas Hemken
Social Science Computing Cooperative, University of Wisconsin–Madison
Recoding data prior to model estimation is a frequent part of
analysis. For linear models, this can be thought of as a change of basis
that is common to the data and the model. Where the change of basis in
the data is linear, the change in the model is also linear. We can
calculate the transformed parameters (and the transformed parameter
variance–covariance matrix) without actually recoding our data. The same
mathematics that is used to design factorial experiments or design
contrasts that include interactions can be extended to include
recentering and rescaling continuous variables in models with
interaction terms. This gives us a general solution to such problems as
calculating standardized coefficients, or converting models expressed in
American units of measure to international units, regardless of whether
the models include interaction terms or whether we have access to the
original data. This is implemented here as a Stata program,
stdParm,
that produces centered or standardized parameters and precision
matrices, postestimation.
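
The change of basis described above can be sketched for a model with two continuous predictors and their interaction. This is a minimal illustration in Python, not the stdParm program: the function names, coefficients, variance matrix, centers, and scales are all hypothetical. It assumes the model y = b0 + b1*x + b2*w + b3*x*w and the recoding x = mx + sx*zx, w = mw + sw*zw, so that the coefficients on the recoded variables are c = T b with variance T V T'.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def recenter_rescale(b, V, mx, sx, mw, sw):
    """Coefficients and variance matrix for zx = (x - mx)/sx, zw = (w - mw)/sw.

    Parameter order is (constant, x, w, x*w). T follows from expanding
    the model after substituting x = mx + sx*zx and w = mw + sw*zw.
    """
    T = [[1.0, mx, mw, mx * mw],
         [0.0, sx, 0.0, sx * mw],
         [0.0, 0.0, sw, mx * sw],
         [0.0, 0.0, 0.0, sx * sw]]
    c = [row[0] for row in matmul(T, [[v] for v in b])]
    Vc = matmul(matmul(T, V), transpose(T))
    return c, Vc


# Hypothetical estimates and a hypothetical diagonal variance matrix.
b = [2.0, 0.5, -1.0, 0.25]
V = [[0.04, 0.0, 0.0, 0.0],
     [0.0, 0.01, 0.0, 0.0],
     [0.0, 0.0, 0.02, 0.0],
     [0.0, 0.0, 0.0, 0.005]]
c, Vc = recenter_rescale(b, V, mx=10.0, sx=2.0, mw=3.0, sw=4.0)
print(c)
```

Only the estimates and their variance matrix are needed, so the original data never have to be recoded or even available. The matrix T is the Kronecker product of the two single-variable transformations, which is the link to factorial design mentioned above.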
Additional information
columbus15_hemken.pdf
Statistical process control charts
Barbara Williams
Virginia Mason Medical Center
Statistical process control (SPC) charts are used to assess
outcomes measured over time, usually with the purpose of detecting
improvement or maintaining a high level of performance. Traditionally
used in industrial engineering for quality control, these methods are
now frequently employed in healthcare and are the standard method of
analysis for quality improvement work. In this presentation, I define
methods to improve on current Stata syntax to generate useful and
reader-friendly SPC charts. I build on existing Stata
cchart (count),
pchart (proportion),
rchart (range), and
xchart (average)
commands to
produce SPC charts with a clear, easy-to-read visual display. This
presentation will explore default and edited
pchart and
xchart
examples
using health services research data, including the syntax for creating
these graphs. Graphic elements include customized axis labels, text,
colors, lines, notes, fonts, and titles. Under this approach, Stata can
replace current SPC chart generators, including macros for Excel and
stand-alone programs.
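
For readers unfamiliar with how a proportion chart is constructed, the following minimal sketch computes the center line and conventional three-sigma control limits in Python. It is not the Stata pchart command and makes no claim about that command's exact output; the function name and the monthly counts are hypothetical.

```python
from math import sqrt


def pchart_limits(events, sizes, sigmas=3.0):
    """Center line and per-subgroup limits for a proportion (p) chart.

    Limits are p_bar +/- sigmas * sqrt(p_bar * (1 - p_bar) / n),
    clipped to the interval [0, 1].
    """
    p_bar = sum(events) / sum(sizes)
    rows = []
    for x, n in zip(events, sizes):
        half = sigmas * sqrt(p_bar * (1.0 - p_bar) / n)
        lcl = max(0.0, p_bar - half)
        ucl = min(1.0, p_bar + half)
        p = x / n
        rows.append((p, lcl, ucl, p < lcl or p > ucl))
    return p_bar, rows


# Hypothetical monthly counts of adverse events out of 100 cases each.
events = [4, 6, 5, 18, 3]
sizes = [100, 100, 100, 100, 100]
p_bar, rows = pchart_limits(events, sizes)
for p, lcl, ucl, out in rows:
    print(p, lcl, ucl, out)
```

Points outside the limits are flagged as signals of special-cause variation; in this made-up series, only the fourth month is flagged.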
Additional information
columbus15_bwilliams.pptx
Data workflows with Stata and Python
Stephen Childs
Education Policy Research Initiative, University of Ottawa
Dejan Pavlic
Education Policy Research Initiative, University of Ottawa
Python is a general purpose programming language with a large
library of packages that extend into domains that Stata does not touch.
In this presentation, I will identify the key packages from Python that
will allow it to work with Stata, primarily the pandas framework. Pandas
is a relatively new, but extremely powerful, package for data
preparation and analysis that works well with Stata, including support
for categorical variables. I will discuss some new tools that have been
developed to make it easier to connect Stata to Python. I will also
discuss using Stata with the IPython Notebook, a tool that allows
researchers to combine code and text in an easy-to-access document.
During their work with the Education Policy Research Initiative, the
authors have successfully transitioned much complex data preparation
from Stata to Python while still supporting Stata's powerful analytical
tools. This presentation is ideal for those interested in incorporating
some Python into their workflow or planning a larger transition.
Additional information
columbus15_childs.pdf
Distribution-free estimation of heteroskedastic binary response models in Stata
Jason Blevins
Department of Economics, The Ohio State University
Shakeeb Khan
Duke University
In this presentation, I demonstrate how to implement two
recent semiparametric estimators for binary response models in Stata.
These estimators do not require parametric assumptions on the
distribution of the error term, unlike the logit and probit models, and
they allow for general forms of heteroskedasticity. I begin with a short
introduction to binary response models and the various known identifying
assumptions, including the weak conditional median independence
assumption that the two estimators of interest are based on. Then, I
focus on two recently proposed semiparametric estimators: a sieve
nonlinear least-squares estimator and a local nonlinear least-squares
estimator. I demonstrate how both estimators can be easily implemented
in Stata via simple modifications to the standard probit objective
function, and I give several applied examples and Monte Carlo results.
Finally, I introduce the
dfbr package by Blevins and Khan (2013,
Stata Journal, st0310) for distribution-free estimation of binary response
models. Although the estimators can be implemented by hand using
standard Stata commands, this package provides a standard Stata
interface for the user, automates constructing the modified probit
objective functions, and calculates bootstrap standard errors.
Additional information
columbus15_blevins.pdf
A comparison of modeling scales in flexible parametric models
Noori Akhtar-Danesh
McMaster University
Cox regression and parametric survival models are quite
common in the analysis of survival data. Recently, flexible parametric
models (FPM) have been introduced that are extensions of the parametric
models such as the Weibull (hazard-scale) model, the loglogistic
(odds-scale) model, and the lognormal (probit-scale) model. In this
presentation, I aim to statistically compare these modeling scales. I
used the Stata command
stpm2 to compare flexible parametric models based on
these three different scales. I used two subsets of the U.S. National
Cancer Institute's Surveillance, Epidemiology, and End Results (SEER)
dataset for this illustration: one on ovarian cancer diagnosed between
1991 and 2010 and one on colorectal cancer diagnosed in men between 2001
and 2010. The ovarian and colorectal datasets included data from 13,810
and 42,002 patients, respectively. Patients were classified into
different age groups. I present results using graphs to compare
survival curves, trends in one-year and five-year survival rates, and
mortality rates. In general, there were no substantial differences
between the three modeling scales, although the probit scale showed
better fit based on the Akaike information criterion (AIC) for both
datasets.
Additional information
columbus15_akhtar_danesh.pdf
Estimating Markov-switching regression models in Stata
Ashish Rajbhandari
Senior Econometrician, StataCorp
Many datasets are not well characterized by linear autoregressive
moving-average (ARMA) models. In this presentation, I will describe the
new
mswitch command, which implements Markov-switching regression models,
which characterize many of these datasets well. Markov-switching
regression models allow the time series to switch between unobserved
states according to a Markov process.
mswitch can estimate the
parameters of the Markov-switching dynamic regression (MSDR) model and
Markov-switching autoregressive (MSAR) model. This talk outlines the
models, discusses the relative advantages of MSDR and MSAR models, and
discusses examples of how to interpret
mswitch output and its
postestimation features.
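
To make the idea of switching between unobserved states concrete, the following minimal sketch simulates a two-state series whose mean depends on the state. It is a simulation in Python, not an estimator and not the mswitch command; the function name, state means, error standard deviation, and transition probabilities are hypothetical.

```python
import random


def simulate_msdr(n, mu, sigma, P, seed=0):
    """Simulate y_t = mu[s_t] + e_t with s_t following a two-state Markov chain.

    P[i][j] is the probability of moving from state i to state j.
    The chain is started in state 0.
    """
    rng = random.Random(seed)
    state = 0
    states, y = [], []
    for _ in range(n):
        states.append(state)
        y.append(mu[state] + rng.gauss(0.0, sigma))
        # Draw the next state from row `state` of the transition matrix.
        state = 0 if rng.random() < P[state][0] else 1
    return states, y


# Hypothetical parameters: a low-mean state and a high-mean state.
states, y = simulate_msdr(2000, mu=[0.0, 3.0], sigma=1.0,
                          P=[[0.95, 0.05], [0.10, 0.90]], seed=1)
print(sum(states) / len(states))  # share of periods spent in state 1
```

In applied work the states are not observed, so the state means and transition probabilities must be estimated from y alone.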
Additional information
columbus15_rajbhandari.pdf
Between and beyond: Irregular series, interpolation, variograms, and smoothing
Nicholas Cox
Department of Geography, Durham University
Time series (and similar one-dimensional series) are more
often irregularly spaced than many methods texts or courses admit. Even
with a plan of regular measurements, gaps can arise for many human or
inhuman reasons, while some series are naturally irregular.
Interpolation of values between known values is a centuries-old need
but one neglected by official Stata, which offers only linear
interpolation and cubic spline interpolation (in Mata). I review
additional user-written commands for interpolation, including those for
cubic, nearest neighbor, and piecewise cubic Hermite methods available
from SSC. Beyond interpolation of irregular series lie the questions of
characterizing the structure of such series and smoothing in various
ways. One useful tool standard in spatial statistics is the variogram,
which relates dissimilarity as squared differences between values to
their separation in time or distance in space. Diggle and others have
shown uses for variograms in time-series and longitudinal data analysis.
I discuss user-written Stata commands for variogram calculation,
plotting, and use in relation to exploratory data analysis on the one
hand and smoothing on the other.
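
The variogram idea can be sketched for an irregularly spaced series as follows. This is a minimal Python illustration, not one of the user-written Stata commands discussed; the function name, the binning rule, and the data are hypothetical. It uses the common semivariogram convention of halving the mean squared difference within each lag bin.

```python
from collections import defaultdict


def empirical_variogram(times, values, bin_width):
    """Half the mean squared difference between values, by binned time lag.

    Returns a dict mapping each bin midpoint to its semivariance.
    Bins are [0, w), [w, 2w), ... for bin width w.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            lag = abs(times[j] - times[i])
            k = int(lag // bin_width)
            sums[k] += (values[j] - values[i]) ** 2
            counts[k] += 1
    return {(k + 0.5) * bin_width: sums[k] / (2 * counts[k])
            for k in sorted(counts)}


# Hypothetical irregularly spaced series.
times = [0.0, 1.0, 2.5, 4.0, 7.0]
values = [1.0, 2.0, 2.0, 4.0, 3.0]
gamma = empirical_variogram(times, values, bin_width=2.0)
for lag, g in gamma.items():
    print(lag, g)
```

Because every pair of observations contributes according to its actual separation, no regular spacing or prior interpolation is required.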
Additional information
columbus15_cox.ppt
Public program sensitivity: Using ROC curves to characterize classification efficiency of state Medicaid systems
Lisa Frazier
John Glenn College of Public Affairs, The Ohio State University
Despite being the largest single source of health care
coverage in the U.S., Medicaid fails to capture all eligible citizens.
This is a well-known problem among means-tested programs like Medicaid;
discussions of take-up and churning attend to this failure. Cases of
fraud in programmatic enrollments represent another classification
failure in these systems. Reports on rates of fraud, take-up, and churn
rarely acknowledge that such outcomes are ultimately features of the
same tradeoff function: the sorting of citizens into benefit groups on
the basis of membership in some a priori category. This research
elucidates the implicit tradeoffs being made in the Medicaid
citizen-sorting mechanism by using administrative data to construct ROC
curves for each state Medicaid system before and after the passage of
the Affordable Care Act.
Additional information
columbus15_frazier.pptx
Small-sample inference for linear mixed-effects models
Xiao Yang
Senior Statistician and Software Developer, StataCorp
Researchers are often interested in making inferences about
fixed effects in a linear mixed-effects model. For a large sample, the
null sampling distributions of the test statistics can be approximated
by a normal distribution for a one-hypothesis test and a chi-squared
distribution for a multiple-hypotheses test. For a small sample, these
large-sample approximations may not be appropriate, and t and F
distributions may provide better approximations. In this presentation,
I will describe five denominator-degrees-of-freedom (DDF) methods available
with
mixed in Stata 14, including the Satterthwaite and Kenward–Roger
methods, and I will demonstrate examples of when and how to use these methods.
Additional information
columbus15_yang.pdf
Development of a project-based statistics course for applied biostatistics using Stata
Frank Snyder
Purdue University
Project-based learning is an instructional approach that is
designed to build students' skills and offer real-world activities, such
as defining a research question and using nationally representative data
to find an answer (Dierker et al. 2012). The purpose of this
presentation is to describe an innovative, project-based statistics
course for applied biostatistics using Stata. The semester-long course
is designed as a graduate-level introductory biostatistics course;
however, it could easily be adapted for use in an undergraduate public
health program. The course combines two textbooks (Acock 2014; Bush
2012) and traditional lecture and assessment with computer lab
activities and a research project. The project-based course structure
offers students the opportunity to directly apply course content to
their unique research question, with the intent to increase students'
motivation and interest in statistics. Each student's culminating
experience is a 15-minute presentation or poster that explains his or her
research and results to classmates or an alternative audience. Course
evaluation data demonstrate that students rate the course as excellent,
and students strongly agree the course encourages learning. A course
syllabus, lab activities, Stata do-files, and a description of the
research project and final presentation will be available upon request.
Additional information
columbus15_snyder.pptx
Brewing color schemes in Stata: Making it easier for end users to customize Stata graphs
William Buchanan
Mississippi Department of Education
Although Stata graphs can be created to satisfy customized needs, it can
be time consuming to specify all the unique options required to create
clean customized graphs. Graph schemes provide a method to help
alleviate this difficulty, but customizations to graph schemes are typically
fixed for a single scheme. In this presentation, I will be discussing a
new Stata program,
brewscheme, that allows end users to generate
customized graph schemes using color palettes available from
www.colorbrewer2.org. The program allows users to specify a single color
palette for all graph types, unique color palettes for individual graph
types, or a combination (for example, a color palette and the number of
colors to select from it for scatterplots, plus a default
color palette for the other graph types). Additionally, the schemes
generated by the program set clean graph defaults (for example, all
white backgrounds and foregrounds and no grid lines), orient axis
labels horizontally, and remove boxes around legends. The program
brewmeta also allows users to quickly access metadata about
specific palettes (for example, colorblindness, LCD display, print, and
photocopier friendliness).
Additional information
columbus15_buchanan.pdf
Colombian industrial structure behavior and its regions between 1974 and 2005
Luis Fernando Lopez Pineda
Chamber of Commerce of Cartagena
This presentation analyzes the behavior of the industrial structure
of Colombia and its regions between 1974 and 2005 to determine whether the
liberal reform at the end of the 20th century caused industrial
stagnation and a lack of diversification. The evidence shows that the
"slowdown" of industrial growth and the stagnation of
productive transformation were caused by the greater competition that
national industry faced after the adoption of an opening model. The process
was not similar in all regions covered in the study. The more industrial
regions, specifically, Antioquia, Atlantico, Valle, and Bogota, suffered
from deindustrialization. The less industrial regions, like Bolivar
and Cundinamarca, became industrial regions.
Additional information
columbus15_lopez_pineda.pdf
Scientific organizers
Timothy R. Sahr (coordinator), Ohio Colleges of Medicine Government Resource Center Applied Research
Stanley Lemeshow (chair of review team), Ohio State University Biostatistics
Marcus Berzofsky, RTI International, Survey Research
Christopher Browning, Ohio State University Sociology
Anand Desai, Ohio State University Public Policy
Christopher Holloman, Ohio State University Statistics
Bo Lu, Ohio State University Biostatistics
Eric Seiber, Ohio State University Health Economics
Logistics organizers
Nathan Bishop, StataCorp
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp