The 26th UK Stata conference was held virtually on 10–11 September 2020.
Proceedings
Session chair: Tim Morris |
11:00–11:30 | From datasets to metadatasets in Stata
Abstract:
Metadatasets are Stata datasets in files or in frames that may have
one observation per file, per dataset, per variable, or per variable
value. Metadatasets can be used to modify a Stata database or to make a
Stata database self-documenting, especially if converted to non-Stata
formats, such as HTML or even Microsoft Excel. I present some
community-contributed packages, updated to Stata 16, for creating and
using metadatasets. The xdir package creates a resultsset with one
observation per file in a folder conforming to a user-specified pattern.
The descgen package inputs an xdir resultsset and generates a new variable
indicating whether each file is a Stata dataset, and other new variables
containing dataset attributes, such as the dataset label and
characteristics, the sort key of variables, and the numbers of
observations and variables. The vallabdef package inputs a dataset with
one observation per label name per value per value label and generates
Stata value labels. The vallabsave package loads and saves value labels
from and to label-only datasets and transfers value labels between data
frames. The descsave package creates a metadataset with one observation
per variable in a dataset and data on variable attributes (including
characteristics). The invdesc package modifies the variable attributes
of the dataset in the current frame, inputting a descsave resultsset in
a second data frame to set the variable attributes and inputting value
labels from a dataset in a third data frame. The datasets containing the
variable attributes and value labels may be produced as resultssets by
Stata packages or produced manually in a spreadsheet using LibreOffice
Calc or Microsoft Excel and input into Stata datasets using
import delimited or import excel.
Additional information: Roger Newson
Imperial College London
|
11:30–12:00 | Second generation p-values (SGPV) for common estimation commands in Stata
Abstract:
This presentation introduces commands to calculate second generation
p-values (SGPV) for common estimation commands in Stata. The sgpv
command and its companions allow the easy calculation of SGPVs and the
associated diagnostics as well as the plotting of SGPVs against the
standard p-values. SGPVs were introduced by Blume et al. (2018, 2019) as
an alternative to, and an upgrade of, standard p-values.
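
As a rough illustration of the quantity involved, the sketch below computes a second-generation p-value from an interval estimate and an interval null hypothesis, using the definition in Blume et al. (2018): the fraction of the interval estimate that overlaps the interval null, multiplied by a correction for very wide intervals. It is written in Python rather than Stata, the function name `sgpv_from_intervals` is hypothetical, and it does not reproduce the syntax or output of the sgpv command. It assumes both intervals have positive, finite length.

```python
def sgpv_from_intervals(est_lo, est_hi, null_lo, null_hi):
    """Second-generation p-value for an interval estimate [est_lo, est_hi]
    and an interval null [null_lo, null_hi] (both assumed to have
    positive, finite length)."""
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    if est_len <= 0 or null_len <= 0:
        raise ValueError("both intervals must have positive length")

    # Length of the overlap between the two intervals (0 if disjoint).
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))

    # Small-sample correction: when the estimate is more than twice as
    # wide as the null, the value shrinks toward 1/2 (inconclusive).
    correction = max(est_len / (2.0 * null_len), 1.0)
    return (overlap / est_len) * correction


# Hypothetical confidence intervals compared with the null interval [-0.1, 0.1].
clearly_non_null = sgpv_from_intervals(0.5, 1.5, -0.1, 0.1)    # no overlap
clearly_null = sgpv_from_intervals(-0.05, 0.05, -0.1, 0.1)     # inside the null
inconclusive = sgpv_from_intervals(-1.0, 1.0, -0.1, 0.1)       # very wide estimate
```

A value of 0 means the interval estimate lies entirely outside the interval null, a value of 1 means it lies entirely inside, and values near 1/2 indicate inconclusive data.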
References:
Additional information: Sven-Kristjan Bormann
University of Tartu
|
12:00–12:30 | xthst: Testing for slope homogeneity in Stata
Abstract:
This presentation introduces a new community-contributed Stata command,
xthst, to test for slope homogeneity in panels with many
observations over cross-sectional units and time periods. The
command implements such a test, the delta test derived by Pesaran and
Yamagata (2008). Under the null, slope coefficients are homogeneous
across cross-sectional units. xthst also includes two extensions.
The first is a heteroskedasticity- and autocorrelation-robust test along the
lines of Blomquist and Westerlund (2013). The second extension is a
cross-sectional-dependence robust version. The presentation will cover
the econometric theory of the tests, explain xthst and its
options, and give empirical examples. Monte Carlo evidence will be
presented to show that the tests behave as expected.
References:
Blomquist, J., and J. Westerlund. 2013. Testing slope homogeneity in large panels with serial correlation. Economics Letters 121: 374–378.
Pesaran, M. H., and T. Yamagata. 2008. Testing slope homogeneity in large panels. Journal of Econometrics 142: 50–93.
Contributor:
Tore Bersvendsen
Kristiansand Kommune
Additional information: Jan Ditzen
Heriot-Watt University
|
1:00–1:30 | Unit-root tests for explosive behavior
Abstract:
We present the new Stata command radf to compute several tests
for explosive behavior in time series. The command implements the
right-tail augmented Dickey and Fuller (1979) (ADF) unit-root test and
its further developments based on supremum statistics derived from
ADF-type regressions estimated using rolling windows, recursive windows
(Phillips, Wu, and Yu 2011), and recursive flexible windows (Phillips,
Shi, and Yu 2015). The command allows for the number of lags of the
dependent variable in the test regression to be either specified by the
user or endogenously determined using a data-dependent procedure. The
use of the command is illustrated with an empirical example.
Contributor:
Christopher F. Baum
Boston College
Additional information: Jesús Otero
Universidad del Rosario
|
Session chair: Nick Cox |
1:30–2:15 | A gmm recipe to get standard errors for control function and other two-step estimators
Abstract:
It is common to use residuals from the first step of estimation as
regressors in the second step. We are interested in the coefficients and
effects of the second step. Control function methods are one example of
this type of estimator. Getting standard errors in these
cases is challenging, and thus bootstrap methods are commonly used. I will
illustrate how to use Stata's gmm command to obtain correct
standard errors, using cross-sectional and panel-data examples. The GMM
estimates give correct coverage and reduce computation time relative to
commonly used bootstrap methods.
Additional information: Enrique Pinzón
StataCorp
|
Session chair: Rachael Hughes |
2:30–3:00 | randregret: A command for fitting random regret minimization models
Abstract:
In this presentation, we describe the randregret command, which
implements a variety of random regret minimization (RRM) models. The
command allows the user to apply the classic RRM model (Chorus 2010),
the generalized RRM model (Chorus 2014), and also the mu-RRM and pure
RRM models (Van Cranenburgh, Guevara, and Chorus 2015). We illustrate
the usage of the randregret command using stated choice data on
route preferences. The command offers robust and cluster standard-error
correction using analytical expressions of the score functions. It also
offers likelihood ratio tests, which can be used to assess the relevance
of a given model specification. Finally, predicted probabilities from
each model can be easily computed using the randregretpred
postestimation command.
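
To make the classic RRM model concrete, here is a minimal Python sketch of its choice probabilities as formulated by Chorus (2010): the regret of an alternative sums, over every competing alternative and every attribute, the term ln(1 + exp(beta_m * (x_jm - x_im))), and choice probabilities are a logit over negative regret. This is not the randregret implementation. The function name, the route attributes, and the coefficient values are all made up for illustration.

```python
import math


def classic_rrm_probabilities(alternatives, betas):
    """Choice probabilities under the classic RRM model.

    alternatives: list of attribute tuples, one per alternative.
    betas: one taste coefficient per attribute.
    """
    regrets = []
    for i, x_i in enumerate(alternatives):
        regret = 0.0
        for j, x_j in enumerate(alternatives):
            if j == i:
                continue  # an alternative is not compared with itself
            for beta, a_i, a_j in zip(betas, x_i, x_j):
                # Attribute-level regret of choosing i when j was available.
                regret += math.log1p(math.exp(beta * (a_j - a_i)))
        regrets.append(regret)

    # Logit over negative regret; subtract the minimum for numerical stability.
    smallest = min(regrets)
    weights = [math.exp(-(r - smallest)) for r in regrets]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical routes described by (travel time in minutes, cost).
routes = [(20, 3), (30, 3), (30, 5)]
betas = (-0.1, -0.5)  # assumed values: longer and costlier routes are disliked
probabilities = classic_rrm_probabilities(routes, betas)
```

In this made-up example the first route is at least as good as the others on both attributes, so it accumulates the least regret and receives the highest choice probability.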
References:
Chorus, C. G. 2010. A new model of random regret minimization. European Journal of Transport and Infrastructure Research 10(2).
Chorus, C. G. 2014. A generalized random regret minimization model. Transportation Research Part B: Methodological 68: 224–238.
Van Cranenburgh, S., C. A. Guevara, and C. G. Chorus. 2015. New insights on random regret minimization models. Transportation Research Part A: Policy and Practice 74: 91–109.
Contributors:
Michel Meulders
Martina Vandebroek
KU Leuven
Additional information: Álvaro A. Gutiérrez Vargas
KU Leuven
|
3:00–3:30 | Agent-based models in Mata: Modeling aggregate processes, such as the spread of a disease
Abstract:
An agent-based model (ABM) is a simulation in which agents that each
follow simple rules interact with one another and thus produce an often
surprising outcome at the macro level. The purpose of an ABM is to
explore mechanisms through which actions of the individual agents add up
to a macro outcome by varying the rules that agents have
to follow or varying with whom the agent can interact (for example,
varying the network).
These models have many applications, such as the study of segregation of
neighborhoods or the adoption of new technologies. However, the
application that is currently most topical is the spread of a disease.
In this presentation, I will introduce how to implement an ABM in Mata
by going through the simple models I (a sociologist, not an
epidemiologist) used to make sense of what is happening with the
COVID-19 pandemic.
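
The idea of simple individual rules producing a macro outcome can be sketched in a few lines. The following Python example is an illustrative toy and not the Mata code from the presentation: each agent is susceptible (S), infectious (I), or recovered (R), and every infectious agent meets a few randomly chosen agents per step. The function name and all parameter values are assumptions chosen only for demonstration.

```python
import random


def simulate_disease_abm(n_agents=200, n_initial=5, contacts=4,
                         p_infect=0.1, p_recover=0.2, n_steps=50, seed=1):
    """Return a list of (susceptible, infectious, recovered) counts per step."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    state = ["I"] * n_initial + ["S"] * (n_agents - n_initial)
    history = []
    for _step in range(n_steps):
        new_state = list(state)  # synchronous update: rules read the old state
        for agent, status in enumerate(state):
            if status != "I":
                continue
            # Rule 1: an infectious agent meets `contacts` random agents
            # and infects each susceptible one with probability p_infect.
            for _meeting in range(contacts):
                other = rng.randrange(n_agents)
                if state[other] == "S" and rng.random() < p_infect:
                    new_state[other] = "I"
            # Rule 2: an infectious agent recovers with probability p_recover.
            if rng.random() < p_recover:
                new_state[agent] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


baseline = simulate_disease_abm()
fewer_contacts = simulate_disease_abm(contacts=1)  # vary who meets whom
```

Comparing runs such as `baseline` and `fewer_contacts` mirrors the exploration described above: change a rule or the contact pattern and observe how the aggregate epidemic curve responds.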
Additional information: Maarten Buis
University of Konstanz
|
3:30–4:30 | New Bayesian features: Multiple chains, predictions, and more
Abstract:
Stata 16 expanded the Bayesian suite of commands with many new features,
including multiple chains and Bayesian predictions. This presentation
will showcase these features. I will demonstrate how to run multiple chains,
including in parallel, and how to use them to check for MCMC
convergence. I will show how to compute Bayesian predictions and how to
use them for model diagnostic checks. And more.
Additional information: Yulia Marchenko
StataCorp
|
Session chair: Rachael Hughes |
11:00–11:30 | Nonparametric estimation in multistate survival models: An update to msaj
Abstract:
Background: Multistate survival models are a useful tool when disease
pathways are complex and there are multiple events of interest. The
multistate package in Stata can provide a range of predictions
from parametric multistate models via the predictms command.
However, nonparametric estimates produced by the accompanying
msaj command were limited. The aim of this work was to update
msaj to provide a comprehensive set of nonparametric estimates.
Methods: Two useful metrics in a multistate model are transition probabilities and expected length of stay. Transition probabilities from a Markov model can be estimated nonparametrically using the empirical Aalen–Johansen estimator (analogous to the Kaplan–Meier estimator in standard survival). Expected length of stay can be estimated by integrating the transition probabilities. In this setting, this involves a summation of rectangles, because the Aalen–Johansen estimator is a step function.
Updates to msaj: Previously, only transition probabilities from state 1 at time 0 could be obtained using msaj, along with corresponding confidence intervals. Following the update, the starting state, entry time, and exit time can be specified. Estimates can now also be produced for bidirectional models, and expected length of stay can be obtained.
Illustrative example: A nonparametric analysis was performed on hospital epidemiology data, which demonstrated how msaj can be implemented. Three parametric multistate models were also fit to illustrate how nonparametric estimates can be used as a reference to informally compare models. Transition probabilities and expected length of stay were estimated from state 1 at time 0 and from state 2 at time 3 (relevant metrics for this dataset).
Conclusion: The updated msaj provides a comprehensive set of nonparametric predictions, allowing for analyses with no assumptions made on transition rates and providing a reference for parametric models. Extensions could include fixed horizon predictions and confidence intervals for expected length of stay.
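
The "summation of rectangles" in the Methods paragraph can be sketched as follows. This Python example is not the msaj implementation. It assumes the estimated transition probability into a state is already available as a right-continuous step function, given by jump times and the value held from each jump onward. The function name and the numbers are hypothetical.

```python
def expected_length_of_stay(jump_times, probs, entry, exit_time):
    """Integrate a step function of transition probabilities over
    [entry, exit_time] by summing rectangles.

    jump_times: sorted times at which the estimate changes.
    probs[k]:   probability held from jump_times[k] until the next jump.
    """
    if not jump_times or len(jump_times) != len(probs):
        raise ValueError("jump_times and probs must be non-empty and equal length")
    if entry < jump_times[0] or exit_time < entry:
        raise ValueError("need jump_times[0] <= entry <= exit_time")

    total = 0.0
    for k, (left, prob) in enumerate(zip(jump_times, probs)):
        # Each step ends at the next jump, or at exit_time for the last step.
        right = jump_times[k + 1] if k + 1 < len(jump_times) else exit_time
        lo = max(left, entry)
        hi = min(right, exit_time)
        if hi > lo:
            total += prob * (hi - lo)  # rectangle: height times width
    return total


# Made-up step function: probability of being in a given state over time.
times = [0.0, 1.0, 3.0]
state_probs = [1.0, 0.6, 0.2]
stay_from_zero = expected_length_of_stay(times, state_probs, 0.0, 5.0)
stay_from_two = expected_length_of_stay(times, state_probs, 2.0, 4.0)
```

Because the estimator is constant between jumps, the sum of rectangles is the exact integral of the step function, with no numerical integration error.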
Contributors:
Paul C. Lambert
Michael J. Crowther
Karolinska Institutet
Additional information: Micki Hill
University of Leicester
|
11:30–12:00 | kinkyreg: Instrument-free inference for linear regression models with endogenous regressors
Abstract:
In models with endogenous regressors, a standard regression approach is
to exploit just- or overidentifying orthogonality conditions by using
instrumental variables. In just-identified models, the identifying
orthogonality assumptions cannot be tested without the imposition of
other nontestable assumptions. While formal testing of
overidentifying restrictions is possible, its interpretation still hinges on the
validity of an initial set of untestable just-identifying orthogonality
conditions. We present the kinkyreg Stata program for kinky
least-squares (KLS) inference, which adopts an alternative approach to
identification. By exploiting non-orthogonality conditions in the form
of bounds on the admissible degree of endogeneity, feasible test
procedures can be constructed that do not require instrumental
variables. The KLS confidence bands can be more informative than
confidence intervals obtained from instrumental variable estimation, in
particular when the instruments are weak. Moreover, the approach
facilitates a sensitivity analysis for the standard instrumental
variable inference. In particular, it allows assessment of the validity
of previously untestable just-identification exclusion restrictions.
Further KLS-based tests include heteroskedasticity, functional form, and
serial correlation tests.
Contributor:
Jan F. Kiviet
University of Amsterdam
Additional information: Sebastian Kripfganz
University of Exeter Business School
|
12:00–12:30 | Sample-size calculation for an ordered categorical outcome
Abstract:
We describe a new command, artcat, to calculate sample size or
power for a clinical trial or similar experiment with an ordered
categorical outcome, where analysis is by the proportional odds model.
The command implements an existing and a new method. The existing method
is that of Whitehead (1993). The new method is based on creating a
weighted dataset containing the expected counts per person and
analyzing it with ologit. We show how the weighted dataset can
be used to compute variances under the null and alternative hypotheses
and hence to produce a more accurate calculation. We also show that the
new method can be extended to handle noninferiority trials and to
settings where the proportional odds model does not fit the expected
data.
We illustrate the command and explore the value of an ordered
categorical outcome over a binary outcome in various settings. We show
by simulation that the methods perform well and are very similar when
treatment effects are moderate. With very large treatment effects, the
new method is a little more accurate than Whitehead's method. The new
method also applies to the case of a binary outcome, and we show that it
compares favorably with the official power command and the
community-contributed command artbin.
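
For orientation, the existing method can be sketched in Python. The example assumes the usual statement of Whitehead's formula for 1:1 allocation, total N = 12 (z_{1-alpha/2} + z_{power})^2 / (theta^2 (1 - sum of cubed mean category probabilities)), where theta is the log odds ratio, and it derives the treatment-arm probabilities from the control arm under proportional odds. It is not the artcat implementation. The function names and the example probabilities are hypothetical.

```python
import math
from statistics import NormalDist


def treatment_probs(control_probs, odds_ratio):
    """Category probabilities in the treatment arm under proportional odds:
    each cumulative odds in the control arm is multiplied by odds_ratio."""
    probs, cumulative, previous = [], 0.0, 0.0
    for p in control_probs[:-1]:
        cumulative += p
        shifted = odds_ratio * cumulative / (1.0 - cumulative + odds_ratio * cumulative)
        probs.append(shifted - previous)
        previous = shifted
    probs.append(1.0 - previous)
    return probs


def whitehead_total_n(control_probs, odds_ratio, alpha=0.05, power=0.9):
    """Approximate total sample size (both arms, 1:1 allocation)."""
    treated = treatment_probs(control_probs, odds_ratio)
    mean_probs = [(c + t) / 2.0 for c, t in zip(control_probs, treated)]
    normal = NormalDist()
    z_sum = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(power)
    theta = math.log(odds_ratio)
    return 12.0 * z_sum ** 2 / (theta ** 2 * (1.0 - sum(p ** 3 for p in mean_probs)))


# Hypothetical control-arm distributions: four ordered categories
# versus the same outcome collapsed to two categories.
n_ordinal = whitehead_total_n([0.25, 0.25, 0.25, 0.25], odds_ratio=2.0)
n_binary = whitehead_total_n([0.5, 0.5], odds_ratio=2.0)
```

With these made-up inputs the four-category outcome needs a smaller total sample size than the dichotomized version, which is the kind of comparison the abstract refers to when it explores the value of an ordered categorical outcome over a binary one.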
Reference: Whitehead, J. 1993. Sample size calculations for ordered categorical data. Statistics in Medicine 12: 2257–2271.
Contributors:
Ella Marley-Zagar
Tim P. Morris
Mahesh K. B. Parmar
Abdel G. Babiker
MRC Clinical Trials Unit at UCL
Additional information: Ian R. White
MRC Clinical Trials Unit at UCL
|
Session chair: Tim Morris |
1:00–1:30 | Fancy graphics: Force-directed diagrams
Abstract:
This short presentation discusses and illustrates implementation of
force-directed diagrams in Stata. Force-directed layouts use simple
stochastic simulation algorithms to position nodes and edges in a
two-way plot. They can be used in a range of data visualization
applications, such as network visualization, or representation of
clustering and relationships among observations in the data. I will
discuss implementation, examine some examples, and discuss pros and cons
of using Stata for producing such displays.
Additional information: Philippe van Kerm
University of Luxembourg and Luxembourg Institute of Socio-Economic Research
|
1:30–2:00 | f_able: Estimation of marginal effects for models with alternative variable transformations
Abstract:
margins is a powerful postestimation command that allows the
estimation of marginal effects for official and community-contributed
commands, with well-defined predicted outcomes (see predict).
While the use of factor-variable notation allows us to easily estimate
marginal effects when interactions and polynomials are used, estimation
of marginal effects when other types of transformations such as splines,
logs, or fractional polynomials are used remains a
challenge. This presentation describes how margins capabilities
can be extended to analyze other variable transformations using the
command f_able.
Additional information: Fernando Rios-Avila
Bard College
|
2:00–2:30 | Socioeconomic factors influencing the spatial spread of COVID-19 in the United States
Abstract:
As the COVID-19 pandemic has progressed in the U.S., "hotspots" have
been shifting geographically over time to suburban and rural counties,
showing a high prevalence of the disease. We analyze daily U.S.
county-level variations in confirmed COVID-19 case counts to evaluate
the spatial dependence between neighboring counties. We find strong
evidence of county-level socioeconomic factors influencing the spatial
spread. We show the potential of combining spatial econometric
techniques and socioeconomic factors in assessing the spatial effects of
COVID-19 among neighboring counties.
Contributor:
Miguel Henry
Greylock McKinnon Associates
Additional information: Christopher F. Baum
Boston College, DIW Berlin & CESIS
|
Session chair: Nick Cox |
3:00–4:00 | Correlated random-effects methods for panel-data models with heterogeneous time effects
Abstract:
I propose a correlated random-effects (CRE) approach to linear
panel-data models with heterogeneous time effects. The setting is
microeconometric, where the number of time periods is small relative to
the number of cross-sectional units. Given T time periods, T different
sources of heterogeneity are allowed, and each is allowed to be
correlated with time-constant features of the covariates. In the leading
case, the CRE approach extends the Mundlak regression by allowing each
heterogeneity term to be correlated with the time averages of the
time-varying covariates. Additional flexibility is allowed by extracting
unit-specific trends from the covariates and using those in the CRE
approach. Estimation requires (many) linear regressions. For small T,
the approach is an alternative to factor models, which require nonlinear
estimation in addition to pretesting to determine the number of
factors. I show straightforward implementation of the new estimators in
Stata.
Additional information: Jeff Wooldridge
Michigan State University
|
4:00–4:30 | Open panel discussion with Stata developers
StataCorp
|
Scientific committee
Nicholas J. Cox Durham University |
Rachael Hughes University of Bristol |
Tim Morris MRC Clinical Trials Unit at UCL |
Patrick Royston MRC Clinical Trials Unit at UCL |
Logistics organizer
The logistics organizer for the 2020 UK Stata Conference is Timberlake Consultants, the Stata distributors to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.