
The 26th UK Stata conference was held virtually on 10–11 September 2020.

Proceedings

Session chair: Tim Morris
11:00–11:30 From datasets to metadatasets in Stata
Abstract: Metadatasets are Stata datasets in files or in frames that may have one observation per file, per dataset, per variable, or per variable value. Metadatasets can be used to modify a Stata database or to make a Stata database self-documenting, especially if converted to non-Stata formats, such as HTML or even Microsoft Excel. I present some community-contributed packages, updated to Stata 16, for creating and using metadatasets. The xdir package creates a resultsset with one observation per file in a folder conforming to a user-specified pattern. The descgen package inputs an xdir resultsset and generates a new variable indicating whether each file is a Stata dataset, and other new variables containing dataset attributes, such as the dataset label and characteristics, the sort key of variables, and the numbers of observations and variables. The vallabdef package inputs a dataset with one observation per label name per value per value label and generates Stata value labels. The vallabsave package loads and saves value labels from and to label-only datasets and transfers value labels between data frames. The descsave package creates a metadataset with one observation per variable in a dataset and data on variable attributes (including characteristics). The invdesc package modifies the variable attributes of the dataset in the current frame, inputting a descsave resultsset in a second data frame to set the variable attributes and inputting value labels from a dataset in a third data frame. The datasets containing the variable attributes and value labels may be produced as resultssets by Stata packages or produced manually in a spreadsheet using LibreOffice Calc or Microsoft Excel and input into Stata datasets using import delimited or import excel.

Additional information:
UK20_Newson.zip

Roger Newson
Imperial College London
11:30–12:00 Second generation p-values (SGPV) for common estimation commands in Stata
Abstract: This presentation introduces commands to calculate second generation p-values (SGPV) for common estimation commands in Stata. The sgpv command and its companions allow the easy calculation of SGPVs and the associated diagnostics as well as the plotting of SGPVs against the standard p-values. SGPVs were introduced by Blume et al. (2018, 2019) as an alternative to, and an upgrade of, standard p-values.
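For readers new to SGPVs, the definition given by Blume et al. (2018) can be sketched in a few lines of Python. This is an illustration only and is not the sgpv command: the function name and its arguments are hypothetical, and both the interval estimate and the interval null are assumed to have positive length.

```python
def second_generation_p_value(est_lo, est_hi, null_lo, null_hi):
    """Sketch of the SGPV for an interval estimate and an interval null.

    Hypothetical helper, assuming est_lo < est_hi and null_lo < null_hi.
    The SGPV is the fraction of the interval estimate that overlaps the
    null interval, with a correction that caps the value at 1/2 when the
    interval estimate is more than twice as wide as the null interval.
    """
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    if est_len <= 0 or null_len <= 0:
        raise ValueError("both intervals must have positive length")

    # Length of the intersection of the two intervals (0 if disjoint).
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))

    # Small-sample correction factor: max(|I| / (2 |H0|), 1).
    correction = max(est_len / (2.0 * null_len), 1.0)
    return overlap / est_len * correction
```

A value of 0 means the interval estimate lies entirely outside the interval null, a value of 1 means it lies entirely inside, and values near 1/2 indicate inconclusive data.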

References:

https://doi.org/10.1371/journal.pone.0188299

https://doi.org/10.1080/00031305.2018.1537893


Additional information:
UK20_Bormann.pdf

Sven-Kristjan Bormann
University of Tartu
12:00–12:30 xthst: Testing for slope homogeneity in Stata
Abstract: This presentation introduces a new community-contributed Stata command, xthst, to test for slope homogeneity in panels with many observations over cross-sectional units and time periods. The command implements such a test, the delta test derived by Pesaran and Yamagata (2008). Under the null, slope coefficients are homogeneous across cross-sectional units. xthst also includes two extensions. The first is a heteroskedasticity- and autocorrelation-robust test along the lines of Blomquist and Westerlund (2013). The second extension is a cross-sectional-dependence-robust version. The presentation will cover the econometric theory of the tests, explain xthst and its options, and give empirical examples. Monte Carlo evidence will be presented to show that the test behaves as expected.
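As a reminder of what the delta test computes, the standardized statistic of Pesaran and Yamagata (2008) can be written as follows. This is a sketch of the basic (unadjusted) statistic only; the notation is chosen here for illustration and need not match the notation used in the presentation or in xthst's documentation.

```latex
% N cross-sectional units, T time periods, k slope regressors.
% \hat\beta_i: unit-specific OLS slopes; \tilde\beta_{WFE}: weighted
% fixed-effects pooled slopes; M_\tau = I_T - \tau(\tau'\tau)^{-1}\tau'
% removes the unit-specific intercept; \tilde\sigma_i^2 estimates the
% error variance of unit i.
\tilde{S} = \sum_{i=1}^{N}
  \left(\hat\beta_i - \tilde\beta_{WFE}\right)'
  \frac{X_i' M_\tau X_i}{\tilde\sigma_i^2}
  \left(\hat\beta_i - \tilde\beta_{WFE}\right),
\qquad
\tilde\Delta = \sqrt{N}\,\frac{N^{-1}\tilde{S} - k}{\sqrt{2k}}
```

Under the null of homogeneous slopes, the statistic is asymptotically standard normal, so large values lead to rejection of homogeneity.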

References:

Blomquist, J., and J. Westerlund. 2013. Testing slope homogeneity in large panels with serial correlation. Economics Letters 121: 374–378.

Pesaran, M. H., and T. Yamagata. 2008. Testing slope homogeneity in large panels. Journal of Econometrics 142: 50–93.

Contributor:
Tore Bersvendsen
Kristiansand Kommune

Additional information:
UK20_Ditzen.pdf

Jan Ditzen
Heriot-Watt University
1:00–1:30 Unit-root tests for explosive behavior
Abstract: We present the new Stata command radf to compute several tests for explosive behavior in time series. The command implements the right-tail augmented Dickey–Fuller (ADF) unit-root test (Dickey and Fuller 1979) and its further developments based on supremum statistics derived from ADF-type regressions estimated using rolling windows, recursive windows (Phillips, Wu, and Yu 2011), and recursive flexible windows (Phillips, Shi, and Yu 2015). The command allows for the number of lags of the dependent variable in the test regression to be either specified by the user or endogenously determined using a data-dependent procedure. The use of the command is illustrated with an empirical example.

Contributor:
Christopher F. Baum
Boston College

Additional information:
UK20_Otero.pdf

Jesús Otero
Universidad del Rosario
Session chair: Nick Cox
1:30–2:15 A gmm recipe to get standard errors for control function and other two-step estimators
Abstract: It is common to use residuals from the first step of estimation as regressors in the second step. We are interested in the coefficients and effects of the second step. Control function methods are an example of this type of estimator. Getting standard errors in these cases is challenging, and thus bootstrap methods are commonly used. I will illustrate how to use Stata's gmm command to obtain correct standard errors, using cross-sectional and panel-data examples. The GMM estimates give correct coverage and reduce computation time relative to commonly used bootstrap methods.

Additional information:
UK20_Pinzón.pdf

Enrique Pinzón
StataCorp
Session chair: Rachael Hughes
2:30–3:00 randregret: A command for fitting random regret minimization models
Abstract: In this presentation, we describe the randregret command, which implements a variety of random regret minimization (RRM) models. The command allows the user to apply the classic RRM model (Chorus 2010), the generalized RRM model (Chorus 2014), and also the mu-RRM and pure RRM models (Van Cranenburgh, Guevara, and Chorus 2015). We illustrate the usage of the randregret command using stated choice data on route preferences. The command offers robust and cluster standard-error correction using analytical expressions of the score functions. It also offers likelihood ratio tests, which can be used to assess the relevance of a given model specification. Finally, predicted probabilities from each model can be easily computed using the randregretpred postestimation command.
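To make the classic RRM model (Chorus 2010) concrete, the following Python sketch computes systematic regret and choice probabilities for one choice situation. It is not the randregret command; the function name, the argument layout, and the example attribute values are hypothetical and chosen only for illustration.

```python
import math


def classic_rrm_probabilities(attributes, betas):
    """Choice probabilities under the classic RRM model for one choice set.

    attributes: list of alternatives, each a list of attribute values.
    betas: one taste parameter per attribute (assumed known here).
    The regret of alternative i sums, over all competing alternatives j and
    all attributes m, the term ln(1 + exp(beta_m * (x_jm - x_im))).
    """
    regrets = []
    for i, x_i in enumerate(attributes):
        regret = 0.0
        for j, x_j in enumerate(attributes):
            if j == i:
                continue
            for beta, x_im, x_jm in zip(betas, x_i, x_j):
                regret += math.log1p(math.exp(beta * (x_jm - x_im)))
        regrets.append(regret)

    # Logit probabilities on negative regret; subtracting the minimum
    # regret keeps the exponentials numerically stable.
    min_regret = min(regrets)
    weights = [math.exp(-(r - min_regret)) for r in regrets]
    total = sum(weights)
    return [w / total for w in weights]


# Two routes described by travel time only, with a negative time parameter:
# the faster route carries less regret and is chosen more often.
route_probs = classic_rrm_probabilities([[20.0], [30.0]], [-0.1])
```

In an estimation command the taste parameters would be obtained by maximum likelihood; here they are simply supplied to show how regret maps into probabilities.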

References:

Chorus, C. G. 2010. A new model of random regret minimization. European Journal of Transport and Infrastructure Research 10(2).

Chorus, C. G. 2014. A generalized random regret minimization model. Transportation Research Part B: Methodological 68: 224–238.

Van Cranenburgh, S., C. A. Guevara, and C. G. Chorus. 2015. New insights on random regret minimization models. Transportation Research Part A: Policy and Practice 74: 91–109.

Contributors:
Michel Meulders
Martina Vandebroek
KU Leuven

Additional information:
UK20_Vargas.pdf

Álvaro A. Gutiérrez Vargas
KU Leuven
3:00–3:30 Agent-based models in Mata: Modeling aggregate processes, such as the spread of a disease
Abstract: An agent-based model (ABM) is a simulation in which agents that each follow simple rules interact with one another and thus produce an often surprising outcome at the macro level. The purpose of an ABM is to explore mechanisms through which actions of the individual agents add up to a macro outcome by varying the rules that agents have to follow or varying with whom the agent can interact (for example, varying the network). These models have many applications, such as the study of segregation of neighborhoods or the adoption of new technologies. However, the application that is currently most topical is the spread of a disease. In this presentation, I will introduce how to implement an ABM in Mata by going through the simple models I (a sociologist, not an epidemiologist) used to make sense of what is happening with the COVID-19 pandemic.
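The idea of an agent-based disease model can be illustrated with a very small susceptible-infected-recovered simulation. The sketch below is written in Python rather than Mata and is not the presenter's model; the rules, parameter names, and default values are all assumptions made for illustration.

```python
import random


def simulate_sir(n_agents=200, n_contacts=4, p_transmit=0.1,
                 p_recover=0.05, n_steps=100, n_initial=3, seed=12345):
    """Minimal agent-based SIR sketch with random mixing (all values assumed).

    Each step, every infected agent meets n_contacts randomly chosen agents
    and infects each susceptible one with probability p_transmit, then
    recovers with probability p_recover. Returns one (S, I, R) count per step.
    """
    rng = random.Random(seed)
    state = ["S"] * n_agents
    for i in rng.sample(range(n_agents), n_initial):
        state[i] = "I"

    history = []
    for _step in range(n_steps):
        # Update a copy so that all agents act on the same start-of-step state.
        new_state = list(state)
        for i, status in enumerate(state):
            if status != "I":
                continue
            for _contact in range(n_contacts):
                j = rng.randrange(n_agents)
                if state[j] == "S" and rng.random() < p_transmit:
                    new_state[j] = "I"
            if rng.random() < p_recover:
                new_state[i] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


epidemic = simulate_sir()
```

Changing who can meet whom (for example, replacing the random draw of contacts with a fixed network) is the kind of rule variation the abstract describes.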

Additional information:
UK20_Buis.zip

Maarten Buis
University of Konstanz
3:30–4:30 New Bayesian features: Multiple chains, predictions, and more
Abstract: Stata 16 expanded the Bayesian suite of commands with many new features, including multiple chains and Bayesian predictions. This presentation will showcase these features. I will demonstrate how to run multiple chains, including in parallel, and how to use them to check for MCMC convergence. I will also show how to compute Bayesian predictions and how to use them for model diagnostic checks, along with other new features.

Additional information:
UK20_Marchenko.pdf

Yulia Marchenko
StataCorp
Session chair: Rachael Hughes
11:00–11:30 Nonparametric estimation in multistate survival models: An update to msaj
Abstract: Background: Multistate survival models are a useful tool when disease pathways are complex and there are multiple events of interest. The multistate package in Stata can provide a range of predictions from parametric multistate models via the predictms command. However, nonparametric estimates produced by the accompanying msaj command were limited. The aim of this work was to update msaj to provide a comprehensive set of nonparametric estimates.

Methods: Two useful metrics in a multistate model are transition probabilities and expected length of stay. Transition probabilities from a Markov model can be estimated nonparametrically using the empirical Aalen–Johansen estimator (analogous to the Kaplan–Meier estimator in standard survival). Expected length of stay can be estimated by integrating the transition probabilities. In this setting, this involves a summation of rectangles, because the Aalen–Johansen estimator is a step function.
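The "summation of rectangles" can be sketched as follows. This Python function is an illustration, not msaj's implementation; the function name, the argument layout, and the example numbers are hypothetical, and the jump times are assumed to be sorted in increasing order.

```python
def expected_length_of_stay(jump_times, probs, start, stop):
    """Integrate a right-continuous step function between start and stop.

    jump_times: increasing times at which the estimated transition
        probability changes (assumed sorted, first time <= start).
    probs: probs[k] is the probability on [jump_times[k], jump_times[k+1]);
        the last value is carried forward up to stop.
    """
    if not jump_times or len(jump_times) != len(probs):
        raise ValueError("jump_times and probs must be non-empty and equal in length")
    if start < jump_times[0] or stop < start:
        raise ValueError("need jump_times[0] <= start <= stop")

    total = 0.0
    for k, (t_k, p_k) in enumerate(zip(jump_times, probs)):
        t_next = jump_times[k + 1] if k + 1 < len(jump_times) else stop
        # Clip each rectangle to the requested window [start, stop].
        left = max(t_k, start)
        right = min(t_next, stop)
        if right > left:
            total += p_k * (right - left)
    return total


# Probability of occupying a state: 1.0 on [0, 1), 0.6 on [1, 3), 0.2 from 3 on.
stay_0_to_5 = expected_length_of_stay([0.0, 1.0, 3.0], [1.0, 0.6, 0.2], 0.0, 5.0)
```

Because the estimator is constant between event times, the sum is exact rather than a numerical approximation of the integral.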

Updates to msaj: Previously, only transition probabilities from state 1 at time 0 could be obtained using msaj, along with corresponding confidence intervals. Following the update, the starting state, entry time, and exit time can be specified. Estimates can now also be produced for bidirectional models, and expected length of stay can be obtained.

Illustrative example: A nonparametric analysis was performed on hospital epidemiology data, which demonstrated how msaj can be implemented. Three parametric multistate models were also fit to illustrate how nonparametric estimates can be used as a reference to informally compare models. Transition probabilities and expected length of stay were estimated from state 1 at time 0 and from state 2 at time 3 (relevant metrics for this dataset).

Conclusion: The updated msaj provides a comprehensive set of nonparametric predictions, allowing for analyses with no assumptions made on transition rates and providing a reference for parametric models. Extensions could include fixed horizon predictions and confidence intervals for expected length of stay.

Contributors:
Paul C. Lambert
Michael J. Crowther
Karolinska Institutet

Additional information:
UK20_Hill.pptx

Micki Hill
University of Leicester
11:30–12:00 kinkyreg: Instrument-free inference for linear regression models with endogenous regressors
Abstract: In models with endogenous regressors, a standard regression approach is to exploit just- or overidentifying orthogonality conditions by using instrumental variables. In just-identified models, the identifying orthogonality assumptions cannot be tested without the imposition of other nontestable assumptions. While formal testing of overidentifying restrictions is possible, its interpretation still hinges on the validity of an initial set of untestable just-identifying orthogonality conditions. We present the kinkyreg Stata program for kinky least-squares (KLS) inference, which adopts an alternative approach to identification. By exploiting non-orthogonality conditions in the form of bounds on the admissible degree of endogeneity, feasible test procedures can be constructed that do not require instrumental variables. The KLS confidence bands can be more informative than confidence intervals obtained from instrumental variable estimation, in particular when the instruments are weak. Moreover, the approach facilitates a sensitivity analysis for the standard instrumental variable inference. In particular, it allows assessment of the validity of previously untestable just-identification exclusion restrictions. Further KLS-based tests include heteroskedasticity, functional form, and serial correlation tests.

Contributor:
Jan F. Kiviet
University of Amsterdam

Additional information:
UK20_Kripfganz.pdf

Sebastian Kripfganz
University of Exeter Business School
12:00–12:30 Sample-size calculation for an ordered categorical outcome
Abstract: We describe a new command, artcat, to calculate sample size or power for a clinical trial or similar experiment with an ordered categorical outcome, where analysis is by the proportional odds model. The command implements an existing and a new method. The existing method is that of Whitehead (1993). The new method is based on creating a weighted dataset containing the expected counts per person and analyzing it with ologit. We show how the weighted dataset can be used to compute variances under the null and alternative hypotheses and hence to produce a more accurate calculation. We also show that the new method can be extended to handle noninferiority trials and to settings where the proportional odds model does not fit the expected data.
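For orientation, the existing method of Whitehead (1993) can be sketched in Python for the simplest case of two equally sized arms. This is an illustration and not artcat's implementation: the function names are hypothetical, all category probabilities are assumed to be strictly positive, and the direction in which the odds ratio is applied to the cumulative odds is an assumption.

```python
import math
from statistics import NormalDist


def proportional_odds_shift(control_probs, odds_ratio):
    """Treatment-arm category probabilities under proportional odds.

    Assumption: the odds of being in category k or lower are multiplied by
    odds_ratio at every cut point k.
    """
    treatment_probs = []
    cum_control = 0.0
    prev_cum_treatment = 0.0
    for p in control_probs[:-1]:
        cum_control += p
        odds = odds_ratio * cum_control / (1.0 - cum_control)
        cum_treatment = odds / (1.0 + odds)
        treatment_probs.append(cum_treatment - prev_cum_treatment)
        prev_cum_treatment = cum_treatment
    treatment_probs.append(1.0 - prev_cum_treatment)
    return treatment_probs


def whitehead_n_per_group(control_probs, odds_ratio, alpha=0.05, power=0.9):
    """Unrounded per-group sample size for 1:1 allocation, two-sided alpha.

    Uses n = 6 (z_{1-alpha/2} + z_{power})^2 / (theta^2 (1 - sum pbar_k^3)),
    where theta is the log odds ratio and pbar_k is the category probability
    averaged over the two arms. Round the result up in practice.
    """
    treatment_probs = proportional_odds_shift(control_probs, odds_ratio)
    mean_probs = [(c + t) / 2.0 for c, t in zip(control_probs, treatment_probs)]
    z_total = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    theta = math.log(odds_ratio)
    return 6.0 * z_total ** 2 / (theta ** 2 * (1.0 - sum(p ** 3 for p in mean_probs)))
```

For example, `math.ceil(whitehead_n_per_group([0.2, 0.3, 0.3, 0.2], 2.0))` would give a per-group size for an assumed control distribution and an assumed odds ratio of 2.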

We illustrate the command and explore the value of an ordered categorical outcome over a binary outcome in various settings. We show by simulation that the methods perform well and are very similar when treatment effects are moderate. With very large treatment effects, the new method is a little more accurate than Whitehead's method. The new method also applies to the case of a binary outcome, and we show that it compares favorably with the official power command and the community-contributed command artbin.

Reference:

Whitehead, J. 1993. Sample size calculations for ordered categorical data. Statistics in Medicine 12: 2257–2271.

Contributors:
Ella Marley-Zagar
Tim P. Morris
Mahesh K. B. Parmar
Abdel G. Babiker
MRC Clinical Trials Unit at UCL

Additional information:
UK20_White.pptx

Ian R. White
MRC Clinical Trials Unit at UCL
Session chair: Tim Morris
1:00–1:30 Fancy graphics: Force-directed diagrams
Abstract: This short presentation discusses and illustrates implementation of force-directed diagrams in Stata. Force-directed layouts use simple stochastic simulation algorithms to position nodes and edges in a two-way plot. They can be used in a range of data visualization applications, such as network visualization, or representation of clustering and relationships among observations in the data. I will discuss implementation, examine some examples, and discuss pros and cons of using Stata for producing such displays.

Additional information:
UK20_Van_Kerm.pdf

Philippe van Kerm
University of Luxembourg and Luxembourg Institute of Socio-Economic Research
1:30–2:00 f_able: Estimation of marginal effects for models with alternative variable transformations
Abstract: margins is a powerful postestimation command that allows the estimation of marginal effects for official and community-contributed commands, with well-defined predicted outcomes (see predict). While the use of factor-variable notation allows us to easily estimate marginal effects when interactions and polynomials are used, estimation of marginal effects when other types of transformations such as splines, logs, or fractional polynomials are used remains a challenge. This presentation describes how margins capabilities can be extended to analyze other variable transformations using the command f_able.

Additional information:
UK20_Rios-Avila.pdf

Fernando Rios-Avila
Bard College
2:00–2:30 Socioeconomic factors influencing the spatial spread of COVID-19 in the United States
Abstract: As the COVID-19 pandemic has progressed in the U.S., "hotspots" have been shifting geographically over time to suburban and rural counties, showing a high prevalence of the disease. We analyze daily U.S. county-level variations in confirmed COVID-19 case counts to evaluate the spatial dependence between neighboring counties. We find strong evidence of county-level socioeconomic factors influencing the spatial spread. We show the potential of combining spatial econometric techniques and socioeconomic factors in assessing the spatial effects of COVID-19 among neighboring counties.

Contributor:
Miguel Henry
Greylock McKinnon Associates

Additional information:
UK20_Baum.pdf

Christopher F. Baum
Boston College, DIW Berlin & CESIS
Session chair: Nick Cox
3:00–4:00 Correlated random-effects methods for panel-data models with heterogeneous time effects
Abstract: I propose a correlated random-effects (CRE) approach to linear panel-data models with heterogeneous time effects. The setting is microeconometric, where the number of time periods is small relative to the number of cross-sectional units. Given T time periods, T different sources of heterogeneity are allowed, and each is allowed to be correlated with time-constant features of the covariates. In the leading case, the CRE approach extends the Mundlak regression by allowing each heterogeneity term to be correlated with the time averages of the time-varying covariates. Additional flexibility is allowed by extracting unit-specific trends from the covariates and using those in the CRE approach. Estimation requires (many) linear regressions. For small T, the approach is an alternative to factor models, which require nonlinear estimation in addition to pretesting to determine the number of factors. I show straightforward implementation of the new estimators in Stata.

Additional information:
UK20_Wooldridge.pdf

Jeff Wooldridge
Michigan State University
4:00–4:30
Open panel discussion with Stata developers
StataCorp

Scientific committee

Nicholas J. Cox
Durham University
Rachael Hughes
University of Bristol
Tim Morris
MRC Clinical Trials Unit at UCL
Patrick Royston
MRC Clinical Trials Unit at UCL

Logistics organizer

The logistics organizer for the 2020 UK Stata Conference is Timberlake Consultants, the Stata distributors to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.

View the proceedings of previous Stata Conferences and Users Group meetings.