10:00–10:20 | Customized Markdown and .docx tables using listtab and docxtab
Abstract:
Statisticians make their living producing tables (and plots).
I present an update of a general family of methods for making
customized tables called the DCRIL path (decode, characterize,
reshape, insert, list), with customized table cells (using the
sdecode package), customized column attributes (using the
chardef package), customized column labels (using the
xrewide package), and/or customized inserted gap-row
labels (using the insingap package), and listing these
tables to automatically generated documents. This demonstration
uses the listtab package to list Markdown tables for
browser-ready HTML documents, which Stata users like to
generate, and the docxtab package to list .docx tables
for printer-ready .docx documents, which our superiors like us
to generate.
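As a flavor of the final DCRIL step, here is a minimal sketch (not taken from the talk; column and file choices are illustrative) that lists a Markdown table using listtab:

    sysuse auto, clear
    keep in 1/5
    listtab make mpg weight using mytable.md, replace ///
        delimiter(" | ") begin("| ") end(" |") ///
        headlines("| Make | MPG | Weight |" "| --- | --- | --- |")

The resulting mytable.md renders as a browser-ready table when the Markdown document is converted to HTML.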
Additional information:
Roger B. Newson
King's College London
|
10:20–10:40 | Multiply imputing informatively censored time-to-event data
Abstract:
Time-to-event data, such as overall survival in a cancer
clinical trial, are commonly right-censored, and this censoring
is commonly assumed to be noninformative.
While noninformative censoring is plausible when censoring is
due to end of study, it is less plausible when censoring is due
to loss to follow-up. Sensitivity analyses for departures from
the noninformative censoring assumption can be performed using
multiple imputation under the Cox model. These have been
implemented in R but are not commonly used. We propose a new
implementation in Stata.
Our existing stsurvimpute command (on SSC) imputes right-censored data under noninformative censoring, using a flexible parametric survival model fit by stpm2. We extend this to allow a sensitivity parameter gamma, representing the log of the hazard ratio in censored individuals versus comparable uncensored individuals (the informative censoring hazard ratio, ICHR). The sensitivity parameter can vary between individuals, and imputed data can be recensored at the end-of-study time. Because the mi suite does not allow imputed variables to be stset, we create an imputed dataset in ice format and analyze it using mim. In practice, sensitivity analysis computes the treatment effect for a range of scientifically plausible values of gamma. We illustrate the approach using a cancer clinical trial.

Reference:
Jackson, D., I. R. White, S. Seaman, H. Evans, K. Baisley, and J. Carpenter. 2014. Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation. Statistics in Medicine 33: 4681–4694. https://CRAN.R-project.org/package=InformativeCensoring
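A hypothetical sketch of this workflow (the gamma(), m(), and saving() options are assumptions based on the abstract, not the command's released syntax):

    stset os_time, failure(os_event)
    stsurvimpute, gamma(0.7) m(20) saving(imp, replace)   // ICHR = exp(0.7)
    use imp, clear                 // imputed datasets in ice format
    mim: stcox i.treat             // combine estimates by Rubin's rules

In a real sensitivity analysis, the imputation step would be repeated over a grid of scientifically plausible gamma values.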
Contributor:
Patrick Royston
MRC Clinical Trials Unit at UCL
Additional information:
Ian R. White
MRC Clinical Trials Unit at UCL
|
10:40–11:00 | Influence analysis with panel data using Stata
Abstract:
The presence of units that possess extreme values in the
dependent variable and independent variables (for example,
vertical outliers, good and bad leverage points) has the
potential to severely bias least-squares (LS)
estimates—for example, regression coefficients and
standard errors.
Diagnostic plots (such as leverage-versus-squared residual
plots) and measures of overall influence (for example, Cook's
[1979] distance) are usually used to detect such anomalies,
but there are two different problems arising from their use.
First, available commands for diagnostic plots are built for
cross-sectional data, and some data manipulation is necessary
for panel data. Second, Cook-like distances may fail to flag
multiple anomalous cases in the data because they do not account
for pairwise influence of observations (Atkinson 1993;
Chatterjee and Hadi 1988; Rousseeuw 1991; Rousseeuw and Van
Zomeren 1990; Lawrance 1995). I overcome these limits as
follows. First, I formalize statistical measures to quantify the
degree of leverage and outlyingness of units in a panel-data
framework to produce diagnostic plots suitable for panel data.
Second, I build on Lawrance's [1995] pairwise approach by
proposing measures for joint and conditional influence suitable
for panel-data models with fixed effects.
I develop a method to visually detect anomalous units in a panel dataset and identify their types, and I investigate the effect of these units on LS estimates and on other units' influence. I propose two community-contributed Stata commands to implement this method: xtlvr2plot produces a leverage-versus-residual plot suitable for panel data, together with a summary table listing the detected anomalous units and their types; xtinfluence calculates the joint and conditional influence and effects of pairs of units and generates network-style plots (the command allows a choice between scatterplots and heat plots). JEL codes: C13, C15, C23.
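An illustrative calling pattern (the plot() option is an assumption based on this abstract, not the released syntax):

    webuse nlswork, clear
    xtset idcode year
    xtlvr2plot ln_wage ttl_exp tenure                // leverage-vs-residual plot and summary table
    xtinfluence ln_wage ttl_exp tenure, plot(heat)   // pairwise joint/conditional influence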
Additional information:
Annalivia Polselli
Institute for Analytics and Data Science and University of Essex
|
11:00–11:30 | A suite of programs for the design, development, and validation of clinical prediction models
Abstract:
An ever-increasing number of research questions focuses on the
development and validation of clinical prediction models to
inform individual diagnosis and prognosis in healthcare.
These models predict outcome values (for example, pain
intensity) or outcome risks (for example, five-year mortality risk)
in individuals from a target population (for example, pregnant
women; cancer patients). Development and validation of such
models is a complex process, with a myriad of statistical
methods, validation measures, and reporting options. It is
therefore not surprising that there is considerable evidence of
poor methodology in such studies.
In this presentation, I will introduce a suite of ancillary software packages with the prefix “pm”. The pm-suite of packages aims to facilitate the implementation of methodology for building new models, validating existing models, and reporting transparently. All packages are in line with the recommendations of the TRIPOD guidelines, which provide a benchmark for the reporting of prediction models. I will showcase a selection of packages that aid in each stage of the life cycle of a prediction model, from the initial design (for example, sample-size calculation using pmsampsize and pmvalsampsize), to development and internal validation (for example, calculating model performance using pmstats), external validation (for example, flexible calibration plots of performance in new patients using pmcalplot), and model updating (for example, comparing updating methods using pmupdate). Through an illustrative example, I will demonstrate how these packages allow researchers to perform common prediction modeling tasks quickly and easily while standardizing methodology.
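For instance, the design stage might begin with a minimum sample-size calculation for developing a new binary-outcome model (the input values below are illustrative):

    pmsampsize, type(b) rsquared(0.288) parameters(24) prevalence(0.174)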
Additional information:
Dr. Joie Ensor
University of Birmingham
|
11:30–12:30 | Bayesian model averaging
Abstract:
Model uncertainty accompanies many data analyses.
Stata's new bma suite, which performs Bayesian model
averaging (BMA), helps address this uncertainty in the context of
linear regression. Which predictors are important given the
observed data? Which models are more plausible? How do
predictors relate to each other across different models? BMA can
answer these questions and more. BMA uses Bayes's theorem to
aggregate the results across multiple candidate models to
account for model uncertainty during inference and prediction in
a principled and universal way. In my presentation, I will
describe the basics of BMA and demonstrate it with the
bma suite. I will also show how BMA can become a useful
tool for your regression analysis, Bayesian or not!
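As a quick taste of the suite (Stata 18+), using the auto data for illustration:

    sysuse auto, clear
    bmaregress mpg weight length foreign   // BMA over the candidate linear models
    bmastats models                        // most probable models and their posterior probabilities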
Additional information:
Yulia Marchenko
StataCorp LLC
|
1:30–1:50 | Prioritizing clinically important outcomes using the win ratio
Abstract:
The win ratio is a statistical method used for analyzing
composite outcomes in clinical trials.
Composite outcomes are composed of two or more distinct
“component” events (for example, heart attacks,
death) and are often analyzed using time-to-first event methods
ignoring the relative importance of the component events. When
using the win ratio, component events are instead placed into a
hierarchy from most to least important; more important
components can then be prioritized over less important outcomes
(for example, death, followed by myocardial infarction). The
method works by first placing patients into pairs. Within each
pair, one evaluates the components in order of priority starting
with the most important until one of the pair is determined to
have a better outcome than the other.
A major advantage of the approach is its flexibility: one can include in the hierarchy outcomes of different types (for example, time-to-event, continuous, binary, ordinal, and repeat events). This can have major benefits, for example, by allowing assessments of quality of life or symptom scores to be included as part of the outcome. This is particularly helpful in disease areas where recruiting enough patients for a conventional outcomes trial is infeasible. The win-ratio approach is increasingly popular, but a barrier to more widespread adoption is a lack of good statistical software. The calculation of sample sizes is also complex and usually requires simulation. We present winratiotest, the first package to implement win-ratio analyses in Stata. The command is flexible and user-friendly. Included in the package is the first software (that we know of) that can calculate the sample size for win-ratio-based trials without requiring simulation.
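A hypothetical call (the syntax shown is an assumption based on this abstract, not the package documentation), with death prioritized over myocardial infarction and then a quality-of-life score:

    * hierarchy() is assumed to list outcomes from most to least important
    winratiotest treat, hierarchy(death mi qol) type(tte tte continuous)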
Contributors:
Tim Collier
Joan Pedro Ferreira
London School of Hygiene and Tropical Medicine
Additional information:
John Gregson
London School of Hygiene and Tropical Medicine
|
1:50–2:10 | Object-oriented programming in Mata
Abstract:
Object-oriented programming (OOP) is a programming paradigm that
is ubiquitous in today's landscape of programming languages.
OOP code proceeds by first defining separate
entities—classes—and their relationships, and then
lets them communicate with one another. Mata, Stata's matrix
language, has such OOP capabilities. Compared with some other
object-oriented programming languages, such as Java or C++,
Mata offers a lighter implementation, striking a nice balance
between feature availability and language complexity.
This presentation explores OOP features in Mata by describing the code behind dtms, a community-contributed package for discrete-time multistate model estimation. Estimation in dtms proceeds in several steps, where each step can nest multiple results of the next level, thus building up a treelike structure of results. The presentation explains how this treelike structure is implemented in Mata using OOP and what the benefits of using OOP for this task are. These include easier code maintenance via a more transparent code structure, shorter coding time, and an easier implementation of efficient calculations.

The presentation will first provide simple examples of useful classes: for example, a class that represents a Stata matrix in Mata, or a class that can grab, hold, and restore Stata e() results. More complex relationships among classes will then be explored in the context of the treelike results structure of dtms. While the topics covered will include such technical-sounding concepts as class composition, self-threading code, inheritance, and polymorphism, an effort will be made to link these concepts to tasks that are relevant to Stata users who have already gained, or are interested in gaining, an initial proficiency in Mata.
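As a minimal sketch of the first kind of class mentioned above—a Mata class that wraps a Stata matrix—consider the following (names are illustrative, not taken from dtms):

    mata:
    class StataMatrix {
        string scalar name           // name of the wrapped Stata matrix
        real matrix   M              // local copy of its contents
        void grab(), restore()
        real matrix get()
    }
    void StataMatrix::grab(string scalar nm)
    {
        name = nm
        M    = st_matrix(nm)         // copy the Stata matrix into Mata
    }
    void StataMatrix::restore()
    {
        st_matrix(name, M)           // write the copy back to Stata
    }
    real matrix StataMatrix::get() return(M)
    end

After matrix A = (1,2 \ 3,4) in Stata, mata: S = StataMatrix(); S.grab("A") holds a copy that S.restore() can later write back.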
Additional information:
Daniel C. Schneider
Max Planck Institute for Demographic Research
|
2:10–2:40 | A review of machine learning commands in Stata: Performance and usability evaluation
Abstract:
This presentation provides a comprehensive survey reviewing
machine learning (ML) commands in Stata.
I systematically categorize and summarize the available ML
commands in Stata and evaluate their performance and usability
for different tasks such as classification, regression,
clustering, and dimension reduction. I also provide examples of
how to use these commands with real-world datasets and compare
their performance. This review aims to help researchers and
practitioners choose appropriate ML methods and related Stata
tools for their specific research questions and datasets, and to
improve the efficiency and reproducibility of ML analyses using
Stata. I conclude by discussing some limitations and future
directions for ML research in Stata.
Additional information:
Giovanni Cerulli
CNR-IRCRES
|
2:40–3:10 | On the shoulders of giants: Writing wrapper commands in Stata
Abstract:
For repeated tasks, it is convenient to use commands with simple
syntax that carry out more complicated tasks under the hood.
These can be data management and visualization tasks or
statistical analyses. Many of these tasks are variations or
special cases of more versatile approaches. Instead of
reinventing the wheel, wrapper commands build on the existing
capabilities by “wrapping” around other commands.
For example, certain types of graphs might require substantial
effort when building them from scratch using Stata's graph
twoway commands, but this process can be automated with a
dedicated command. Similarly, many estimators for specific
models are special cases of more general estimation techniques,
such as maximum likelihood or generalized method of moments
estimators. A wrapper command can be used to translate
relatively simple syntax into the more complex syntax of Stata's
ml or gmm commands, or even directly into the
underlying optimize() or moptimize() Mata
functions. Many official Stata commands can be regarded as
wrapper commands, and often there is a hierarchical wrapper
structure with multiple layers. For example, most commands for
mixed-effects estimation of particular models are wrappers for
the general meglm command, which itself just wraps around
the undocumented _me_estimate command, which then calls
gsem, which in turn initiates the estimation with the
ml package. The main purpose of the higher-layer
wrappers is typically syntax parsing. With every layer the
initially simple syntax is translated into the more general
syntax of the lower-layer command, but the user only needs to be
concerned with the basic syntax of the top-layer command.
Similarly, community-contributed commands often wrap around
official or other community-contributed commands. They may even
wrap around packages written for other programming environments,
such as Python.
In this presentation, I discuss different types of wrapper commands and focus on practical aspects of their implementation. I illustrate these ideas with two of my own commands. The new spxtivdfreg wrapper adds a spatial dimension to the xtivdfreg command (Kripfganz and Sarafidis 2021) for defactored instrumental-variables estimation of large panel-data models with common factors. The xtdpdgmmfe wrapper provides a simplified syntax for the GMM estimation of linear dynamic fixed-effects panel-data models with the xtdpdgmm command.
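A toy example of the pattern (entirely illustrative, not one of the commands above): a wrapper whose only real job is to parse a simple syntax and delegate the work to regress:

    program define myols, eclass
        version 16
        syntax varlist(numeric min=2) [if] [in] [, Robust]
        gettoken depvar xvars : varlist
        local vceopt = cond("`robust'" != "", "vce(robust)", "")
        regress `depvar' `xvars' `if' `in', `vceopt'   // the wrapped command does the work
    end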
Additional information:
Sebastian Kripfganz
University of Exeter
|
3:40–4:10 | gigs package: New egen extensions for international newborn and child growth standards
Abstract:
Children’s growth status is an important measure commonly used
as a proxy indicator of advancements in a country’s health,
human capital, and economic development.
Understanding how and why child growth patterns have changed is
necessary for characterizing global health inequalities.
Sustainable Development Goal 3.2 aims to reduce preventable
newborn deaths to no more than 12 per 1,000 live births and
child deaths to no more than 25 per 1,000 live births (WHO/UNICEF, 2019).
However, large gaps remain in achieving these goals: currently
54 and 64 (of 194) countries will miss the targets for child
(<5 years) and neonatal (<28 days) mortality, respectively
(UN IGME, 2023). Because infant mortality is strongly associated with
nonoptimal growth, accurate growth assessment using prescriptive
growth standards is essential to reduce these mortality gaps.
A range of standards can be used to analyze infant growth. In newborns, size-for-gestational-age analysis of different anthropometric measurements is possible using the Newborn Size standards from the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st) project (Villar et al., 2014). In infants, growth analysis depends on whether the child is born preterm or term: for term infants, the WHO Child Growth Standards are appropriate (WHO MGRS Group, 2006), whereas there are INTERGROWTH-21st standards for postnatal growth in preterm infants (Villar et al., 2015). Unfortunately, many researchers apply these standards incorrectly, which can lead to inappropriate interpretations of growth trajectories (Perumal et al., 2022).

As part of the Guidance for International Growth Standards (GIGS) project, we are making a range of these tools available in Stata to provide explicit, evidence-based functions through which these standards can be implemented in research and clinical care. We therefore introduce several egen extensions for converting between anthropometric measurements and centiles/z-scores in the WHO and INTERGROWTH-21st standards. We also describe several egen functions that classify newborn size and infant growth according to international growth standards.

References:
Perumal, N., E. O. Ohuma, A. M. Prentice, P. S. Shah, A. Al Mahmud, S. E. Moore, and D. E. Roth. 2022. Implications for quantifying early life growth trajectories of term-born infants using INTERGROWTH-21st newborn size standards at birth in conjunction with World Health Organization child growth standards in the postnatal period. Paediatric and Perinatal Epidemiology 36: 839–850.
United Nations Inter-agency Group for Child Mortality Estimation (UN IGME). 2023. Levels & Trends in Child Mortality: Report 2022. Estimates developed by the United Nations Inter-agency Group for Child Mortality Estimation. New York: United Nations Children's Fund.
Villar, J., L. C. Ismail, C. G. Victora, E. O. Ohuma, E. Bertino, D. G. Altman, A. Lambert, A. T. Papageorghiou, et al. 2014. International standards for newborn weight, length, and head circumference by gestational age and sex: The Newborn Cross-Sectional Study of the INTERGROWTH-21st Project. The Lancet 384(9946): 857–868.
Villar, J., F. Giuliani, Z. A. Bhutta, E. Bertino, E. O. Ohuma, L. C. Ismail, F. C. Barros, D. G. Altman, et al. 2015. Postnatal growth standards for preterm infants: The Preterm Postnatal Follow-up Study of the INTERGROWTH-21st Project. The Lancet Global Health 3(11): e681–e691.
WHO Multicentre Growth Reference Study Group. 2006. WHO Child Growth Standards based on length/height, weight and age. Acta Paediatrica Suppl. 450: 76–85.
WHO/UNICEF. 2019. WHO/UNICEF discussion paper: The extension of the 2025 maternal, infant and young child nutrition targets to 2030. https://data.unicef.org/resources/who-unicef-discussion-paper-nutrition-targets/ (accessed May 15, 2023).
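Hypothetical calls in the spirit of the abstract (function names, acronym codes, and argument order are all assumptions rather than the released syntax):

    egen z_bw  = ig_nbs(birthweight gest_days sex), acronym(wfga)   // newborn weight-for-GA z-score
    egen z_len = who_gs(length age_days sex), acronym(lhfa)         // WHO length-for-age z-score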
Contributors:
Linda Vesel
Harvard T. H. Chan School of Public Health and Brigham and Women's Hospital
Eric Ohuma
London School of Hygiene and Tropical Medicine
Additional information:
Simon Parker
London School of Hygiene and Tropical Medicine
|
4:10–4:30 | Plot suite: Fast graphing commands for very large datasets
Abstract:
This presentation showcases the functionality of the new
“plot suite” of graphing commands.
The suite excels at visualizing very large datasets, enabling
users to produce a variety of highly customizable plots in a
fraction of the time required by Stata's native graphing commands.
Additional information:
Jan Kabatek
Melbourne Institute of Applied Economic and Social Research
|
4:30–5:30 | pystacked and ddml: Machine learning for prediction and causal inference in Stata
Abstract:
pystacked implements stacked generalization (Wolpert
1992) for regression and binary classification via Python’s
scikit-learn.
Stacking is an ensemble method that combines multiple supervised
machine learners—the “base” or
“level-0” learners—into a single learner. The
currently supported base learners include regularized regression
(lasso, ridge, elastic net), random forest, gradient boosted
trees, support vector machines, and feed-forward neural nets
(multilayer perceptron). pystacked can also be used to
fit a single base learner and thus provides an easy-to-use API
for scikit-learn’s machine learning algorithms.
ddml implements algorithms for causal inference aided by supervised machine learning, as proposed in “Double/debiased machine learning for treatment and structural parameters” (Chernozhukov et al., Econometrics Journal, 2018). Five different models are supported, allowing for binary or continuous treatment variables and endogeneity in the presence of high-dimensional controls and/or instrumental variables. ddml is compatible with many existing supervised machine learning programs in Stata and, in particular, has integrated support for pystacked, making it straightforward to use machine learner ensemble methods in causal inference applications.
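The calling pattern follows the ddml documentation closely (variable names illustrative; details abbreviated):

    global Y y
    global D d
    global X x1-x20
    ddml init partial, kfolds(5)              // partially linear model
    ddml E[Y|X]: pystacked $Y $X, type(reg)   // stacked learners for the outcome
    ddml E[D|X]: pystacked $D $X, type(reg)   // and for the treatment
    ddml crossfit
    ddml estimate, robust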
Contributors:
Achim Ahrens
ETH Zürich
Christian B. Hansen
Thomas Wiemann
University of Chicago
Additional information:
Mark E. Schaffer
Heriot-Watt University
|
9:00–9:20 | Fitting the Skellam distribution in Stata
Abstract:
The Skellam distribution is a discrete probability distribution
related to the difference between two independent
Poisson-distributed random variables.
It has been used in a variety of contexts, including sports and
supply-and-demand imbalances in shared transportation. To the
best of our knowledge, Stata does not support the Skellam
distribution or the Skellam regression. In this presentation, I
plan to show how to fit the parameters of a Skellam distribution
and Skellam regression using Mata’s optimize function. The
optimization problem is then packaged into a basic Stata command
that I plan to describe.
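As a sketch of the idea (assuming the latent-Poisson representation P(K=k) = sum over j of Pois(k+j; mu1)*Pois(j; mu2) with j >= max(0,-k); this is not the final command's code), the log likelihood can be evaluated by truncating that sum and handed to optimize():

    mata:
    real scalar skellam_lnp(real scalar k, real scalar m1, real scalar m2)
    {
        real scalar j, j0, p, term
        j0 = max((0, -k))                     // smallest feasible latent count
        p  = 0
        for (j = j0; j <= j0 + 500; j++) {
            term = poissonp(m1, k + j) * poissonp(m2, j)
            p = p + term
            if (term < 1e-16 & j > m2) break  // safely past the mode
        }
        return(ln(p))
    }
    void skellam_eval(real scalar todo, real rowvector b,
                      real colvector k, lnf, g, H)
    {
        real scalar m1, m2, i
        m1 = exp(b[1]); m2 = exp(b[2])        // log link keeps the means positive
        lnf = 0
        for (i = 1; i <= rows(k); i++) lnf = lnf + skellam_lnp(k[i], m1, m2)
    }
    k = st_data(., "k")                       // assumes a Stata variable k of observed differences
    S = optimize_init()
    optimize_init_evaluator(S, &skellam_eval())
    optimize_init_evaluatortype(S, "d0")
    optimize_init_argument(S, 1, k)
    optimize_init_params(S, (0, 0))           // start at mu1 = mu2 = 1
    bh = optimize(S)
    exp(bh)                                   // fitted (mu1, mu2)
    end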
Additional information:
Vincenzo Verardi
Université libre de Bruxelles
|
9:20–9:40 | A short report on making Stata secure and adding metadata in a new data platform
Abstract:
The presentation has two parts. A version of the first part was
presented at the 2022 Northern European Stata Conference.
Part 1. Securing Stata in a secure environment: Data access and logging.
At CRN, the Cancer Registry of Norway, we are developing a secure environment for using Stata. I give a short description of this work, covering data access and the logging of both data extraction (JDBC + Java plugins) and Stata commands.

Part 2. Metadata using characteristics.
In the new solution, metadata are automatically attached as Stata .dta characteristics when users fetch data from the data warehouse. I describe the implementation, along with some small utility programs for working with the metadata, and present examples of use.
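For readers unfamiliar with the mechanism, characteristics are name-value pairs stored inside the dataset; a minimal illustration (the metadata values are invented):

    sysuse auto, clear
    char _dta[source] "set automatically by the platform at fetch time"
    char mpg[unit]    "miles per gallon"
    char list                        // inspect the attached metadata
    display "`: char mpg[unit]'"     // use a characteristic programmatically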
Additional information:
Bjarte Aagnes
Cancer Registry of Norway
|
9:40–10:00 | Facilities for optimizing and designing multiarm multistage (MAMS) randomized controlled trials with binary outcomes
Abstract:
In this presentation, we introduce two Stata commands,
nstagebin and nstagebinopt, which can be used to
facilitate the design of multiarm multistage (MAMS) trials
with binary outcomes.
MAMS designs are a class of efficient and adaptive randomized
clinical trials that have successfully been used in many disease
areas, including cancer, TB, maternal health, COVID-19, and
surgery. The nstagebinopt command finds a class of
efficient “admissible” designs based on an
optimality criterion using a systematic search procedure. The
nstagebin command calculates the stagewise sample sizes,
trial timelines, and the overall operating characteristics of
MAMS designs with binary outcomes. Both programs allow the use
of Dunnett's correction to account for multiple testing. We also
use the ROSSINI 2 MAMS design, an ongoing MAMS trial in surgical
wound infection, to illustrate the capabilities of both
programs. The new Stata commands facilitate the design of MAMS
trials with binary outcomes where more than one research
question can be addressed under one protocol.
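A hypothetical call (all option names are assumptions based on the abstract, not the released syntax):

    nstagebin, nstage(3) arms(4) alpha(0.025) power(0.90) ///
        p0(0.15) p1(0.25) dunnett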
Contributors:
Daniel J. Bratton
GlaxoSmithKline
Mahesh KB Parmar
University College London
Additional information:
Babak Choodari-Oskooei
University College London
|
10:00–10:20 | How to check a simulation study
Abstract:
Simulation studies are a powerful tool in biostatistics, but they
can be hard to conduct successfully.
Sometimes, unexpected results are obtained. We offer advice on
how to check a simulation study when this occurs and how to
design and conduct the study to give results that are easier to
check. Simulation studies should be designed to include some
settings where answers are already known. Code should be written
in stages, and data-generating mechanisms should be checked
before simulated data are analyzed. Results should be explored
carefully; scatterplots of standard error estimates against
point estimates are a surprisingly powerful tool. When estimation
fails or there are outlying estimates, these should be
identified, understood, and dealt with by changing data-generating
mechanisms or coding realistic hybrid analysis
procedures. Finally, we give a series of ideas that have been
useful to us in the past for checking unexpected results.
Following our advice may help to prevent errors and to improve
the quality of published simulation studies. We illustrate the
ideas with a simple but realistic simulation study in Stata.
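For example, after running the repetitions with simulate (mysim below stands in for the user's own data-generation-plus-analysis program), the recommended scatterplot takes one line:

    simulate b = _b[x] se = _se[x], reps(1000) seed(2023): mysim
    scatter se b, msize(vsmall)   // failures and outlying repetitions stand out at a glance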
Contributors:
Ian R. White
Matteo Quartagno
Tim P. Morris
MRC Clinical Trials Unit at UCL
Additional information:
Tra My Pham
MRC Clinical Trials Unit at UCL
|
10:20–10:40 | Drivers of COVID-19 deaths in the United States: A two-stage modeling approach
Abstract:
We offer a two-stage (time-series and cross-section) econometric
modeling approach to examine the drivers behind the spread of
COVID-19 deaths across counties in the United States.
Our empirical strategy exploits the availability of two years
(January 2020 through January 2022) of daily data on the number
of confirmed deaths and cases of COVID-19 in the 3,000 U.S.
counties of the 48 contiguous states and the District of
Columbia. In the first stage of the analysis, we use daily
time-series data on COVID-19 cases and deaths to fit mixed
models of deaths against lagged confirmed cases for each county.
As the resulting coefficients are county specific, they relax
the homogeneity assumption that is implicit when the analysis is
performed using geographically aggregated cross-section units.
In the second stage of the analysis, we assume that these county
estimates are a function of economic and sociodemographic
factors that are taken as fixed over the course of the pandemic.
Here we employ the novel one-covariate-at-a-time
variable-selection algorithm proposed by Chudik et al.
(Econometrica, 2018) to guide the choice of regressors. The
second stage utilizes the SUR technique in an unusual setting,
where the regression equations correspond to time periods in
which cross-sectional estimates at the county level are
available.
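A schematic of the first stage (variable names and the 14-day lag are illustrative): a mixed model with county-specific random slopes on lagged cases, whose predicted random effects supply the county-level estimates carried into the second stage:

    use county_daily, clear                 // hypothetical county-day panel
    xtset county date
    mixed deaths L14.cases || county: L14.cases, covariance(unstructured)
    predict re_slope re_cons, reffects      // county-specific deviations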
Contributors:
Andrés Garcia-Suaza
Universidad del Rosario
Miguel Henry
Jesús Otero
Universidad del Rosario
Additional information:
Kit Baum
Boston College
|
11:10–11:30 | Use of Stata in modeling the determinants of work engagement
Abstract:
The research goal was to identify the determinants of the
phenomenon of work engagement.
Two primary datasets provided by Eurofound in the European
Working Conditions Survey were used. Data were gathered before
and during the COVID-19 pandemic, which allowed me to include the
pandemic context in the analysis. Additionally, some
macroeconomic and other social variables were included, such as
GDP per capita, labor force participation rate, unemployment
rate, the level of social trust, Doing Business Index, and
European Quality of Government Index. Stata, with its potential
for data cleaning and checking, allowed me to merge all variables
from complex datasets into one set with 115,608 observations and
over 100 variables from 34 European countries. When preparing
the data, I scripted the repetitive command sequences. Stata's
programmability helped in preparing the model
using the logistic regression method. A dichotomous outcome
(dependent) variable was modeled—engaged or not engaged
in work. The predictor variables of interest were those
related to work, such as working conditions, occupational
characteristics, and the level of human capital. The
logistic command in Stata produced results in terms of
odds ratios, which were interpreted to assess the effect of the
chosen predictors on the response variable and, consequently, to
accept or reject the research hypotheses. The innovation of the
presented analysis lies in its inclusion of macroeconomic and
macrosocial variables and in its international and intersectoral
scope. The logit model presented, made possible by Stata's
capabilities, fills a research gap in the study of the work
engagement phenomenon.
Additional information:
Paulina Hojda
University of Łódź
|
11:30–12:30 | Heterogeneous difference-in-differences estimation
Abstract:
Treatment effects might differ over time and for groups that are
treated at different points in time (treatment cohorts).
In Stata 18, we introduced two commands that estimate treatment
effects that vary over time and cohort. For repeated
cross-sectional data, we have hdidregress. For panel
data, we have xthdidregress. Both commands let you graph
the evolution of treatment over time. They also allow you to
aggregate treatment within cohort and time and visualize these
effects. I will show you how both commands work and briefly
discuss the theory underlying them.
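The basic calling pattern for panel data looks as follows (variable names illustrative):

    xthdidregress aipw (y x1 x2) (treat), group(cohortvar)
    estat atetplot                    // ATETs by cohort over time
    estat aggregation, cohort graph   // aggregate effects within cohort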
Additional information:
Enrique Pinzón
StataCorp LLC
|
1:30–1:50 | A robust test for linear and log-linear models against Box-Cox alternatives
Abstract:
The purpose of this presentation is to describe a new command,
xtloglin, that tests the suitability of the linear and
log-linear regression models against Box-Cox alternatives.
The command uses a GMM-based Lagrange multiplier test, which is
robust to nonnormality and heteroskedasticity of the errors and
extends the analysis by Savin and Würtz (2005) to panel data
regressions after xtreg.
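For reference, the transformation at the heart of the test is

    y^{(\lambda)} =
    \begin{cases}
      (y^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\
      \ln y,                     & \lambda = 0
    \end{cases}

so that lambda = 1 recovers the linear specification (up to an intercept shift) and lambda = 0 the log-linear one.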
The Box-Cox transformation, first introduced by Box and Cox (1964), is a popular approach for testing the linear and log-linear functional forms, because both are special cases of the transformation. The usual approach is to estimate the Box-Cox model by maximum likelihood, assuming normally distributed homoskedastic errors, and to test the restrictions on the transformation parameter that lead to the linear and log-linear specifications using a Wald or likelihood-ratio test. Despite the popularity of this approach, the estimator of the transformation parameter is not just restricted to the search for nonlinearity but also to one that leads to more normal errors with constant variance. This can result in an estimate that favors log-linearity over linearity even though the true model is linear with nonnormal or heteroskedastic errors. These issues are resolved by xtloglin because the GMM estimator is consistent under less restrictive distributional assumptions.

References:
Box, G. E. P., and D. R. Cox. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological) 26(2): 211–243.
Savin, N. E., and A. H. Würtz. 2005. Testing the semiparametric Box–Cox model with the bootstrap. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 322–354.
Additional information:
David Vincent
David Vincent Econometrics
|
1:50–2:10 | Network regressions in Stata
Abstract:
Network analysis has become critical to the study of social
sciences.
While several Stata programs are available for analyzing network
structures, programs that execute regression analysis with a
network structure are currently lacking. We fill this gap by
introducing the nwxtregress command. Building on spatial
econometric methods (LeSage and Pace 2009), nwxtregress
uses MCMC estimation to produce estimates of endogenous peer
effects, as well as own-node (direct) and cross-node (indirect)
partial effects, where nodes correspond to cross-sectional units
of observation, such as firms, and edges correspond to the
relations between nodes. Unlike existing spatial regression
commands (for example, spxtregress), nwxtregress
is designed to handle unbalanced panels of economic and social
networks. Networks can be directed or undirected with weighted
or unweighted edges, and they can be imported in a list format
that does not require a shapefile or a Stata spatial weight
matrix set by spmatrix. A special focus of the
presentation will be on the construction of the spatial
weight matrix and on integration with Python to improve speed.
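An illustrative call (option spellings should be treated as assumptions), with the network held in a Mata object W as an edge list:

    nwxtregress y x1 x2, dvarlag(W, sparse)   // endogenous peer effects via MCMC
    estat impact                              // direct and indirect partial effects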
Contributors:
William Grieser
Morad Zekhnini
Free University of Bozen-Bolzano
Additional information:
Jan Ditzen
Free University of Bozen-Bolzano
|
2:10–2:30 | The joy of sets: Graphical alternatives to Euler and Venn diagrams
Abstract:
Given several binary (indicator) variables and intersecting
sets, an Euler or Venn diagram may spring to mind, but even with
only a few sets the collective pattern becomes hard to draw and
harder to think about easily.
In genomics and elsewhere, so-called upsetplots (specialized bar
charts for the purpose) have become popular recently as
alternatives. This presentation introduces an implementation,
upsetplot, a complementary implementation, vennbar,
and associated minor extras and utilities. Applications include
examination of the structure of missing data and of the
cooccurrence of medical symptoms or any other individual binary
states. These new commands are compared with previous graphical
commands, both official and community contributed and both
frequently used and seemingly little known.
Secondary themes include data structures needed to produce and store results; what works better with graph bar and what works better with twoway bar; and the serendipity of encounters at Stata users' meetings.
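A hypothetical application to the structure of missing data (the upsetplot syntax is assumed from the abstract):

    sysuse nlsw88, clear
    foreach v in union grade industry {
        generate byte m_`v' = missing(`v')
    }
    upsetplot m_union m_grade m_industry   // intersections of the missingness indicators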
Contributor:
Tim P. Morris
MRC Clinical Trials Unit, UCL
Additional information:
Nicholas J. Cox
Durham University
|
2:30–3:00 | geoplot: A new command to draw maps
Abstract:
geoplot is a new command for drawing maps from shape
files and other datasets.
Multiple layers of elements such as regions, borders, lakes,
roads, labels, and symbols can be freely combined, and the look
of elements (for example, color) can be varied depending on the
values of variables. Compared with previous solutions in Stata,
geoplot provides more user convenience, more
functionality, and more flexibility. In this presentation, I
will introduce the basic components of the command and
illustrate its use with examples.
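The basic pattern (file and variable names illustrative) is to load the shapes into a geoframe and then layer elements, coloring areas by a variable:

    geoframe create regions regions_data.dta, shp(regions_shp.dta)
    geoplot (area regions unemployment) (line regions, lwidth(thin)), tight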
Additional information:
Ben Jann
University of Bern
|
3:30–4:30 | Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
|
The logistics organizer for the 2023 UK Stata Conference is Timberlake Consultants, the Stata distributor to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.
View the proceedings of previous Stata Conferences and Users Group meetings.