9:15–10:00 | Double-debiased machine learning in Stata
Abstract:
We introduce ddml, a package for double-debiased machine learning
in Stata. ddml implements algorithms for causal inference aided by
supervised machine learning. Five different models are supported, allowing
for binary or continuous treatment variables as well as instrumental variables.
ddml uses stacking regression as the default machine learner but
may be used in combination with other methods implemented in Stata.
Contributors:
Christian B. Hansen
University of Chicago
Mark E. Schaffer
Heriot-Watt University
Additional information: Achim Ahrens
ETH Zürich
|
10:00–10:30 | kinkyreg: Instrument-free inference for linear regression models with endogenous regressors
Abstract:
In models with endogenous regressors, a standard regression approach is to
exploit just-identifying or overidentifying orthogonality conditions by using instrumental variables. In
just-identified models, the identifying orthogonality assumptions cannot be tested without
the imposition of other nontestable assumptions.
While formal testing of overidentifying restrictions is possible,
its interpretation still hinges on the validity of an initial set of
untestable just-identifying orthogonality conditions. We present the kinkyreg Stata program
for kinky least-squares (KLS) inference, which adopts an alternative approach to identification. By
exploiting nonorthogonality conditions in the form of bounds on the admissible degree of
endogeneity, feasible test procedures can be constructed that do not require instrumental
variables. The KLS confidence bands can be more informative than confidence intervals obtained from
instrumental-variable estimation, in particular when the instruments are weak. Moreover, the
approach facilitates a sensitivity analysis for the standard instrumental-variable inference.
In particular, it allows one to assess the validity of previously untestable just-identification
exclusion restrictions. Further KLS-based tests include heteroskedasticity, functional form,
and serial correlation tests.
Contributor:
Sebastian Kripfganz
University of Exeter Business School
Additional information: Jan F. Kiviet
University of Amsterdam
|
11:00–11:30 | Two-step multilevel analysis using Stata
Abstract:
This presentation describes twostep, a bundle of programs
to perform multilevel analyses with the two-step approach in one step.
In the two-step approach to multilevel analysis, a parameter of interest
is first estimated within each cluster of a unit-level dataset (for example,
individuals within countries); these estimates are then used as the
dependent variable in an analysis at the cluster level (for example, countries).
The two-step approach is sometimes seen
as superior to the more standard one-step approach if the number of
observations at the cluster level is small. Additionally, two-step
multilevel analysis may be used as a companion to the one-step
approach, for instance, to check model or linearity assumptions.
twostep is created specifically with this second use in mind.
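The two-step logic described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the twostep implementation: the function names (`ols_slope`, `two_step`), the use of a simple bivariate regression in both steps, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_step(rows, cluster_z):
    """rows: iterable of (cluster, x, y); cluster_z: dict cluster -> covariate z."""
    by_cluster = defaultdict(list)
    for cluster, x, y in rows:
        by_cluster[cluster].append((x, y))

    # Step 1: estimate the slope of y on x separately within each cluster.
    slopes = {}
    for cluster, obs in by_cluster.items():
        xs, ys = zip(*obs)
        slopes[cluster] = ols_slope(xs, ys)[1]

    # Step 2: regress the estimated slopes on the cluster-level covariate.
    clusters = sorted(slopes)
    z = [cluster_z[c] for c in clusters]
    b = [slopes[c] for c in clusters]
    return slopes, ols_slope(z, b)


# Toy data (hypothetical): three clusters whose within-cluster slope is 2 * z.
rows = [(c, x, 1.0 + s * x)
        for c, s in [("A", 2.0), ("B", 4.0), ("C", 6.0)]
        for x in (0.0, 1.0, 2.0)]
cluster_z = {"A": 1.0, "B": 2.0, "C": 3.0}
slopes, (intercept, effect) = two_step(rows, cluster_z)
```

Plotting the step-1 estimates in `slopes` against `z` is one way such a procedure could be used to check linearity assumptions of a one-step model.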
Contributor:
Ulrich Kohler
University of Potsdam
Additional information: Johannes Giesecke
Humboldt University Berlin
|
11:30–12:00 | xtbreak: Estimating and testing breakpoints in time series and panel data
Abstract:
The recent events that have plagued the global economy, such
as the 2008 financial crisis or the 2020 COVID-19 outbreak, hint at
multiple structural breaks in economic relationships. I present xtbreak,
which implements the estimation of single and multiple breakpoints and
tests for structural breaks in time series and panel data.
The estimation and the tests follow the methodologies developed in Andrews
(1993), Bai and Perron (1998), and Ditzen, Karavias, and Westerlund (2021).
For both time-series and panel-data
regressions, five tools are provided: (i) a test of no structural
change against the alternative of a specific number of changes; (ii) a
test of the null hypothesis of no structural change against the alternative
of an unknown number of structural changes; (iii) a test of the null of
s changes against the alternative of s + 1 changes; (iv) consistent
break date estimators; and (v) asymptotically valid confidence intervals
for the break dates.
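As a rough illustration of tool (iv), a single break date can be estimated by choosing the split that minimizes the total sum of squared residuals. The Python sketch below assumes a pure mean-shift model with exactly one break; it is not the xtbreak implementation, and the names `ssr`, `estimate_break`, and `trim` are hypothetical.

```python
def ssr(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def estimate_break(y, trim=2):
    """Return (k, total_ssr), where k is the first index of the second regime.

    Every candidate split keeps at least `trim` observations in each regime,
    so len(y) must be at least 2 * trim.
    """
    best = None
    for k in range(trim, len(y) - trim + 1):
        total = ssr(y[:k]) + ssr(y[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best


# Toy series (hypothetical): the mean shifts from about 1 to about 3 at index 4.
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
k, total = estimate_break(y)
```

Tests and confidence intervals such as those in (i) to (iii) and (v) need the asymptotic theory from the cited papers and are not covered by this sketch.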
References:
Andrews, D. W. K. 1993. Tests for parameter instability and structural change with unknown change point. Econometrica 61: 821–856.
Bai, J., and P. Perron. 1998. Estimating and testing linear models with multiple structural changes. Econometrica 66: 47–78.
Ditzen, J., Y. Karavias, and J. Westerlund. 2021. Testing for multiple structural breaks in panel data.
Additional information: Jan Ditzen
Heriot-Watt University
|
1:00–1:30 | Playing nice with others: Initializing your work with external configurations
Abstract:
Stata comes with ample internal features to set up and automate your
workflow and analysis routines. However, interdisciplinary teams or
interconnected workflows may give rise to the wish to separate easily
adjustable settings from core procedures in a way that is accessible to
those not fluent in Stata for configuration or review.
This presentation
will consider three specific variants—namely, external Stata macros,
INI, and MS Excel—and outline some general principles to facilitate
discussion on good practices within the Stata community.
Additional information: Sven Oliver Spieß
DPC Software GmbH
|
1:30–2:15 | Bayesian vector autoregressive models in Stata
Abstract:
Vector autoregressive (VAR) models are a popular choice for studying the
joint dynamics of multiple time series. They require no special
structure because the outcome variables are regressed on their own lagged
variables. One of the main problems with VAR models is the significant
number of regression parameters, which is proportional to the number of
lags. As a result, when fit to small datasets, complex VAR models tend to
show poor forecasting performance.
In Stata 17, we introduced a new command, bayes: var, for fitting Bayesian
VAR models. Bayesian VAR models apply priors on the regression
parameters and variance-covariance of the errors for a fine control over
the posterior time-series process. By default, the prior on regression
coefficients shrinks them toward a random-walk process that assumes no
relationship between time-series variables. This assumption helps
avoid overfitting the data. The Bayesian approach also provides a
systematic and unambiguous way of determining the number of lags.
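The shrinkage idea can be illustrated with a stylized single-equation analogue. The Python sketch below assumes an AR(1) model without a constant, a known error standard deviation, and a normal prior centered at 1 (the random-walk value) for the autoregressive coefficient. It is not the prior used by bayes: var; the function name and all default values are assumptions made for this example.

```python
def ar1_posterior_mean(y, prior_mean=1.0, prior_sd=0.5, sigma=1.0):
    """Return (ols, posterior_mean) for phi in y[t] = phi * y[t-1] + e[t].

    Assumes e[t] ~ N(0, sigma^2) with sigma known and phi ~ N(prior_mean,
    prior_sd^2), so the posterior mean is a precision-weighted average of
    the prior mean and the least-squares estimate.
    """
    lagged = y[:-1]
    current = y[1:]
    sxx = sum(l * l for l in lagged)
    sxy = sum(l * c for l, c in zip(lagged, current))

    prior_precision = 1.0 / prior_sd ** 2
    data_precision = sxx / sigma ** 2

    ols = sxy / sxx
    posterior_mean = ((prior_precision * prior_mean + sxy / sigma ** 2)
                      / (prior_precision + data_precision))
    return ols, posterior_mean


# Toy series (hypothetical): each value is half the previous one, so OLS gives 0.5.
y = [1.0, 0.5, 0.25, 0.125]
ols, shrunk = ar1_posterior_mean(y)
# A nearly flat prior leaves the estimate essentially at the OLS value.
_, flat = ar1_posterior_mean(y, prior_sd=1e6)
```

With only three usable observations, the posterior mean is pulled from the least-squares estimate toward 1; a tighter prior (smaller `prior_sd`) pulls harder.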
In this presentation, I illustrate Bayesian VAR models on some real data and show model interpretations based on their impulse–response functions. I also compute Bayesian forecasts and compare them with classical forecasts.
Additional information: Nikolay Balov
StataCorp
|
2:45–3:15 | dstat: A unified framework for estimation of summary statistics and distribution functions
Abstract:
I present a new Stata command that unites a variety of
methods to describe (univariate) statistical distributions. Covered
are density estimation, histograms, cumulative distribution functions,
probability distributions, quantile functions, Lorenz curves, percentile
shares, and a large collection of summary statistics such as classical
and robust measures of location, scale, skewness, kurtosis, and inequality and poverty measures.
Particular features of the command
are that it provides consistent standard errors supporting complex
sample designs for all covered statistics and that the simultaneous
estimation of multiple statistics across multiple variables and multiple
subpopulations is possible. Furthermore, the command supports
covariate balancing based on reweighting techniques (inverse probability
weighting and entropy balancing), including appropriate correction of
standard errors. Standard-error estimation is implemented in terms of
influence functions, which can be stored for further analysis, for
example, in RIF regressions.
Additional information: Ben Jann
University of Bern
|
3:15–3:45 | wikiviews—A Stata interface for the Wikipedia API
Abstract:
I present the community-contributed Stata command wikiviews, which allows
flexible calls to the official Wikimedia API and to the database of its
predecessor maintained by Peter Meissner. The program allows you to create
Stata datasets holding pageviews and related statistics of long lists of
Wikipedia pages from 2007 up to now.
Additional information: Ulrich Kohler
University of Potsdam
|
4:15–5:00 | Treatment-effects estimation with lasso
Abstract:
There is always an intrinsic conflict between the unconfoundedness
assumption and the overlap assumption in treatment-effects
estimation. With high-dimensional controls, this conflict becomes even
more vivid. This presentation shows how to overcome this conflict by
using Stata 17's telasso command.
telasso estimates the
average treatment effects with high-dimensional controls while using
lasso for model selection. This estimator is Neyman orthogonal,
which makes it robust to model-selection mistakes. It is also doubly
robust, so only one of the models needs to be correctly specified.
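Double robustness can be illustrated with the augmented inverse-probability-weighting (AIPW) form of the average treatment effect. The Python sketch below takes the fitted outcome predictions and propensity scores as given inputs; in practice these would come from lasso-selected models. Treating this as the exact formula computed by telasso is an assumption, and the function name and toy numbers are hypothetical.

```python
def aipw_ate(y, d, m1, m0, e):
    """AIPW estimate of the average treatment effect.

    y  : observed outcomes
    d  : treatment indicators (1 = treated, 0 = control)
    m1 : predicted outcomes under treatment
    m0 : predicted outcomes under control
    e  : predicted treatment probabilities (propensity scores)
    """
    total = 0.0
    for yi, di, m1i, m0i, ei in zip(y, d, m1, m0, e):
        total += (m1i - m0i
                  + di * (yi - m1i) / ei
                  - (1 - di) * (yi - m0i) / (1 - ei))
    return total / len(y)


# Toy data (hypothetical): the outcome predictions fit the observed outcomes
# exactly, while the propensity scores are arbitrary.
y = [3.0, 1.0, 4.0, 2.0]
d = [1, 0, 1, 0]
m1 = [3.0, 3.0, 4.0, 4.0]
m0 = [1.0, 1.0, 2.0, 2.0]
e = [0.3, 0.6, 0.7, 0.2]
ate = aipw_ate(y, d, m1, m0, e)
```

Because the outcome residuals are zero in this toy example, the weighting terms vanish and the estimate equals the mean of `m1 - m0` whatever values `e` takes, which shows one half of the double-robustness property.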
Additional information: Di Liu
StataCorp
|
5:00–5:30 |
Open panel discussion with Stata developers
StataCorp
|
Ulrich Kohler University of Potsdam |
Johannes Giesecke Humboldt University Berlin |
The logistics organizer for the 2021 German Stata Conference is DPC Software GmbH, the official distributor of Stata in Germany, the Netherlands, Austria, the Czech Republic, and Hungary.