The Swiss Stata Users Group Meeting was held on 25 October 2018 at ETH Zürich. There was also an optional workshop the same day. You can view the program and presentation slides below.
9:00–10:00 |
Abstract:
The field of machine learning is attracting increasing attention
among social scientists and economists. At the same time, Stata
offers only a limited set of machine learning tools to date. This
presentation introduces two Stata packages, lassopack and pdslasso,
that implement regularized regression methods, including
the lasso. The packages include features
intended for prediction, model selection, and causal inference
and are thus applicable in many settings. The commands allow for
high-dimensional models, where the number of regressors may be
large or even exceed the number of observations under the assumption
of sparsity. lassopack implements lasso, square-root
lasso, elastic net, ridge regression, adaptive lasso, and postestimation
OLS. These methods rely on tuning parameters, which determine the
degree and type of penalization. lassopack supports three approaches
for selecting these tuning parameters: information criteria
(implemented in lasso2), K-fold and h-step ahead rolling cross-validation
(cvlasso), and theory-driven penalization (rlasso) due to Belloni
et al. (2012). pdslasso offers methods to
facilitate causal inference in structural models. Specifically, pdslasso
implements methods for selecting control variables (pdslasso) and
instruments (ivlasso) from a large set of variables in a setting where
the researcher is interested in estimating the causal impact of one or
more (possibly endogenous) causal variables of interest.
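To make the role of the tuning parameter concrete, the following is a minimal Python sketch of the lasso solved by cyclic coordinate descent. It is not the lassopack implementation and does not reflect how lasso2, cvlasso, or rlasso choose the penalty. The function names (soft_threshold, lasso_cd), the fixed penalty lam, and the toy data are all assumptions made for illustration, and the sketch omits the intercept and any standardization of regressors.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the objective
    (1 / (2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|.
    Hypothetical helper: no intercept, no standardization."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb; beta starts at zero
    # Mean square of each column, the denominator of the update.
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # skip all-zero columns
            # Covariance of column j with the partial residual
            # (the residual with column j's own contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ms[j]
            step = new_bj - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Toy design with orthogonal columns (an assumption for the demo).
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 1.0, 2.0, 1.0]
    for lam in (0.0, 0.25, 0.6, 1.0):
        print(lam, lasso_cd(X, y, lam))
```

With lam set to zero the sketch reproduces least squares on this toy design; raising lam shrinks the coefficients and eventually sets them exactly to zero, which is the selection behavior that the tuning-parameter approaches in the abstract are meant to calibrate.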
Additional information: switzerland18_Ahrens.pdf
Achim Ahrens
Economic and Social Research Institute, Dublin
|
10:00–10:30 |
Abstract:
The overall look of Stata's graphs is determined by so-called
scheme files. Scheme files are system components; that is,
they are part of the local Stata installation. In this presentation,
I will argue that style settings deviating from default schemes
should be part of the script producing the graphs rather than being
kept in separate scheme files, and I will present software that
supports such practice. In particular, I will present a command
called grstyle that allows users to quickly change the
overall look of graphs without having to fiddle around with external
scheme files. I will also present a command called colorpalette
that provides a wide variety of color schemes for use in Stata graphics.
Additional information: switzerland18_Jann.pdf
Ben Jann
Universität Bern
|
11:00–11:30 |
Abstract:
This presentation discusses the average causal effect (ACE) of an
endogenous binary treatment on an ordinal outcome when the sample is
subject to endogenous selection. I show how to estimate the
ACE using an extended regression model (ERM) command in Stata. I
illustrate how to do regression adjustment in Stata and discuss
standard errors for sample-averaged treatment effects and
population-averaged treatment effects.
Additional information: switzerland18_Drukker.pdf
David Drukker
StataCorp
|
11:30–12:00 |
Abstract:
In recent years, we have witnessed a tremendous surge of empirical
analyses that use geospatial data or data with a network
structure. Inference in these settings is challenging because
unobserved errors can be correlated in space, along a network,
or over time and because the standard approaches to conducting
inference are not compelling. We developed an estimator for the
variance–covariance matrix (VCV) of OLS and IV estimates that
allows for arbitrary dependence between observational units.
Arbitrary here refers to the fact that there are no restrictions
in the way units could be correlated with each other in space and
time: this estimator can account for indirect links in the
cross-sectional dependence, time dependence, and changes
in the correlation structure over time. Our estimator builds on
the seminal insight by White (1980), who shows that a sandwich
type VCV can be estimated by constructing a consistent estimator
of the VCV of the parameters. Specifically, the estimator uses
estimated regression errors and knowledge of the clustering
structure to reconstruct estimates of the unknown elements
of the sandwich formula. We also provide the community with a
companion statistical package: our acreg command enables
users to adjust OLS and 2SLS coefficients' standard errors,
accounting for arbitrary dependence. We conduct a Monte Carlo
study to illustrate how correlation across units within an
arbitrary cluster, for example, spatially close units or
friends in a network, affects the rejection rate of a null hypothesis
if such correlation is not accounted for while estimating the
standard errors. We implement simulations using real-life data
to construct arbitrary clusters, for example, geocoded data on U.S. towns
and counties for the spatial setting and authorship connections
data for the network setting. We construct a setting where IV
with cluster–robust standard errors rejects the null of no effect
in about 20% of all cases when the significance level of the test
is set at 5%. Conventional inference does not improve as the sample
size increases, suggesting that the conventional approach produces
inconsistent estimates of the variance–covariance matrix. Adopting
the arbitrary clustering estimator, we find that the null
rejection rate is about 10% for small samples and converges
quickly toward the true significance level of 5% as the sample size
increases. This pattern suggests that the arbitrary clustering correction
produces consistent estimates of the VCV, enabling applied
econometricians to conduct robust inference in the presence of
arbitrary clustering.
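The over-rejection described in the abstract can be illustrated with a much simpler stand-in. The Python sketch below is not the acreg estimator and does not use the spatial or network data from the study. It assumes non-overlapping clusters with a shared random effect, tests a true null that the mean is zero, and compares a conventional standard error with a basic cluster-robust one. The function name simulate_rejection_rates, the cluster sizes, and the variance parameters are assumptions chosen for illustration.

```python
import math
import random


def simulate_rejection_rates(n_reps=500, n_clusters=50, cluster_size=10,
                             seed=12345):
    """Monte Carlo rejection rates of H0: mean = 0 at the 5% level.

    Data: y_ig = u_g + e_ig with u_g and e_ig standard normal, so the
    null is true and observations within a cluster are correlated.
    Returns (naive_rate, cluster_rate).
    """
    rng = random.Random(seed)
    n = n_clusters * cluster_size
    crit = 1.96  # two-sided 5% normal critical value
    naive_rej = 0
    cluster_rej = 0
    for _ in range(n_reps):
        clusters = []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, 1.0)  # shared cluster shock
            clusters.append([u + rng.gauss(0.0, 1.0)
                             for _ in range(cluster_size)])
        ybar = sum(sum(c) for c in clusters) / n
        # Conventional SE: treats all n observations as independent.
        ss = sum((y - ybar) ** 2 for c in clusters for y in c)
        se_naive = math.sqrt(ss / (n - 1) / n)
        # Cluster-robust SE: sums residuals within each cluster first
        # (no small-sample correction).
        se_cluster = math.sqrt(
            sum(sum(y - ybar for y in c) ** 2 for c in clusters)) / n
        naive_rej += int(abs(ybar / se_naive) > crit)
        cluster_rej += int(abs(ybar / se_cluster) > crit)
    return naive_rej / n_reps, cluster_rej / n_reps


if __name__ == "__main__":
    naive, clustered = simulate_rejection_rates()
    print("naive rejection rate:", naive)
    print("cluster-robust rejection rate:", clustered)
```

Under these assumptions the conventional test should reject far more often than the nominal 5%, while the cluster-robust test should stay much closer to it. The arbitrary clustering case in the abstract is harder because units may be linked without forming disjoint groups.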
Additional information: switzerland18_Colella.pdf
Fabrizio Colella
University of Lausanne
|
12:00–12:30 |
Abstract:
Using the Stata community-contributed command xtdcce2, I
show how to estimate long-run coefficients in a dynamic panel
with heterogeneous coefficients and common factors and a large
number of observations over cross-sectional units and time periods.
The common factors cause cross-sectional dependence, which is
approximated by cross-sectional averages. Heterogeneity of the
coefficients is accounted for by taking the unweighted averages of the
unit-specific estimates. Following Chudik et al.
(2016), I consider three different models
to estimate long-run coefficients: a simple dynamic
model (CS-DL), an error-correction model, and an ARDL model (CS-ARDL).
I explain how to estimate all three models in Stata using xtdcce2.
Further emphasis is put on estimating the standard errors of the
long-run coefficients. Estimated standard errors obtained by the delta
method and bootstrapped standard errors are compared.
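As a rough illustration of that comparison, the Python sketch below computes a long-run coefficient theta = beta / (1 - phi) from a single simulated time series y_t = phi * y_{t-1} + beta * x_t + e_t, once with a delta-method standard error and once with a pairs bootstrap. This is a deliberately simplified stand-in: it is not xtdcce2, it has no panel dimension, no common factors, and no cross-sectional averages. All function names, the parameter values, and the resampling scheme are assumptions made for illustration.

```python
import math
import random


def long_run_delta(beta, phi, var_beta, var_phi, cov_beta_phi):
    """Return theta = beta / (1 - phi) and its delta-method SE."""
    theta = beta / (1.0 - phi)
    d_beta = 1.0 / (1.0 - phi)        # d theta / d beta
    d_phi = beta / (1.0 - phi) ** 2   # d theta / d phi
    var_theta = (d_beta ** 2 * var_beta + d_phi ** 2 * var_phi
                 + 2.0 * d_beta * d_phi * cov_beta_phi)
    return theta, math.sqrt(var_theta)


def ols_two_regressors(rows):
    """OLS of y on (y_lag, x) without intercept; rows are (y, y_lag, x).

    Returns (phi, beta, var_phi, var_beta, cov_beta_phi)."""
    saa = sum(a * a for _, a, _ in rows)
    sbb = sum(b * b for _, _, b in rows)
    sab = sum(a * b for _, a, b in rows)
    say = sum(a * y for y, a, _ in rows)
    sby = sum(b * y for y, _, b in rows)
    det = saa * sbb - sab ** 2
    phi = (sbb * say - sab * sby) / det
    beta = (saa * sby - sab * say) / det
    rss = sum((y - phi * a - beta * b) ** 2 for y, a, b in rows)
    s2 = rss / (len(rows) - 2)
    # Elements of s2 * (X'X)^{-1} for the 2x2 case.
    return phi, beta, s2 * sbb / det, s2 * saa / det, -s2 * sab / det


def simulate_rows(T=200, phi=0.5, beta=1.0, seed=2018):
    """Simulate y_t = phi * y_{t-1} + beta * x_t + e_t, starting at 0."""
    rng = random.Random(seed)
    rows, y_lag = [], 0.0
    for _ in range(T):
        x = rng.gauss(0.0, 1.0)
        y = phi * y_lag + beta * x + rng.gauss(0.0, 1.0)
        rows.append((y, y_lag, x))
        y_lag = y
    return rows


def bootstrap_se(rows, n_boot=300, seed=1):
    """Pairs bootstrap: resample rows, re-estimate, take the SD of theta."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        phi, beta, *_ = ols_two_regressors(sample)
        draws.append(beta / (1.0 - phi))
    mean = sum(draws) / n_boot
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (n_boot - 1))


if __name__ == "__main__":
    rows = simulate_rows()
    phi, beta, var_phi, var_beta, cov = ols_two_regressors(rows)
    theta, se_delta = long_run_delta(beta, phi, var_beta, var_phi, cov)
    print("theta:", theta)
    print("delta-method SE:", se_delta)
    print("bootstrap SE:", bootstrap_se(rows))
```

The delta method relies on a linear approximation of the ratio, so the two standard errors can drift apart when phi is close to one; in this toy setup with phi = 0.5 they should be of similar magnitude.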
Reference: Chudik, A., K. Mohaddes, M. H. Pesaran, and M. Raissi. 2016. Long-run effects in large heterogeneous panel data models with cross-sectionally correlated errors. Essays in Honor of Aman Ullah. Advances in Econometrics 36: 85–135.
Additional information: switzerland18_Ditzen.pdf
Jan Ditzen
Heriot-Watt University
|
3:15–4:30 |
Abstract:
David Drukker
StataCorp
|
4:30–5:00 |
Abstract:
Stata developers present will carefully and cautiously
consider wishes and grumbles from Stata users in the audience.
Questions, and possibly answers, may concern reports of
present bugs and limitations or requests for new features in
future releases of the software.
StataCorp personnel
StataCorp
|
Many estimators in statistics, econometrics, and biostatistics are cast as multi-step estimators. Multi-step estimators produce consistent point estimates, but the standard errors must be corrected. This problem is so common that it even emerges when estimating population-averaged effects from a regression with powers or interactions. This workshop introduces the solution of stacked moment equations, which is a special case of the generalized method of moments (GMM), and shows how to implement this solution using the gmm command in Stata.
This workshop also includes an introduction to Monte Carlo simulations. In addition to describing the mechanics of running a Monte Carlo in Stata, it discusses how to use Monte Carlo simulations to illustrate a theoretical point.
The workshop is included in your meeting registration.
Allister Loder
ETH Zürich
Henrik Becker
ETH Zürich
Basil Schmid
ETH Zürich
Ben Jann
Universität Bern
The 2018 Swiss Stata Users Group meeting is jointly organized by the Swiss Federal Institute of Technology and Ritme, scientific solutions, the distributor of Stata in Belgium, France, and Switzerland.