The German Stata Users Group Meeting was held on Friday, 23 June 2017, but you can view the program and presentation slides below.
Proceedings
9:30–10:30 |
Abstract:
In their paper titled
Why Propensity Scores Should Not Be Used for Matching,
Gary King and Richard Nielsen suggest
that propensity-score matching (PSM) is inferior to other
matching procedures such as Mahalanobis matching (King and
Nielsen 2016). They argue
that PSM approximates complete randomization, whereas other
techniques approximate fully blocked randomization, and that
fully blocked randomization dominates complete randomization
in terms of statistical efficiency. They illustrate their
argument using constructed examples, simulations, and applications
to real data. Overall, their results suggest that PSM has dramatic
deficiencies and should best be discarded. Although the claim
about the superior efficiency of a fully blocked design over
complete randomization is true (given a specific sample size),
the problems King and Nielsen identify apply only under certain
conditions. First, the complete randomization argument is valid
only with respect to covariates that are not related to the
treatment. Second, and more importantly, King and Nielsen's "PSM
paradox" occurs only for specific variants of PSM. I
will explain why this is the case, and I will show that other
variants of PSM compare favorably with blocking procedures such
as Mahalanobis matching. I will illustrate my arguments using a
new matching software called "kmatch".
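For readers unfamiliar with Mahalanobis matching, the following is a minimal Python sketch of nearest-neighbor matching with replacement on two covariates. It is not the kmatch implementation. The function names, the restriction to two covariates, and the use of a pooled sample covariance matrix are assumptions made for illustration.

```python
def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance d' S^{-1} d between two (x, y) pairs."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Closed-form inverse of a 2x2 matrix: (1/det) * [[syy, -sxy], [-sxy, sxx]]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def nearest_neighbor_match(treated, controls, cov):
    """For each treated unit, return the index of the closest control.

    Matching is done with replacement, so a control can be used repeatedly.
    """
    return [
        min(range(len(controls)), key=lambda j: mahalanobis_sq(t, controls[j], cov))
        for t in treated
    ]
```

A propensity-score variant would replace `mahalanobis_sq` with the absolute difference between two estimated scores, so units are compared on one number instead of on all covariates jointly.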
Additional information: Germany17_Jann.pdf
Ben Jann
University of Berne
|
10:30–11:30 |
Abstract:
SWire is a plugin that connects Stata to other software programs,
thus permitting the interaction between Stata and other client
applications. Software programs relying on SWire can exchange data
with Stata or request that Stata execute basic data management
operations. Client applications can be developed in many programming
languages, and even web pages can communicate with Stata via SWire.
The only requirement is that client applications communicate with
Stata via the SWire protocol, which is based on HTTP. Software programs
like R, QGIS, and Office applications can be extended to
interact with Stata. Several software programs have been developed to
demonstrate how SWire can be usefully employed for connecting Stata to
other software programs. One of these is the new SWordy add-in for
Microsoft Word, which will be presented here. It allows for the
retrieval of data from Stata to Word and the creation of automatic
reports, namely, Word documents with numerical data and tables that
can be automatically obtained from Stata. Automatic reports save time
when presenting results and provide a way to document the final part
of the data analysis workflow.
Additional information: Germany17_LoMagno.pdf
Giovanni Luca Lo Magno
University of Palermo
|
11:45–12:15 |
Abstract:
I present the new Stata command xtseqreg, which implements sequential
(two-stage) estimators for linear panel-data models. In general,
the conventional standard errors are no longer valid in sequential
estimation when the residuals from the first stage are regressed on
another set of (often time-invariant) explanatory variables at a second
stage. xtseqreg computes the analytical standard-error correction of
Kripfganz and Schwarz (2015), which accounts for the first-stage estimation
error. The command can be used to fit both stages of a sequential
regression or either stage separately. OLS and 2SLS estimation are
supported, as well as one-step and two-step "difference"-GMM and "system"-GMM
estimation in the spirit of Arellano and Bond (1991), Arellano and Bover
(1995), and Blundell and Bond (1998), with flexible choice of the instruments
and weighting matrix. Available postestimation statistics include the
Arellano–Bond test for absence of autocorrelation in the first-differenced
errors and Hansen's J-test for the validity of the overidentifying
restrictions. While I do not intend to introduce xtseqreg as a competitor
for existing commands, it can mimic part of their behavior. In particular,
xtseqreg can replicate results obtained with xtdpd and xtabond2.
In that regard, I will illustrate some pitfalls in the estimation of dynamic panel models.
References:
Arellano, M., and S. R. Bond. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58: 277–297.
Arellano, M., and O. Bover. 1995. Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68: 29–51.
Blundell, R., and S. R. Bond. 1998. Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115–143.
Kripfganz, S., and C. Schwarz. 2015. Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838. European Central Bank.
Additional information: Germany17_Kripfganz.pdf
Sebastian Kripfganz
University of Exeter Business School
|
1:45–2:15 |
Abstract:
The user-written package ardl, first released in 2014, estimates
autoregressive distributed lag (ARDL) time-series models and provides
the popular Pesaran, Shin, and Smith (2001, Journal of Applied
Econometrics) bounds testing procedure for a long-run relationship.
In this presentation, the statistics and application side of the
command take a back seat and give way to a discussion of the algorithms
used under the hood of ardl. Efficient programming is critical for
ardl for two reasons: optimal lag selection and obtaining
critical values via simulation. This presentation will use the "case study"
of the ardl estimation command to discuss efficient programming
in Stata and Mata. Various programming concepts (compilation,
argument passing, data types, pointer variables, etc.) and their
implementation in Stata/Mata will be explained, as well as various
finer Mata-specific topics (fast matrix indexing, matrix inversion,
etc.). The overall message is that coding based on common sense,
knowledge of the workings of Stata/Mata, and knowledge of linear
algebra goes a long way when trying to write high-performance code
and in many cases is to be preferred to the tedium of moving to a
lower-level programming language like C/C++.
Additional information: Germany17_Schneider.pdf
Daniel C. Schneider
Max Planck Institute for Demographic Research
|
2:15–2:45 |
Abstract:
Stata 14 includes multilevel models for binary (melogit)
and ordinal (meologit) logits. Unfortunately, except for the
global Wald test of the estimated fixed effects, neither model
provides any fit measure to assess its practical significance.
Therefore, I developed an ado-file to calculate McFadden's and
McKelvey and Zavoina's pseudo-R²s. It estimates the intraclass
correlation (ICC) of the dependent variable for the actual sample
to assess the maximum of the contextual effect. Since the early
1990s, many Monte Carlo simulation studies (Hagle and Mitchell 1992;
Veall and Zimmermann 1992, 1993, 1994; Windmeijer 1995; DeMaris 2002)
have shown that the McKelvey and Zavoina pseudo-R² is the best measure of
the fit of binary and ordinal logit models. My ado-file calculates
this fit in two complementary ways: first, for the fixed
effects only, and second, for the fixed and random effects together.
The estimation of McFadden's pseudo-R² uses two different zero models:
first, the random-intercept-only model (RIOM) knowing the contextual
units, and second, the fixed-intercept-only model (FIOM) ignoring the
contextual units completely. For each of them, it calculates the global
likelihood-ratio chi-squared statistic for testing whether all fixed effects,
or all fixed and random effects, are zero in the population. An empirical study
of drug consumption in European countries demonstrates the usefulness
of my fit_meologit_2lev.ado and fit_meologit_3lev.ado files
for multilevel binary and ordinal logit models.
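The fit measures named in this abstract have simple textbook forms, sketched below in Python. This is not the code of the fit_meologit files. The function names are hypothetical, the formulas are the standard single-level logit versions, and the latent error variance is assumed to be π²/3 (the variance of the standard logistic distribution).

```python
import math

# Variance of the standard logistic distribution, the latent error in a logit model.
LOGISTIC_ERROR_VARIANCE = math.pi ** 2 / 3


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R²: 1 - lnL(model) / lnL(null model)."""
    if loglik_null == 0:
        raise ValueError("null log likelihood must be nonzero")
    return 1.0 - loglik_model / loglik_null


def mckelvey_zavoina_r2(linear_predictions):
    """McKelvey and Zavoina's pseudo-R² for a logit model.

    Share of the latent-variable variance that is explained by the
    linear predictions (variance computed with divisor n).
    """
    n = len(linear_predictions)
    if n == 0:
        raise ValueError("need at least one prediction")
    mean = sum(linear_predictions) / n
    explained = sum((x - mean) ** 2 for x in linear_predictions) / n
    return explained / (explained + LOGISTIC_ERROR_VARIANCE)


def latent_icc(random_intercept_variance):
    """Intraclass correlation of the latent response in a random-intercept logit."""
    return random_intercept_variance / (
        random_intercept_variance + LOGISTIC_ERROR_VARIANCE
    )
```

Passing the log likelihood of a random-intercept-only or a fixed-intercept-only model as `loglik_null` corresponds to the two zero models described above.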
References:
DeMaris, A. 2002. Explained variance in logistic regression: A Monte Carlo study of proposed measures. Sociological Methods & Research 31: 27–74.
Hagle, T. M., and G. E. Mitchell II. 1992. Goodness of fit measures for probit and logit. American Journal of Political Science 36: 762–784.
McFadden, D. 1979. Quantitative methods for analysing travel behaviour of individuals: Some recent developments. In Behavioural Travel Modelling, ed. D. A. Hensher and P. R. Stopher, 279–318. London: Croom Helm.
McKelvey, R., and W. Zavoina. 1975. A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology 4: 103–120.
Veall, M. R., and K. F. Zimmermann. 1992. Pseudo-R² in the ordinal probit model. Journal of Mathematical Sociology 16: 333–342.
Veall, M. R., and K. F. Zimmermann. 1994. Evaluating pseudo-R²'s for binary probit models. Quality & Quantity 28: 151–164.
Windmeijer, F. A. G. 1995. Goodness-of-fit measures in binary choice models. Econometric Reviews 14: 101–116.
Zimmermann, K. F. 1993. Goodness of fit in qualitative choice models: Review and evaluation. In Studies in Applied Econometrics, ed. H. Schneeweiß and K. Zimmermann, 25–74. Heidelberg: Physica.
Additional information: Germany17_Langer.pdf
Wolfgang Langer
Martin Luther University of Halle-Wittenberg
|
2:45–3:15 |
Abstract:
Several authors have introduced different methods for decomposing
the variance of a variable into an additive genetic (A), a shared
environmental (C), and a unique environmental (E) component using
twin data and multilevel mixed-effects (MME) models (Guo and Wang
2002; McArdle and Prescott 2005; Rabe-Hesketh, Skrondal, and Gjessing 2008,
who used Stata). In recent years, the focus of behavioral genetic research
has increasingly shifted toward analyzing the causal influence of
these genetic and environmental components of traits on the development
of inequalities. Regarding methods, this implies estimating the effects
of ACE components, that is, estimating models with ACE-decomposed explanatory
variables. This presentation compares different MME implementations of
such models using the meglm and gsem commands of Stata: a bivariate
ACE decomposition (McArdle and Prescott 2005), a one-step estimator for the
ACE decomposition and its effects, and a more flexible two-step estimator
based on plausible values for the ACE components. Conceptually, these models
extend hybrid MME models (Allison 2009) by replacing the
within-between-group decomposition of explanatory variables with an
ACE decomposition. To demonstrate how these models facilitate causal
analyses of inequalities, the presentation uses examples based on data from
TwinLife, the new German twin family panel.
References:
Allison, P. D. 2009. Fixed Effects Regression Models. Quantitative Applications in the Social Sciences 160. Thousand Oaks, CA: SAGE Publications.
Guo, G., and J. Wang. 2002. The mixed or multilevel model for behavior genetic analysis. Behavior Genetics 32: 37–49.
McArdle, J. J., and C. A. Prescott. 2005. Mixed-effects variance components models for biometric family analyses. Behavior Genetics 35: 631–652.
Rabe-Hesketh, S., A. Skrondal, and H. K. Gjessing. 2008. Biometrical modeling of twin and family data using standard mixed model software. Biometrics 64: 280–288.
Additional information: Germany17_Lang.pdf
Volker Lang
Bielefeld University
|
3:30–4:00 |
Abstract:
Despite its well-known weaknesses and existing alternatives in
the literature, the Kappa coefficient (Cohen 1960; Fleiss 1971)
remains the most frequently applied statistic when it
comes to quantifying agreement among raters. It is also the only
available measure in official Stata that is explicitly dedicated
to assessing inter-rater agreement for categorical data. In this
presentation, I briefly review Cohen's Kappa and five related
statistics within a general framework of chance-corrected agreement
coefficients, discussed in Gwet (2014). The presentation covers the
generalization of all measures to multiple raters, weights for partial
disagreement that are suitable for any level of measurement,
the treatment of missing ratings, and a new probabilistic method
for benchmarking the estimated coefficients. I introduce the
kappaetc command, which implements these concepts.
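As a point of reference for the chance-corrected agreement framework, the following is a minimal Python sketch of unweighted Cohen's Kappa for two raters. It is not the kappaetc implementation. The function name and the assumption of complete ratings (no missing values, no weights) are made for illustration only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters who rated the same subjects."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of subjects that received identical ratings.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)
```

For example, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` has observed agreement 0.75 and chance agreement 0.5, which gives a coefficient of 0.5. The other coefficients in this family differ mainly in how the chance-agreement term is defined.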
References:
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20: 37–46.
Fleiss, J. L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76: 378–382.
Gwet, K. L. 2014. Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.
Additional information: Germany17_Klein.pdf
Daniel Klein
University of Kassel
|
4:00–5:00 |
Abstract:
Stata 15 introduces the new estimation command menl for fitting nonlinear
mixed-effects models, also known as nonlinear multilevel models and nonlinear
hierarchical models. These models can be thought of in two ways:
as nonlinear models containing random effects or
as linear mixed-effects models in which some or all fixed and random
effects enter nonlinearly. The overall error distribution is assumed to be
Gaussian. Nonlinear mixed-effects models have been used to model drug
absorption in the body, intensity of earthquakes, and growth of plants, to
name a few.
In my presentation, I will demonstrate how to use the new menl command to fit nonlinear mixed-effects models in a variety of applications, including population pharmacokinetics and macroeconomics.
Additional information: Germany17_Marchenko.pdf
Yulia Marchenko
StataCorp
|
5:15–6:00 |
StataCorp
|
6:00–6:30 |
StataCorp
|
Workshop: 22 June
Generalized propensity-score matching and its implementation in Stata
Presented by Michaela Bia, Luxembourg Institute of Socio-Economic Research (LISER)
This workshop examines advanced techniques for causal inference, with a focus on methods based on the generalized propensity score. Much of the work on propensity-score analysis has focused on the case where the treatment is binary, but in many empirical studies, treatments may take on many values, implying that participants in the study may receive different treatment levels. In such cases, the focus is on assessing the heterogeneity of treatment effects arising from variation in the amount of treatment exposure, that is, on estimating a dose–response function (DRF). In this workshop, we build on the work by Hirano and Imbens (2004), who introduced the concept of the generalized propensity score (GPS) and employed it to estimate the DRF of a continuous treatment within the potential outcome approach to causal inference (Rubin 1974, 1978). In particular, we will focus on parametric (Hirano and Imbens 2004; Bia and Mattei 2008) and semiparametric (Bia, Flores-Lagunes, Flores, and Mattei 2014) techniques to estimate the DRF.
Organizers
Scientific committee
Johannes Giesecke
Humboldt University Berlin
Ulrich Kohler
University of Potsdam
Logistics organizer
The logistics organizer for the 2017 German Stata Users Group meeting is Dittrich & Partner Consulting GmbH, the distributor of Stata in Germany, the Netherlands, Austria, the Czech Republic, and Hungary.
View the proceedings of previous Stata Users Group meetings.