The 2019 German Stata Users Group Meeting was held on 24 May at the Seidlvilla club e.V. There was an optional workshop on 23 May.
9:15–10:15 |
Abstract:
Part of the art of coding is writing as little as possible to
do as much as possible. In this presentation, I expand on this
truism and give examples of Stata code to yield tables and graphs
in which most of the real work is delegated to workhorse commands.
In tabulations and listings, the better-known commands sometimes seem
to fall short of what you want. However, some preparation commands
(such as generate, egen, collapse, or contract)
followed by list, tabdisp, or tab can get you a long way.
In graphics, a key principle
is that graph twoway is the most general command even when you do not
want rectangular axes. Variations on scatter and line plots are precisely
that, variations on scatter and line plots. More challenging illustrations
include commands for circular and triangular graphics, in which x and y
axes are omitted with an inevitable but manageable cost in recreating
scaffolding, titles, labels, and other elements. The examples range in
scope from a few lines of interactive code to fully developed programs.
This presentation is thus pitched at all levels of Stata users.
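For illustration, here is a minimal sketch of this division of labor using
the auto dataset shipped with Stata (my own example, not one from the talk):

    * contract prepares frequencies; list displays them
    sysuse auto, clear
    contract rep78 foreign, freq(count)
    list rep78 foreign count, sepby(rep78) noobs

    * collapse prepares means; tabdisp lays them out as a two-way table
    sysuse auto, clear
    collapse (mean) mpg, by(rep78 foreign)
    tabdisp rep78 foreign, cellvar(mpg) format(%4.1f)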
Additional information: germany19_Cox.pdf
Nicholas J. Cox
University of Durham
|
10:15–10:45 |
Abstract:
Precise and detailed data documentation is essential for the secondary
analysis of scientific data, whether they are survey or official microdata.
Among the most important metadata in this respect are variable and category
labels, frequency distributions, and descriptive statistics.
To generate and publish these metadata from Stata datafiles, an efficient
export interface is essential. It must be able to handle large and complex
datasets, account for the specifics of different studies, and generate
flexible output formats (depending on the requirements of the documentation
system). As a solution to the problem described above, we present the process
developed in the GML (German Microdata Lab) at GESIS. In the first step, we
show how an aggregated file with all required metadata can be generated from
the microdata. In the second step, this file is transformed into a standardized
DDI format. Additionally, we will present the implementation for MISSY (the
metadata information system for official microdata at GESIS), which includes
some practical additions (for example, communication with the MISSY database to
retrieve existing element identifiers, writing an output tailored to the
MISSY data model).
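As a rough illustration of the first step (a minimal sketch; the file and
variable names here are hypothetical, not those used in MISSY):

    * write basic per-variable metadata to an aggregated file
    sysuse auto, clear
    tempname h
    postfile `h' str32 varname str80 varlabel str32 vallabel ///
        using metadata.dta, replace
    foreach v of varlist _all {
        post `h' ("`v'") (`"`: variable label `v''"') ("`: value label `v''")
    }
    postclose `h'
    use metadata.dta, clear
    list, noobs

Frequency distributions and descriptive statistics could be added to the same
file in further columns before transforming it into DDI.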
Additional information: germany19_Balz.pdf
Anne Balz
Klaus Pforr
Florian Thirolf
GESIS - Leibniz-Institut für Sozialwissenschaften
|
11:00–12:00 |
Abstract:
An agent-based model (ABM) is a simulation in which agents that each follow
simple rules interact with one another and thus produce an often
surprising outcome at the macro level. The purpose of an ABM is to explore
mechanisms through which actions of the individual agents add up to a
macro outcome by varying the rules that agents have to follow or varying
with whom the agent can interact (for example, varying the network).
A simple example of an ABM is Schelling's segregation model, in which he showed
that one does not need racists to produce segregated neighborhoods. The model
starts with 25 red and 25 blue agents, each of whom lives in a cell of a
chessboard. They can have up to 8 neighbors. For an agent to be happy, some
minimum share of the agents in its neighborhood, for example 30%, must be of
the same color. If an agent is unhappy, it moves to an empty cell that makes
it happy. If we repeat this until everybody is happy or nobody can move, we
will often end up with segregated neighborhoods.
Implementing a new ABM will always require programming, but many of the tasks
are similar across ABMs. For example, in many ABMs the agents live on a square
grid (like a chessboard) and can only interact with their neighbors. I have
created a set of Mata functions that take care of those tasks, and users can
combine them with their own ABM. In this presentation, I will illustrate how
to build an ABM in Mata with these functions.
Additional information: germany19_Buis.zip
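To give a flavor of such building blocks, here is a minimal Mata sketch (my
own, not one of the presenter's functions): it counts how many of a cell's
Moore neighbors hold an agent of the same color, the quantity that drives
happiness in the Schelling model.

    mata:
    // G is a grid with 0 = empty, 1 = red, 2 = blue; count same-colored
    // agents among the (up to 8) neighbors of cell (i, j)
    real scalar same_neighbors(real matrix G, real scalar i, real scalar j)
    {
        real scalar di, dj, n

        n = 0
        for (di = -1; di <= 1; di++) {
            for (dj = -1; dj <= 1; dj++) {
                if (di == 0 & dj == 0) continue
                if (i + di < 1 | i + di > rows(G)) continue
                if (j + dj < 1 | j + dj > cols(G)) continue
                if (G[i + di, j + dj] == G[i, j]) n++
            }
        }
        return(n)
    }
    end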
Maarten Buis
Universität Konstanz
|
12:00–12:30 |
Abstract:
Traditional fit measures like the RMSEA, TLI, or CFI are based on the
noncentral chi-squared distribution, assuming a multivariate normal
distribution of the observed indicators (Jöreskog 1970). If this assumption
is violated, programs like Stata, EQS, or LISREL calculate the fit indices
using the Satorra–Bentler correction, which rescales the likelihood-ratio
chi-squared test statistics of the baseline and the hypothesized model
(Satorra & Bentler 1994; Nevitt & Hancock 2000). Brosseau-Liard et al.
(2012, 2014) and Savalei (2018) showed two results in their simulation
studies with nonnormal data: First, they demonstrated that the ad hoc
nonnormality corrections of the fit indices provided by the SEM software
made the fit worse, better, or left it unchanged compared with their
uncorrected counterparts. Second, the authors proposed new robust versions
of the RMSEA, CFI, and TLI that performed very well in their simulation
studies, which systematically varied the sample size, the extent of
misspecification, and the degree of nonnormality. Therefore, the same rules
of thumb or criteria used for normally distributed data can be applied to
assess the fit of the structural equation model.
My ado-file robust_gof estimates the robust RMSEA, CFI, and TLI fit measures
using the corrections proposed by Brosseau-Liard et al. and Savalei. It also
estimates a 90% confidence interval for the root mean squared error of
approximation. robust_gof can be run as a postestimation command after the
sem command with the vce(sbentler) option and estat gof, stats(all), by
simply typing robust_gof. It returns the estimated fit indices and scalars
in r(). I will present a survey example of an analysis of islamophobia in
Germany to demonstrate the usefulness of robust_gof.
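The call sequence described above then looks like this (the measurement model
and variable names are hypothetical placeholders):

    sem (Attitude -> item1 item2 item3 item4), vce(sbentler)
    estat gof, stats(all)
    robust_gof
    return list    // the robust fit indices are returned in r()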
References:
Asparouhov, T., and B. Muthén. 2010. Simple second order chi-square
correction. Mplus Working Papers.
Brosseau-Liard, P.E., V. Savalei, and L. Li. 2012. An investigation of the
sample performance of two nonnormality corrections for RMSEA. Multivariate
Behavioral Research 47: 904–930.
Brosseau-Liard, P.E., and V. Savalei. 2014. Adjusting incremental fit indices
for nonnormality. Multivariate Behavioral Research 49: 460–470.
Jöreskog, K.G. 1970. A general method for analysis of covariance structures.
Biometrika 57: 239–251.
Jöreskog, K.G., U.H. Olsson, and F.Y. Wallentin. 2016. Multivariate Analysis
with LISREL. Cham: Springer.
Additional information: germany19_Langer.pdf
Wolfgang Langer
Universitéit vu Lëtzebuerg
Martin-Luther-Universität Halle-Wittenberg
|
1:15–1:45 |
Abstract:
The Oaxaca–Blinder (Oaxaca 1973) decomposition approach has been widely
used to attribute group-level differences in an outcome to differences
in endowment, coefficients, and their interactions. This method has been
implemented for Stata in the popular oaxaca command for cross-sectional
analyses (Jann 2008). However, in recent decades, research questions have
increasingly focused on the decomposition of group-based differences in
change over time, such as diverging income trajectories, and on the
decomposition of change in differences between groups, for example, change
in the gender pay gap.
Decomposition analyses can also be extended to longitudinal data
by repeated cross-sectional decompositions and time point-specific decomposition
of group-level differences based on latent growth curve models. We propose to
unify these different research interests under a more general longitudinal
perspective that has each of the applications as a special case of the
Oaxaca–Blinder decomposition. We present this general view, give examples
of applied research questions that can be answered within the framework, and
propose a first version of the command xtoaxaca, which works as a postestimation
command in Stata to maximize flexibility in modeling and forms of
longitudinal decompositions according to the user's preferences.
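For reference, the cross-sectional building block that xtoaxaca generalizes
is the oaxaca command (Jann 2008); a minimal call, with illustrative variable
names, is:

    * threefold decomposition (endowments, coefficients, interaction)
    * of the group difference in mean log wages
    oaxaca lnwage educ exper tenure, by(female)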
References:
Jann, B. 2008. The Blinder–Oaxaca decomposition for linear regression models.
Stata Journal 8: 453–479.
Oaxaca, R. 1973. Male–female wage differentials in urban labor markets.
International Economic Review 14: 693–709.
Additional information: germany19_Kröger.pdf
Hannes Kröger
DIW - Deutsches Institut für Wirtschaftsforschung
Jörg Hartmann
Universität Göttingen
|
1:45–2:15 |
Abstract:
Linear fixed-effects estimators (first differences, within transformation)
are workhorses of applied econometrics because they straightforwardly allow
for eliminating unobserved time-invariant individual heterogeneity that
otherwise may cause a bias. However, I show that these popular estimators are
biased and inconsistent when applied in a discrete-time hazard
setting, that is, with the outcome variable being a binary dummy indicating
an absorbing state. I suggest an alternative, computationally simple
adjusted first-differences estimator. This estimator is shown to be consistent
in the considered nonrepeated event setting under the assumption of
unobserved time-invariant individual heterogeneity being uncorrelated with
the changes in the explanatory variables. Using higher-order differences
instead of first differences allows for consistent estimation under weaker
assumptions. In this presentation, I introduce the new community-contributed
command xtlhazard, which implements the suggested estimation procedure in Stata.
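A schematic call, assuming xtlhazard follows the usual syntax of xt
estimation commands (variable names are hypothetical; see the command's help
file for its actual options):

    * y: binary dummy indicating an absorbing state
    xtset id year
    xtlhazard y x1 x2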
Additional information: germany19_Tauchmann.pdf
Harald Tauchmann
Friedrich-Alexander-Universität Erlangen-Nürnberg
RWI - Leibniz-Institut für Wirtschaftsforschung
CINCH - Gesundheitsökonomisches Forschungszentrum
|
2:15–2:45 |
Abstract:
In this presentation, I will introduce two new Stata commands to produce heat plots.
Generally speaking, a heat plot is a graph in which one of the dimensions of the
data is visualized using a color gradient. One example of such a plot is a
two-dimensional histogram in which the frequencies of combinations of binned X
and Y are displayed as rectangular (or hexagonal) fields using a color gradient.
Another example is a plot of a trivariate distribution where the color gradient
is used to visualize the (average) value of Z within bins of X and Y. Yet
another example is a plot that displays the contents of a matrix, say, a
correlation matrix or a spatial weights matrix, using a color gradient.
The two commands I will present are called heatplot and hexplot.
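A few illustrative calls using the auto dataset (heatplot and hexplot are
available from the SSC archive and rely on the palettes and colrspace
packages):

    ssc install heatplot, replace
    ssc install palettes, replace
    ssc install colrspace, replace

    sysuse auto, clear
    hexplot weight length          // two-dimensional histogram, hexagonal bins
    heatplot mpg weight length     // mean of mpg within bins of weight, length

    quietly correlate price mpg weight length
    matrix C = r(C)
    heatplot C                     // display a correlation matrix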
Additional information: germany19_Jann.pdf
Ben Jann
Universität Bern
|
3:00–3:30 |
Abstract:
Four years ago, I first suggested extending Stata's label commands to
manipulate variable labels and value labels more systematically. By now,
I have refined my earlier approach and released a new suite of commands, elabel,
that facilitate these everyday data management tasks. In contrast to most existing
community-contributed commands to manipulate labels, elabel does not focus
on solving specific problems. Combined with any of Stata's label commands,
it addresses any problem related to variable and value labels. elabel accepts
wildcard characters in value label names, allows referring to value labels via variable
names, selects subsets of integer to text mappings, and applies any of Stata's
functions to define new or modify existing labels. I demonstrate these features
drawing on various examples and show how to write new ado-files to further extend
the elabel commands.
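A few schematic calls hinting at these features (elabel is available from
the SSC archive; consult its help file for the documented syntax):

    sysuse auto, clear
    elabel list (foreign)          // refer to a value label via a variable name
    elabel copy (foreign) yesno    // copy the label attached to foreign
    elabel drop yes*               // wildcards in value label names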
Additional information: germany19_Klein.zip
Daniel Klein
INCHER - Internationales Zentrum für Hochschulforschung
|
3:30–4:00 |
Abstract:
The Global Multidimensional Poverty Index is a cross-country poverty measure
published by the Oxford Poverty and Human Development Initiative since 2010.
The estimation requires household survey data because multidimensional poverty
measures seek to exploit the joint distribution of deprivations in the
identification step of poverty measurement. Moreover, analyses of
multidimensional poverty draw on several aggregate measures (for example,
headcount ratio, intensity), and on dimensional quantities (for example,
indicator contributions). Robustness analyses of key parameters (for example,
poverty cutoffs and weighting schemes) further increase the number of estimates.
During the 2018 revision, figures for 105 countries were for the first time
calculated in a single round. For a large-scale project like this, a clear
and efficient workflow is essential. This presentation introduces key
elements of such a workflow and presents solutions with Stata for particular
problems, including the structure of a comprehensive results file, which
facilitates both analysis and the production of deliverables; the usability
of the estimation files; the collaborative nature of the project; the
labelling of 1,200 subnational units; and the documentation of code and
decisions. This presentation seeks to share the experience gained and to
subject both the principal workflow and selected solutions to public
scrutiny.
Additional information: germany19_Suppa.pdf
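As one example, labelling 1,200 subnational units by hand is impractical; a
loop over a lookup file can define them (a hypothetical sketch; the file,
variable, and label names are mine, not the project's):

    * region_labels.csv holds the columns code and name
    import delimited using region_labels.csv, varnames(1) clear
    forvalues i = 1/`=_N' {
        label define region `=code[`i']' `"`=name[`i']'"', add
    }
    * then, in the microdata: label values region_code region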
Nicolai Suppa
Centre d'Estudis Demogràfics
|
4:15–5:00 |
Abstract:
Discrete choice models are used across many disciplines to analyze
choices made by individuals or other decision-making entities. Stata
supports many discrete choice models, such as multinomial logit and
mixed logit models. While applying these models to a given dataset can be
straightforward, it is often challenging to interpret their results. In this
presentation, I will provide an overview of Stata's discrete choice modeling
capabilities and show how to use postestimation commands to get the most
out of these models and their interpretation.
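As a small taste of that workflow with official commands (a minimal sketch,
not an example from the talk): fit a multinomial logit and use margins to
turn its coefficients into average marginal effects on outcome probabilities.

    webuse sysdsn1, clear
    mlogit insure age male nonwhite i.site
    * average marginal effect of age on Pr(insure = 2, i.e., Prepaid)
    margins, dydx(age) predict(outcome(2))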
Additional information: germany19_Luedicke.pdf
Joerg Luedicke
StataCorp
|
5:00–5:30 |
Abstract:
Stata developers present will carefully and cautiously
consider wishes and grumbles from Stata users in the audience.
Questions, and possibly answers, may concern reports of
present bugs and limitations or requests for new features in
future releases of the software.
StataCorp personnel
StataCorp
|
Workshop: Multiple Imputation
by Jan Paul Heisig, Wissenschaftszentrum Berlin für Sozialforschung (WZB)
Date & time
23 May 2019
Description
Missing data are a pervasive problem in the social sciences. Data for a given
unit may be missing entirely, for example, because a sampled respondent
refused to participate in a survey (survey nonresponse). Alternatively,
information may be missing only for a subset of variables (item nonresponse),
for example, because a respondent refused to answer some of the questions in
a survey.
The traditional way of dealing with item nonresponse, referred to as
"complete case analysis" (CCA) or "listwise deletion", excludes any
observation with missing information from the analysis. While easy to
implement, complete case analysis is wasteful and can lead to biased
estimates. Multiple imputation (MI) addresses these issues and provides more
efficient and unbiased estimates if certain conditions are met. It is
therefore increasingly replacing CCA as the method of choice for dealing
with item nonresponse in applied quantitative work in the social sciences.
The goals of the course are to introduce participants to the principles of
MI and its implementation in Stata, with a primary focus on MI using iterated
chained equations (also known as "fully conditional specification").
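A minimal run of the chained-equations workflow covered in the course might
look like this (using the mheart0 example data from the Stata manuals):

    webuse mheart0, clear          // bmi has missing values
    mi set mlong
    mi register imputed bmi
    mi impute chained (regress) bmi = attack smokes age hsgrad female, ///
        add(20) rseed(2232)
    mi estimate: logit attack smokes age bmi hsgrad female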
Prerequisites
Basic knowledge of Stata.
Scientific committee
Katrin Auspurg
Ludwig-Maximilians-Universität Munich
Josef Brüderl
Ludwig-Maximilians-Universität Munich
Johannes Giesecke
Humboldt-Universität zu Berlin
Ulrich Kohler
Universität Potsdam
Logistics organizer
The logistics organizer for the 2019 German Stata Users Group meeting is DPC Software GmbH, the distributor of Stata in Germany, the Netherlands, Austria, the Czech Republic, and Hungary.
View the proceedings of previous Stata Users Group meetings.