Last updated: 8 June 2013
2013 German Stata Users Group meeting
Friday, 7 June 2013
University of Potsdam
Germany
Proceedings
Creating complex tables for publication
John Luke Gallup
Portland State University
Complex statistical tables often must be built up by parts from the results
of multiple Stata commands. I show the capabilities of
frmttable and
outreg for creating complex tables, and even fully formatted
statistical appendices, for Word and TeX documents. Precise formatting of
these tables from within Stata has the same benefits as writing do-files
for statistical commands: the tables are reproducible and reusable when the
data change, saving the user time.
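For a flavor of the workflow, here is a minimal sketch; the option names
follow the outreg and frmttable help files, and the file and matrix names
are arbitrary:

    sysuse auto, clear
    regress price mpg weight
    outreg using mytable, replace      // write the regression table to mytable.doc
    matrix M = e(b)                    // any Stata matrix can become a table
    frmttable using mytable, statmat(M) addtable   // append a second table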
Additional information
de13_gallup.pdf
An expanded framework for mixed process modeling in Stata
David Roodman
Center for Global Development
Roodman (Stata Journal, 2011) introduced the program
cmp for using
maximum likelihood to fit multiequation combinations of Gaussian-based
models such as tobit, probit, ordered probit, multinomial probit, interval
censoring, and continuous linear. This presentation describes substantial
extensions to the framework and software: factor variable support; the
rank-ordered probit model; the ability to specify precensoring truncation in
most model types; hierarchical random effects and coefficients that are
potentially correlated across equations; the ability to include the
unobserved linear variables behind endogenous variables—not just their
observed, censored manifestations—on the right side of other equations;
and, in so doing, the allowance for simultaneity in the system of
equations. Contrary to the title of Roodman (2011), models no longer need
be recursive or fully observed.
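As a flavor of the syntax, a two-equation system in which a probit outcome
appears on the right side of a linear equation might be fit as below; this
is a minimal sketch following the cmp help file, with hypothetical variable
names:

    ssc install cmp, replace
    cmp setup                          // defines the $cmp_* indicator macros
    cmp (y1 = x1 x2) (y2 = y1 x3), indicators($cmp_probit $cmp_cont)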
Additional information
de13_roodman.pptx
Provide, Enrich, and Make Accessible: Using Stata’s Capabilities
for Disseminating NEPS Scientific Use Data
Daniel Bela
National Educational Panel Study (NEPS), Data Center, University of Bamberg
The National Educational Panel Study (NEPS) is emerging as one of
Germany's major publishers of scientific use data for educational research.
Disseminating data from six panel cohorts makes not only structured data
editing but also documentation and user support a major challenge. In order
to accomplish this task, the NEPS Data Center has implemented a sophisticated
metadata system. It not only allows structured documentation of the
metadata of survey instruments and data files but also makes it possible to
enrich the scientific use files with further information, thus significantly easing
access for data analyses. As a result, NEPS provides bilingual dataset files
(German and English) and allows the user to instantly see, for instance, the
exact wording of the question leading to the data in a distinct variable
without leaving the dataset. To achieve this, structured metadata is
attached to the data using Stata's characteristics functionality. To make
handling additional metadata even easier, the NEPS Data Center provides a
package of user-written programs,
NEPStools, to data users. The
presentation will cover an introduction to the NEPS data preparation
workflow, focusing on the metadata system and its role in enriching the
scientific use data by using Stata's capabilities. Afterward,
NEPStools will be introduced.
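The underlying mechanism can be previewed with core Stata alone; in this
generic sketch, the variable and characteristic names are hypothetical, and
NEPStools wraps this kind of metadata lookup in convenience commands:

    clear
    set obs 1
    generate income = .
    char define income[question_en] "What was your net income last month?"
    display "`: char income[question_en]'"    // retrieve the question wording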
Additional information
de13_bela.pdf
newspell—Easy Management of Complex Spell Data
Hannes Kröger
German Institute for Economic Research
Biographical data gathered in surveys are often stored in spell format,
allowing for overlaps between spell states. This gives useful information to
researchers but leaves them with a very complex data structure, which is
not easy to handle. I present my work on the ado-package newspell. It
includes several subprograms for management of complex spell data. Spell
states can be merged, reducing the overall number of spells. newspell
allows a user to fill gaps with information from spells before and after the
gap, given a user-defined preference. The two most important features of
newspell are, first, the ability to rank spells and cut off
overlaps according to the rank order. This is a necessary step before
performing, for example, sequence analysis on spell data. Second, newspell
can combine overlapping spells into new categories of spells, generating
entirely new states. This is useful for cleaning data, for analyzing
simultaneity of states, or for combining two spell datasets that have
information on different kinds of states (for example, labor market and
marital status). newspell is useful for users who are not familiar with
complex spell data and have little experience in Stata programming for data
management. For experienced users, it saves a lot of time and coding work.
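To illustrate the data structure involved, consider this hypothetical
core-Stata example of overlapping spells (this is not newspell syntax):

    clear
    input id str10 state begin end
    1 "employed"   1 12
    1 "education" 10 24
    1 "employed"  25 30
    end
    list, sepby(id)
    // Months 10-12 are covered by two overlapping spells; ranking
    // "employed" above "education" would cut the education spell to 13-24.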
Additional information
de13_kroeger.pdf
Instrumental variables estimation using heteroskedasticity-based
instruments
Christopher F. Baum
Boston College
Arthur Lewbel
Boston College
Mark E. Schaffer
Heriot–Watt University, Edinburgh
Oleksandr Talavera
University of Sheffield
In a 2012 article in the Journal of Business and Economic Statistics, Arthur
Lewbel presented theory that allows the identification and estimation of
"mismeasured and endogenous regressor models" by exploiting
heteroskedasticity. These models include linear regression models
customarily estimated with instrumental variables (IV) or IV-GMM techniques.
Lewbel's method, under suitable conditions, can provide instruments where no
conventional instruments are available or augment standard instruments to
enable tests of overidentification in the context of an exactly identified
model. In this talk, I discuss the rationale for Lewbel's methodology and
illustrate its implementation in ivreg2h, a variant of Baum, Schaffer, and
Stillman's ivreg2 routine.
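In the simplest case, the generated instruments stand in for missing
external ones. A minimal sketch, with the syntax hedged against the ivreg2h
help file (the empty instrument list requests generated instruments only):

    ssc install ivreg2h
    sysuse auto, clear
    ivreg2h price weight (mpg = )   // mpg endogenous, Lewbel-generated instruments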
Additional information
de13_baum.pdf
Using simulation to inspect the performance of a test, in
particular tests of the parallel regressions assumption in ordered logit and
probit models
Maarten L. Buis
Social Science Research Center (WZB)
Richard Williams
University of Notre Dame
In this talk, we will show how to use simulations in Stata to explore to
what extent and under what circumstances a test is problematic. We will
illustrate this for a set of tests of the parallel regression assumption in
ordered logit and probit models: the Brant, likelihood-ratio, Wald, score,
and Wolfe-Gould tests of the parallel regression assumption. A common
impression is that these tests tend to be anti-conservative; that is,
they tend to reject a true null hypothesis too often. We will use
simulations to try to quantify when and to what extent this is the case. We
will also use these simulations to create a more robust bootstrap variation
of the tests. The purpose of this talk is twofold: first, we want to explore
the performance of these tests. For this purpose, we will present a new
program, oparallel, that implements all tests and their bootstrap variation.
Second, we want to give more general advice on how to use Stata to create
simulations when one has doubts about a certain test. For this purpose, we
will present the
simpplot command, which can help to interpret the
p-values returned by such a simulation.
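A minimal sketch of the intended postestimation use, assuming the fullauto
example data and the SSC version of the program:

    ssc install oparallel
    webuse fullauto, clear
    ologit rep77 foreign length mpg
    oparallel                 // Brant, LR, Wald, score, and Wolfe-Gould tests

simpplot would then be given a variable of simulated p-values (for example,
from simulate) to graph their distribution against the nominal levels.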
Additional information
de13_buis.pdf
Fitting Complex Mixed Logit Models with Particular Focus on
Labor Supply Estimation
Max Löffler
Institute for the Study of Labor (IZA)
When one estimates discrete choice models, the mixed logit approach is
commonly superior to simple conditional logit setups. Mixed logit models not
only allow the researcher to incorporate flexible random components but also
overcome the restrictive independence of irrelevant alternatives (IIA)
assumption. Despite these theoretical
advantages, the estimation of mixed logit models becomes cumbersome when the
model’s complexity increases. Applied work therefore often relies on rather
simple empirical specifications because this reduces the computational
burden. I introduce the user-written command
lslogit, which fits
complex mixed logit models using maximum simulated likelihood methods. As
lslogit is a d2 ML evaluator written in Mata, the estimation is
rather efficient compared with other routines. It allows the researcher to
specify complicated structures of unobserved heterogeneity and to choose
from a set of frequently used functional forms for the direct utility
function—for example, Box-Cox transformations, which are difficult to
estimate in the context of logit models. The particular focus of
lslogit is on the estimation of labor supply models in the discrete
choice context; therefore, it facilitates several computationally demanding
but standard tasks in this research area. However, the command can be used
in many other applications of mixed logit models as well.
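As a point of reference, the conditional logit baseline that lslogit
generalizes can be fit with core Stata alone; the synthetic choice data and
utility function below are made up for illustration:

    clear
    set seed 42
    set obs 500                           // 500 individuals
    generate id = _n
    expand 3                              // three labor-supply alternatives each
    bysort id: generate alt = _n
    generate hours  = (alt - 1)*20        // 0, 20, or 40 hours
    generate income = 500 + 15*hours + rnormal(0, 50)
    generate u = 0.01*income - 0.002*hours^2 + rnormal()
    bysort id (u): generate choice = (_n == 3)   // highest-utility row chosen
    clogit choice income c.hours##c.hours, group(id)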
Additional information
de13_loeffler.pdf
Simulated Multivariate Random Effects Probit Models for Unbalanced Panels
Alexander Plum
Otto-von-Guericke University Magdeburg
This paper develops a method for implementing a simulated multivariate
random-effects probit model for unbalanced panels and illustrates it using
artificial data. Halton draws generated by mdraws are used to simulate
multivariate normal probabilities with the mvnp() function. The
estimator can be easily adjusted (for example, to allow for autocorrelated
errors). Advantages of this simulated estimation are high accuracy and lower
computation time compared with existing commands such as
redpace.
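The quasi-random ingredient can be previewed in core Stata's Mata: Halton
sequences yield uniform draws that are transformed to normals, while mdraws
and mvnp() handle the bookkeeping and the probability simulation itself:

    mata:
    H = halton(1000, 3)      // 1,000 draws from the first 3 Halton sequences
    Z = invnormal(H)         // transformed to standard-normal draws
    end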
Additional information
de13_plum.pdf
xsmle—A Command to Estimate Spatial Panel Models in Stata
Federico Belotti
University of Rome "Tor Vergata"
Gordon Hughes
University of Edinburgh
Andrea Piano Mortari
University of Rome "Tor Vergata"
Econometricians have begun to devote more attention to spatial interactions
when carrying out applied econometric studies. The new command we are
presenting,
xsmle, fits fixed- and random-effects spatial models for
balanced panel data for a wide range of specifications: the spatial
autoregressive model, spatial error model, spatial Durbin model, spatial
autoregressive model with autoregressive disturbances, and generalized
spatial random-effects model with or without a dynamic component. Different
weighting matrices may be specified for different components of the models,
and both Stata matrices and spmat objects are allowed. Furthermore,
xsmle calculates direct, indirect, and total effects according to
LeSage (2008), implements the Lee and Yu (2010) data transformation for
fixed-effects models, and may be used with the mi prefix when the panel
is unbalanced.
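A minimal sketch of a spatial autoregressive fixed-effects fit, with
hypothetical panel data and a weighting matrix W held as a Stata matrix or
spmat object (option names as documented in help xsmle):

    xtset id year
    xsmle y x1 x2, wmat(W) model(sar) fe effects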
Additional information
de13_mortari.pdf
Estimating the dose-response function through the GLM approach
Barbara Guardabascio
Italian National Institute of Statistics, Rome
Marco Ventura
Italian National Institute of Statistics, Rome
How effective are policy programs with continuous treatment exposure?
Answering this question essentially amounts to estimating a dose-response
function as proposed in Hirano and Imbens (2004). Whenever doses are not
randomly assigned but are given under nonexperimental conditions, estimation
of a dose-response function is possible using the Generalized Propensity
Score (GPS). Since its formulation, the GPS has been repeatedly used in
observational studies, and ad hoc programs have been provided for Stata users
(doseresponse and gpscore; Bia and Mattei 2008). However, many
applied researchers note that the treatment variable may not be normally
distributed. In this case, the existing Stata programs cannot be used
because they do not allow for distributional assumptions other than the
normal density. In this paper, we overcome this problem. Building on Bia
and Mattei's (2008) programs, we provide doseresponse2 and gpscore2, which
accommodate different distribution functions for the treatment variable.
This is accomplished by applying the generalized linear model estimator in
the first step instead of maximum likelihood. In this way, the user has a
very versatile tool capable of handling many practical situations. It is
worth highlighting that, among the many alternatives, our programs allow
the GPS estimator to be used consistently when the treatment variable is
fractional, the flogit case of Papke and Wooldridge (1996), a case of
particular interest for economists.
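The essence of the modification is the first step: a GLM for the treatment
replaces the normal maximum-likelihood model. Below is a core-Stata sketch
of that step for a fractional dose, with hypothetical variable names;
doseresponse2 and gpscore2 automate the full GPS workflow:

    glm dose x1 x2 x3, family(binomial) link(logit) vce(robust)
    predict gpshat, mu      // fitted values feed the generalized propensity score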
Additional information
de13_ventura.ppt
Predictive Margins and Marginal Effects in Stata
Ben Jann
University of Bern
Tables of estimated regression coefficients, usually accompanied by
additional information such as standard errors,
t statistics,
p-values, confidence intervals, or significance stars, have long been
the preferred way of communicating results from statistical models. In
recent years, however, the limits of this form of exposition have been
increasingly recognized. For example, interpretation of regression tables
can be very challenging in the presence of complications such as interaction
effects, categorical variables, or nonlinear functional forms. Furthermore,
while these issues might still be manageable in the case of linear
regression, interpretational difficulties can be overwhelming in nonlinear
models (for example, logistic regression). To facilitate sensible
interpretation of these models, one must often compute additional results
such as marginal effects, predictive margins, or contrasts. Moreover, smart
graphical displays of results can be very valuable in making complex
relations accessible. A number of helpful commands geared toward supporting
these tasks have recently been introduced in Stata, making elaborate
interpretation and communication of regression results possible without much
extra effort. Examples of these commands are
margins,
contrast, and
marginsplot. In my talk, I will discuss the
capabilities of these commands and present a range of examples illustrating
their use.
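All three commands are official Stata, so a short example runs out of the
box:

    sysuse auto, clear
    logit foreign c.mpg##c.weight
    margins, dydx(mpg)                    // average marginal effect of mpg
    margins, at(weight=(2000(500)4500))   // predictive margins across weight
    marginsplot                           // plot the most recent margins with CIs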
Additional information
de13_jann.pdf
Scientific organizers
Johannes Giesecke, University of Bamberg
[email protected]
Ulrich Kohler, University of Potsdam
[email protected]
Logistics organizers
The conference is sponsored and organized by Dittrich & Partner Consulting GmbH
(http://www.dpc.de),
the distributor of Stata in several countries, including
Germany, the Netherlands, Austria, the Czech Republic, and Hungary.