Last updated: 30 June 2008
2008 German Stata Users Group meeting
Friday, 27 June 2008
WZB Berlin (Wissenschaftszentrum Berlin für Sozialforschung)
Reichpietschufer 50
D-10785 Berlin-Tiergarten
Germany
Proceedings
Using instrumental variables techniques in economics and finance
Christopher F. Baum
Boston College Department of Economics and DIW Berlin
I will discuss the usefulness of instrumental variables (IV) techniques in
addressing research questions in economics and finance. IV methods provide
workable solutions to problems of endogeneity, measurement error and proxy
variables, but they are easily misused. I will present a wide array of
diagnostic techniques that should be employed to validate the use of IV in a
particular context. I will also discuss the advantages of employing the
Generalized Method of Moments form of IV (IV-GMM) and the Continuously
Updated Estimator (GMM-CUE), and I will display some newly developed code that
efficiently employs Stata's Mata programming language to implement the
GMM-CUE.
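As a rough illustration of the endogeneity problem mentioned above, the following sketch simulates data in which a regressor is correlated with the error term and compares the OLS slope with the simple one-instrument IV (Wald) slope. It is not the IV-GMM or GMM-CUE code from the talk; the data-generating process, the coefficient values, and all names are assumptions chosen only for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, beta=2.0, seed=12345):
    """Hypothetical model: y = beta*x + u, where x depends on u (endogenous)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument: moves x, unrelated to u
        ui = rng.gauss(0.0, 1.0)  # structural error
        xi = zi + 0.8 * ui + rng.gauss(0.0, 0.5)  # x is correlated with u
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ui)
    return z, x, y


z, x, y = simulate()

# OLS slope: cov(x, y) / var(x); inconsistent here because cov(x, u) != 0.
ols_slope = covariance(x, y) / covariance(x, x)

# IV slope with one instrument: cov(z, y) / cov(z, x).
iv_slope = covariance(z, y) / covariance(z, x)

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

In this simulated setting the IV slope should land near the assumed true value of 2, while the OLS slope is pushed upward by the correlation between the regressor and the error.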
Additional information
Baum.DESUG8621.beamer.pdf
Ordinal regression models: Problems, solutions, and problems with the solutions
Richard Williams
Notre Dame Department of Sociology
Ordered logit/probit models are among the most popular ordinal regression
techniques. However, these models often have serious problems. The
proportional odds/parallel lines assumptions made by these methods are often
violated. Further, because of the way these models are identified, they have
many of the same limitations as are encountered when analyzing standardized
coefficients in OLS regression; e.g., interaction terms and cross-population
comparisons of effects can be highly misleading. This paper shows how
generalized ordered logit/probit models (estimated via gologit2) and
heterogeneous choice/location scale models (estimated via oglm) can often
address these concerns in ways that are more parsimonious and easier to
interpret than is the case with other suggested alternatives. At the same
time, the paper cautions that these methods sometimes raise their own
concerns that researchers need to be aware of and know how to deal with.
First, misspecified models can create worse problems than the ones these
methods were designed to solve. Second, estimates are sometimes implausible,
suggesting that the data are being spread too thin and/or yet another method
is needed. Third, multiple and very different interpretations of the same
results are often possible and plausible. I will present guidelines for
identifying and dealing with each of these problems.
Additional information
GSUG2008-Handout.pdf
GSUG2008.pdf
Charts for comparing results between many categories
Ulrich Kohler
WZB
Charts are useful tools for comparing a statistic between groups defined by
a categorical variable with many categories. A number of postings on
Statalist suggest that Stata’s standard implementation of these charts,
graph dot and graph bar, often limits users in their ambition to design
such graphs. In most cases, however, users’ design wishes can be satisfied
by reverting to the low-level command graph twoway. This tutorial talk
demonstrates the construction of such charts with graph twoway. We will
start by reconstructing a simple bar chart and then move to a number of
extensions that graph twoway makes possible. I will illustrate some
trickery with stored results and local macros, as well as a number of useful
user-written programs.
Additional information
kohler.zip
Graph editing
Vince Wiggins
StataCorp
We will take a quick tour of the Graph Editor, covering the basic concepts:
adding text, lines, and markers; changing the defaults for added objects;
changing properties; working quickly by combining the contextual toolbars
with the more complete object dialogs; and using the object browser
effectively. Leveraging these concepts, we will discuss how and when to use
the grid editor and techniques for combined and by-graphs. Finally, we will
look at some tricks and features that are not apparent at first blush.
Relative distribution methods in Stata
Ben Jann
ETH Zürich
The concept of the relative density seems like a fruitful nonparametric
approach to studying distributional differences between groups (Handcock and
Morris 1999), yet it appears that the technique has gone more or less
unnoticed in applied social science research. A scarcity of canned software
might be one of the reasons the method is underutilized. Therefore, I
present a new Stata command called reldist to plot the relative density,
decompose distributional differences into location and shape effects, and
compute relative distribution summary measures. The command is illustrated
by an application comparing earnings by sex.
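To make the idea of a relative density concrete, here is a minimal sketch that computes the rank of each comparison-group value within the reference-group distribution and summarizes those ranks as a histogram. It is not the implementation of reldist; the function names, the mid-rank convention for ties, and the two small earnings samples are hypothetical choices for illustration.

```python
from bisect import bisect_left, bisect_right


def relative_ranks(reference, comparison):
    """Mid-rank of each comparison value in the reference distribution (0 to 1)."""
    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    ranks = []
    for value in comparison:
        below = bisect_left(ref_sorted, value)          # reference values < value
        at_or_below = bisect_right(ref_sorted, value)   # reference values <= value
        ranks.append((below + at_or_below) / (2 * n))
    return ranks


def relative_density(reference, comparison, bins=4):
    """Histogram estimate of the relative density on equal-width rank bins.

    A value of 1 in every bin means the two distributions coincide;
    values above 1 mark regions where the comparison group is overrepresented.
    """
    ranks = relative_ranks(reference, comparison)
    counts = [0] * bins
    for r in ranks:
        index = min(int(r * bins), bins - 1)
        counts[index] += 1
    # Proportion in each bin divided by the bin width (1 / bins).
    return [bins * c / len(ranks) for c in counts]


# Hypothetical earnings samples for two groups (illustration only).
reference_earnings = [10, 12, 14, 16, 18, 20, 22, 24]
comparison_earnings = [8, 9, 11, 13, 15, 17, 19, 21]

density = relative_density(reference_earnings, comparison_earnings)
print(density)
```

With these made-up samples, the comparison group is overrepresented in the lowest quarter of the reference distribution and underrepresented in the highest.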
Reference:
- Handcock, M. S., and M. Morris. 1999. Relative Distribution Methods in the Social Sciences. New York: Springer.
Additional information
jann_reldist_berlin08.pdf
Direct and indirect effects in a logit model
Maarten Buis
Vrije Universiteit, Amsterdam
In this presentation, I discuss a method by Erikson et al. (2005) for
decomposing a total effect in a logit model into direct and indirect effects,
and I propose a generalization of this method. Consider an example where
social class has an indirect effect on attending college through academic
performance in high school. The indirect effect is obtained by comparing the
proportion of lower-class students that attend college with the
counterfactual proportion of lower-class students if they had the
distribution of performance of the higher-class students. This captures the
association between class and attending college because of differences in
performance, i.e., the indirect effect. The direct effect of class is
obtained by comparing the proportion of higher-class students that attend
college with the counterfactual proportion of lower-class students if they
had the same distribution of performance as the higher-class students. This
way, the variable performance is kept constant, which yields the direct effect.
If these comparisons are carried out in the form of log odds ratios, then the
total effect will equal the sum of the direct and indirect effects. In its
original form, this method assumes that the variable through which the
indirect effect occurs is normally distributed. In this presentation, the method
is generalized by allowing this variable to have any distribution, which has
the added advantage of simplifying the method.
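The decomposition described above can be sketched numerically. The example below assumes a discrete performance variable with three levels and a separate logit model for each class; every number in it is hypothetical, and it illustrates only the log-odds-ratio arithmetic, not the estimation procedure presented in the talk.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical inputs: performance levels, their distribution in each class,
# and (intercept, slope) of the logit for attending college in each class.
performance_levels = [-1.0, 0.0, 1.0]
performance_dist = {
    "lower": [0.5, 0.3, 0.2],
    "higher": [0.2, 0.3, 0.5],
}
logit_model = {
    "lower": (-1.0, 1.2),
    "higher": (-0.2, 1.2),
}


def proportion_attending(model_class, dist_class):
    """Average predicted probability using one class's logit model
    and another class's performance distribution."""
    intercept, slope = logit_model[model_class]
    return sum(
        weight * inv_logit(intercept + slope * level)
        for level, weight in zip(performance_levels, performance_dist[dist_class])
    )


p_lower = proportion_attending("lower", "lower")
p_higher = proportion_attending("higher", "higher")
# Counterfactual: lower-class students with the higher-class performance distribution.
p_counterfactual = proportion_attending("lower", "higher")

indirect = logit(p_counterfactual) - logit(p_lower)   # performance changes, class fixed
direct = logit(p_higher) - logit(p_counterfactual)    # class changes, performance fixed
total = logit(p_higher) - logit(p_lower)

print(f"indirect = {indirect:.3f}, direct = {direct:.3f}, total = {total:.3f}")
```

Because both effects are differences of log odds that share the counterfactual term, the direct and indirect effects add up to the total effect by construction.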
Reference:
- Erikson, R., J. H. Goldthorpe, M. Jackson, M. Yaish, and D. R. Cox. 2005. On class differentials in educational attainment. Proceedings of the National Academy of Sciences 102(27): 9730–9733.
Additional information
Buis.pdf
Multiple imputation using ICE: A simulation study on a binary response
Jochen Hardt
Mathematical Statistics, Chalmers University, Göteborg, Sweden;
Masters Programme, Bernstein Center for Computational Neuroscience, Berlin
Background: Various methods for multiple imputation of missing values are
available in statistical software. They have been shown to work well when
small proportions of missing values are imputed. However, some researchers
have started to impute large proportions of missing values.
Method: We performed a simulation using ICE on datasets of 50/100/200/400
cases and 4/11/25 variables. A varying proportion of the data (3–63%) was
randomly set to missing and subsequently replaced by multiple
imputation.
Results: (1) We show when and how the algorithm breaks down as the number
of cases decreases and the number of variables in the model increases.
(2) Some unexpected results are demonstrated, e.g., flawed coefficients. (3)
Compared with a second program that performs multiple imputation by
chained equations, “mice” in R, the Stata program “ice” yields slightly
more precise estimates, although the two programs have similar features.
Conclusion: Imputation by chained equations is a useful tool
for imputing small to moderate proportions of missing values. Replacing
larger proportions, however, can be problematic.
Additional information
Hardt_missing5.ppt
Using Stata for a memory-saving fixed-effects estimation of the
three-way error-components model
Thomas Cornelissen
Leibniz Universität Hannover
Researchers trying to estimate tens or hundreds of thousands of fixed
effects for two or more groups (workers and firms; pupils, teachers and
schools; etc.) in datasets with high numbers of observations are often
limited by the size of computer memory available. Such a model is
commonly estimated by sweeping out one of the effects by the fixed-effects
transformation (time-demeaning) and by including the remaining effects as
dummy variables. If K is the number of fixed effects to be included as
dummy variables, and N is the number of observations, then the design matrix
is of dimension N x K (neglecting any remaining right-hand-side regressors).
The time-demeaned dummies have to be stored as “double” variables,
consuming 8 bytes per cell in Stata. For example, with 2 million
observations (N) and 10 thousand fixed effects (K), the memory requirement
would be 160 gigabytes. This paper describes how the memory requirement can
be reduced to store only a K x K matrix, which in the given example reduces
the memory requirement to below 1 gigabyte. The paper also describes the
Stata program felsdvreg.ado, which implements the method in Mata. Besides
implementing the memory-saving estimation method, the program also takes
care of checking the identification of the effects and provides useful
summary statistics.
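The memory figures quoted above follow from simple arithmetic, sketched here under the abstract's own assumptions of 8 bytes per cell, 2 million observations, and 10 thousand fixed effects. The variable names are illustrative, and gigabytes are taken as 10^9 bytes.

```python
BYTES_PER_CELL = 8        # storage per cell, as stated in the abstract
n_obs = 2_000_000         # N: number of observations
n_effects = 10_000        # K: fixed effects entered as dummy variables

# Full N x K design matrix of time-demeaned dummies.
full_design_bytes = n_obs * n_effects * BYTES_PER_CELL

# Reduced requirement: only a K x K matrix is stored.
reduced_bytes = n_effects * n_effects * BYTES_PER_CELL

full_design_gb = full_design_bytes / 1e9
reduced_gb = reduced_bytes / 1e9

print(f"N x K design matrix: {full_design_gb:.1f} GB")
print(f"K x K matrix:        {reduced_gb:.1f} GB")
```

The saving grows with N because the K x K requirement does not depend on the number of observations.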
Additional information
Cornelissen_2008_German_Stata_Meeting.pdf
Scientific organizers
Johannes Giesecke, University of Mannheim
Ulrich Kohler, WZB
Logistics organizers
The logistics are being organized by Dittrich and Partner
(http://www.dpc.de), the distributor of Stata in several countries including
Germany, the Netherlands, Austria, the Czech Republic, and Hungary.