Last updated: 16 August 2011
2011 Stata Conference Chicago
14–15 July 2011
Gleacher Center
The University of Chicago Booth School of Business
450 North Cityfront Plaza Drive
Chicago, IL 60611
Proceedings
Tricks with Hicks: Stata gmm code for nonlinear GMM
Carl Nelson
University of Illinois–Urbana–Champaign
In a June 2009
American Economic Review article entitled
“Tricks with Hicks: The EASI demand system”, Arthur Lewbel and
Krishna Pendakur proposed the exact affine Stone index demand system. This
system allows Engel curve behavior of rank higher than 3, demographics, and
unobserved heterogeneity in tastes. The
American Economic Review web supplement for the article
provides Stata code to estimate linear and iterative linear versions of
the model. But the full nonlinear system instrumental variable estimates
were obtained with TSP econometric software, using its
frml command to obtain
nonlinear three-stage least-squares estimates. I present Stata code to estimate the nonlinear
exact affine Stone index demand system using the Stata
gmm command. This is an example of the
important estimation extensions that have been made possible by the
introduction of the
gmm command.
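For readers new to the command, a minimal sketch of the interface follows. This is a hypothetical single-equation example, not the EASI system itself; the variables y, x, z1, z2 and parameters {b0}, {b1} are illustrative.

```stata
* Hypothetical nonlinear GMM example (not the EASI system):
* fit an exponential conditional mean with instruments z1 and z2.
gmm (y - exp({b0} + {b1}*x)), instruments(z1 z2)

* Multiple-equation systems follow the same pattern:
* gmm (eq1: <residual 1>) (eq2: <residual 2>), instruments(...)
```

The EASI system stacks many such share-equation residuals, but each equation is written as a moment expression in exactly this way.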
Additional information
chi11_nelson.pdf
engel.png
lewbelpendakur09_20.pdf
xtmixed and denominator degrees of freedom: Myth or magic
Phil Ender
UCLA Statistical Consulting Group
I review issues and controversy surrounding
F-ratio denominator degrees
of freedom in linear mixed models. I will look at the
history of denominator degrees of freedom and survey their use in
various statistical packages.
Additional information
chi11_ender.pdf
Using the margins command to estimate and interpret adjusted predictions
and marginal effects
Richard Williams
University of Notre Dame
As Long and Freese show, it can often be helpful to compute
predicted and expected values for hypothetical or prototypical cases. Stata 11
introduced new tools—factor variables and the
margins
command—for making such calculations. These can do many of the things
that were previously done by Stata’s own
adjust and
mfx
commands, as well as Long and Freese’s
spost9 commands like
prvalue. Unfortunately, the complexity of the
margins syntax, the
daunting 50-page reference manual entry that describes it, and a lack of
understanding about what
margins offers over older commands may have
dissuaded researchers from using it. This paper therefore shows how
margins can easily replicate analyses done by older commands. It
demonstrates how
margins provides a superior means for dealing with
interdependent variables (for example,
X and
X2;
X1,
X2, and
X1 ×
X2; multiple dummies created from a
single categorical variable), and is also superior for data that are
svyset. The paper explains how the new
asobserved option works
and the substantive reasons for preferring it over the
atmeans
approach used by older commands. The paper primarily focuses on the
computation of adjusted predictions, but also shows how
margins has
the same advantages for computing marginal effects.
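As a flavor of the calls discussed, here is a minimal sketch using Stata's shipped auto dataset; the model itself is purely illustrative.

```stata
* Minimal sketch using Stata's shipped auto data.
sysuse auto, clear
logit foreign c.weight##c.weight i.rep78    // factor-variable notation

margins rep78, atmeans     // adjusted predictions at the means of covariates
margins rep78              // averaged over observed values (asobserved, the default)
margins, dydx(weight)      // average marginal effect, handling weight^2 correctly
```

Because the squared term is entered with factor-variable notation, margins knows that weight and weight squared move together, which is the interdependence problem described above.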
Additional information
chi11_williams.pptx
Using margins to test for group differences in growth
trajectories in generalized linear mixed models
Sarah Mustillo (with L.R. Landerman and K.C. Land)
Purdue University, Duke University School of Medicine, and Duke University
To test for group differences in growth trajectories in mixed (fixed and
random-effects) models, researchers frequently interpret the coefficient of
group-by-time product terms. While this practice is straightforward in
linear mixed models, testing for group differences in generalized linear
mixed models is more complex. Using both an empirical example and simulated
data, we show that the coefficients of group-by-time product terms in mixed
logistic and Poisson models estimate the multiplicative change with respect
to the baseline rates, while researchers often are more interested in
differences in the predicted rate of change between groups. The latter can
be obtained by using the
margins command in Stata. This may be
especially desirable when the mean of the outcome variable is low and
marginal change differs from multiplicative change. We propose and
illustrate the use of
margins to interpret group differences in rates
of change over time following estimation with generalized linear models.
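The kind of comparison described might be sketched as follows for a single-level logit; variable names (y, group, time) are hypothetical, and the same margins calls apply after mixed-model estimation.

```stata
* Sketch: compare predicted rates of change between groups rather than
* interpreting the multiplicative interaction coefficient directly.
logit y i.group##c.time
margins group, dydx(time)              // predicted rate of change in each group
margins group, dydx(time) pwcompare    // pairwise group differences (Stata 12)
```

The dydx(time) results are on the probability scale, so they can differ sharply from the odds-ratio interpretation of the interaction term when the outcome mean is low.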
Additional information
chi11_mustillo.pptx
Graphics tips for all
Nicholas J. Cox
Durham University, United Kingdom
Stata’s graphics were completely rewritten for Stata 8, with further
key additions in later versions. Its official commands have, as usual, been
supplemented by a variety of user-written programs. The resulting variety
presents even experienced users with a system that undeniably is large,
often appears complicated, and sometimes seems confusing. In this talk, I
provide a personal digest of graphics strategy and tactics for Stata users;
I emphasize details large and small that, in my view, deserve to be known by
all.
Additional information
chi11_cox.zip
Stata as a data-entry management tool
Ryan Knight
Innovations for Poverty Action
It is increasingly common for social scientists to be involved in primary
data collection, whether through the administration of unique survey
instruments or the execution of field experiments. Novel datasets present
novel challenges for researchers, who may find themselves responsible for
ensuring that any information collected is entered into the computer
accurately. This presentation discusses why and how one might use Stata as a
tool for data-entry management and introduces three new user-written
commands that streamline the data-entry process. The commands are:
cfout, which is an extension of the
cf command that outputs a user-friendly
list of all discrepancies between two datasets (for example, the first and second
entry of a double-entered dataset);
readreplace, which makes many
replacements to a dataset, based on a corrected list of the discrepancies
generated by
cfout; and
mergeall, which merges many files without
loss of information due to string and numeric differences. This suite of
commands can help reduce the cost and increase the accuracy of primary
data collection, and it extends Stata’s data-management capabilities to
include the management of data entry.
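The underlying double-entry comparison can be illustrated with official Stata's cf command, which cfout extends; the file names below are hypothetical.

```stata
* Compare first and second entry of a double-entered dataset using
* official -cf-; -cfout- extends this with a user-friendly discrepancy list.
* Both files must contain the same observations in the same order.
use firstentry, clear
cf _all using secondentry, all verbose
```

cf reports, variable by variable, which observations differ between the dataset in memory and the one on disk, which is the raw material for the corrected list that readreplace applies.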
Additional information
chi11_knight.pptx
Universal and mass customization of tables in Stata
Roy Wada
University of Illinois–Chicago
There is a strong demand for a systematic and uniform approach to
table-making, yet such an approach is widely believed to be infeasible
or unavailable in Stata. There is also an impression that tabulation tables
are inherently different from summary tables or regression tables. This
presentation shows that it is possible to design a programmatic, universal
solution once the similarities between the apparently different types of
tables are understood. The universal approach to table-making is implemented
in the latest version of
outreg2. Thus a mass customization of
various types of tables, including cross-tabulations and stub-and-banner
types of tables, can be readily produced in Stata.
Additional information
chi11_wada.pptx
Fractional response models with endogenous explanatory variables and heterogeneity
Jeffrey M. Wooldridge
Michigan State University
In this talk, I will discuss ways of using Stata to fit fractional
response models when explanatory variables are not exogenous. Two questions
are of primary concern: First, how does one account for endogenous
explanatory variables, both continuous and discrete, when the response
variable is fractional and may take values at the corners? Second, how can
we incorporate unobserved heterogeneity in panel-data fractional models when
the panel might be unbalanced? I will draw on Papke and Wooldridge (2008,
Journal of Econometrics 145: 121–133) and two unpublished
papers of mine, “Quasi-maximum likelihood estimation and testing for
nonlinear models with endogenous explanatory variables” and
“Correlated random effects models with unbalanced panels”. One
practically important conclusion is that by expanding the scope of existing
Stata commands to allow fractional responses—in particular, the
ivprobit,
biprobit,
hetprob, and (user-written)
gllamm commands—flexible fractional response models can easily
be fit.
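Under exogeneity, the baseline fractional model of Papke and Wooldridge (1996) is already available through glm; the extensions discussed in the talk build on this quasi-MLE. Variable names below are hypothetical.

```stata
* Fractional logit for a response y in [0,1], possibly taking corner values.
* Quasi-maximum likelihood with robust standard errors
* (Papke and Wooldridge 1996).
glm y x1 x2, family(binomial) link(logit) vce(robust)
margins, dydx(*)    // average partial effects on the fractional response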
Additional information
chi11_wooldridge.pdf
Causal inference for binary regression with observational data
Austin Nichols
Urban Institute
Special problems arise when trying to do causal inference for binary
regression with observational data; we will examine some of these problems
and critically examine several common and not-so-common solutions.
Additional information
chi11_nichols.pdf
Estimating the parameters of simultaneous-equations models with the sem command in Stata 12
David M. Drukker
StataCorp
In this talk, I introduce Stata 12’s new
sem command for
estimating the parameters of
simultaneous-equations models. Some of the considered models
include unobserved factors. Estimation methods include maximum likelihood
and the generalized method of moments.
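For orientation, the command's path-style syntax looks like this; the observed variable names are hypothetical, and a capitalized name such as L denotes a latent variable by sem's default convention.

```stata
* Minimal -sem- sketch (Stata 12): one latent factor L measured by
* m1-m3, with a structural equation for observed y.
sem (L -> m1 m2 m3) (y <- L x)
```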
Additional information
chi11_drukker_sem.pdf
Calculating bronchiolitis severity using ordinal regression with a new function in Stata
Carl Mitchell (with Paul Walsh)
Kern County Medical Center Department of Emergency Medicine/UCLA
A new command has been developed implementing a previously validated tool
for describing bronchiolitis severity. Bronchiolitis is one of the most
common causes of hospital admission for infants and it is widely studied.
This command classifies predicted severity of illness using an ordinal
regression model. Optionally, the user can obtain the predicted probability of
hospital admission and the probability of an infant falling into a
severity-of-illness classification different from the one predicted.
Additional information
chi11_mitchell.pdf
Teaching statistics with Stata in emergency medicine (EM) journal club
Muhammad Waseem
Lincoln Medical and Mental Health Center
Residency training is an important period when a physician learns and
acquires the necessary skills of searching for, evaluating, and applying medical
knowledge. The journal club is an academic event and an important forum for
this purpose. The objective of the journal club is to learn and develop a
skill to find, appraise, and implement practice-changing advancements in the
medical literature. We report our experience using Stata in the journal club
to teach emergency medicine residents statistics in addition to critical appraisal
skills. To understand and utilize the current literature effectively, an
understanding of basic statistical methods is essential. We introduced Stata
while discussing the methods and results section of an article in the
journal club to teach application of some common statistical tests.
Published studies were selected to illustrate and provide insight into
commonly used statistical concepts. We noted that improved understanding of
statistics resulted in increased interest and enthusiasm of residents to
participate in journal club. Integrating a statistical software program such
as Stata into journal club can serve as an important tool to enhance learning.
Further studies should be conducted to fully utilize these
opportunities for enhanced learning of in-training physicians.
Additional information
chi11_waseem.pptx
Use of cure fraction models for the survival analysis of uterine cancer patients
Noori Akhtar-Danesh (with Alice Lytwyn and Laurie Elit)
McMaster University
In population-based cancer studies, a cure fraction model
classifies patients into those who survive the cancer and those who
encounter excess mortality risk compared with the general population
(2007,
Stata Journal 7: 1–25). In
this presentation, we report the proportion cured and the relative survival
pattern for patients diagnosed with uterine cancer in Canada over the period
of 1992–2005. We used a nonmixture cure fraction model to estimate
the cure fraction rate and the relative survival among “uncured”
patients (2007,
Stata Journal 7: 1–25). Then we predicted the cure fraction rate and median survival
for each age group based on the year of diagnosis. Relative
survival and cure fraction rate decreased with age but increased gradually
over time. Relative survival for Eastern Canada and Ontario was lower
than in the other regions. The same applies to the comparison of
cure fraction rates between the geographical regions. This is
the first study using a cure fraction model for analysis of uterine cancer.
Although there are some limitations attached to this model, it is flexible
enough to be used with different parametric distributions and to include
different link functions for relative survival analysis.
Additional information
chi11_akhtar_danesh.ppt
Using Mata to import Illumina SNP chip data for genome-wide association studies
Chuck Huber (with Michael Hallman, Victoria Friedel,
Melissa Richard, and Huandong Sun)
Texas A&M Health Science Center School of Rural
Public Health and University of Texas School of Public Health
Modern genetic genome-wide association studies typically rely on
single nucleotide polymorphism (SNP) chip technology to determine hundreds
of thousands of genotypes for an individual sample. Once these genotypes are
ascertained, each SNP (alone or in combination) is tested for association
with outcomes of interest such as disease status or severity. Project Heartbeat!
was a longitudinal study conducted in the 1990s that explored changes in
lipids and hormones and morphological changes in children from age 8–18
years. A genome-wide association study is currently being conducted to look
for SNPs that are associated with these developmental changes. While there
are specialty programs available for the analysis of hundreds of thousands
of SNPs, they are not capable of modeling longitudinal data. Stata is
well-equipped for modeling longitudinal data but cannot load hundreds of
thousands of variables into memory simultaneously. This talk will briefly
describe the use of Mata to import hundreds of thousands of SNPs from the
Illumina SNP chip platform and how to load those data into Stata for
longitudinal modeling.
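The pattern might be sketched as below. The file name, delimiter, and layout are assumptions (a whitespace-delimited text file with SNP names in a header row), and the columns kept are purely illustrative.

```stata
* Sketch: read a large delimited genotype file in Mata, keeping only a
* subset of columns, then load that subset into Stata.
clear
mata:
    fh = fopen("snps.txt", "r")
    header = tokens(fget(fh))       // SNP names from the header row
    keep = 1..10                    // columns to load (illustrative)
    X = J(0, cols(keep), .)
    while ((line = fget(fh)) != J(0, 0, "")) {
        row = strtoreal(tokens(line))
        X = X \ row[keep]
    }
    fclose(fh)
    st_addobs(rows(X))
    idx = st_addvar("double", header[keep])   // assumes valid Stata names
    st_store(., idx, X)
end
```

Because Mata never creates Stata variables for the discarded columns, memory use is bounded by the subset actually needed for modeling.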
Additional information
chi11_huber.pptx
Graphics tricks for models
Bill Rising
StataCorp
Visualizing interactions and response surfaces can be difficult. In this
talk, I will show how to do the former by graphing adjusted means and the
latter by showing how to roll together contour plots. I will demonstrate
this for both linear and nonlinear models.
Additional information
chi11_rising.pdf
chi11_rising_files.zip
Malmquist productivity analysis using DEA frontier in Stata
Choonjoo Lee
Korea National Defense University
In this presentation, the author presents a procedure and an illustrative
application of a user-written Malmquist productivity analysis (MPA) using
data envelopment analysis (DEA) frontier in Stata. MPA measures the
productivity changes for units between time periods. MPA has been used
widely for assessing the productivity changes of public and private sectors,
such as banks, airlines, hospitals, universities, defense firms, and
manufacturers, when panel data are available. The MPA using the DEA frontier
in Stata will allow Stata users to conduct not only the stochastic approach
to productivity analysis, using stochastic-frontier analysis, but also the
nonstochastic approach using the DEA frontier, also suggested by the author. The user-written
MPA approach in Stata will provide some possible future extensions of Stata
programming in productivity analysis.
Additional information
chi11_lee.ppt
chi11_lee_files.zip
An interpretation and implementation of the Theil–Goldberger
“mixed” estimator
Christopher Baum
Boston College and DIW Berlin
In the early 1960s, Theil and Goldberger proposed a
generalized least-squares approach to “mixing” sample
information and prior beliefs about the coefficients of a regression
equation. Their “mixed” estimator may be considered as a
stochastic version of constrained least squares (Stata’s
cnsreg). Although based on frequentist statistics, the Theil–Goldberger estimator
is identical to that used in a Bayesian estimation approach when an
informative prior density is employed. It may also be
viewed as a one-shot application of the Kalman filter,
providing an updating equation for point and interval coefficients based on
prior and sample information. I discuss the
motivation for the estimator and my implementation in Stata code,
tgmixed, and give illustrations of how it might be usefully employed.
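In the standard textbook statement, with sample information $y = X\beta + \varepsilon$, $\mathrm{Var}(\varepsilon) = \sigma^2 I$, and stochastic prior restrictions $r = R\beta + v$, $\mathrm{Var}(v) = \Psi$, the mixed estimator is the GLS solution of the stacked system:

$$
\hat{\beta} = \left( \frac{X'X}{\sigma^2} + R'\Psi^{-1}R \right)^{-1}
              \left( \frac{X'y}{\sigma^2} + R'\Psi^{-1}r \right)
$$

This is the classical form; the exact scaling used inside tgmixed may differ in implementation detail.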
Additional information
chi11_baum.pdf
Multilevel regression and poststratification in Stata
Maurizio Pisati (with Valeria Glorioso)
University of Milano–Bicocca and Harvard School of Public Health
Sometimes, social scientists are interested in determining whether, and to
what extent, the distribution of a given variable of interest
Y
varies across the categories of a second variable
D. When the number of
valid observations within one or more categories of
D is small or the
collected data are affected by selection bias, relatively accurate estimates
of E(Y|D) can be obtained by using a proper combination of multilevel
regression modeling and poststratification, known as the MRP
approach (Gelman and Little 1997,
Survey Methodology 23: 127–135; Gelman and Bafumi 2004,
Political Analysis 12: 375–385; and Lax and Phillips 2009,
American Journal of Political Science 53: 107–121). The purpose of this talk is to illustrate the main features
and applications of
mrp, a new user-written program that implements
the multilevel regression modeling and poststratification approach in Stata.
Additional information
chi11_pisati.pdf
Mata, the missing manual
William W. Gould
StataCorp
Mata is Stata’s matrix programming language. StataCorp provides
detailed documentation on it, but so far has failed to give users—and
especially users who add new features to Stata—any guidance on when
and how to use the language. In this talk, I provide what has been missing.
In practical ways, I show how to include Mata code in Stata ado-files,
reveal when to include Mata code and when not to, and provide an
introduction to the broad concepts of Mata—the concepts that will make the
Mata Reference Manual approachable.
Additional information
chi11_gould.pdf
Stata Graph Library for network analysis
Hirotaka Miura
Federal Reserve Bank of San Francisco
Network analysis is a multidisciplinary research method that is fast
becoming a popular and exciting field of study. Though a number of
statistical programs possess sophisticated packages for analyzing networks,
similar capabilities have yet to be made available in Stata. In an effort to
motivate the use of Stata for network analysis, I designed in Mata the Stata
Graph Library (SGL), which consists of algorithms that construct matrix
representations of networks, compute centrality measures, and calculate
clustering coefficients. Performance tests conducted between C++ and SGL
implementations indicate gross inefficiencies in the current SGL routines, making
SGL impractical for large networks. The obstacles are,
however, welcome challenges in the effort to spread the use of Stata as an
instrument for analyzing networks, and future developments will focus on
addressing computational time complexities as well as integrating additional
capabilities into SGL.
Additional information
chi11_miura.pdf
chi11_miura_SGL_version_1.1.2.zip
Filtering and decomposing time series in Stata 12
David M. Drukker
StataCorp
In this talk, I introduce new methods in Stata 12 for filtering and
decomposing time series and I show how to implement them. I
provide an underlying framework for understanding and comparing the
different methods. I also present a framework for interpreting the
parameters.
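For example, the new tsfilter command applies several of these filters with a common syntax; the variable names below are hypothetical.

```stata
* Sketch of Stata 12's -tsfilter- on a tsset quarterly series y.
* 1600 is the conventional smoothing value for quarterly data.
tsset quarter
tsfilter hp y_hp = y, smooth(1600)   // Hodrick-Prescott cyclical component
tsfilter bk y_bk = y                 // Baxter-King band-pass alternative
```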
Additional information
chi11_drukker_filter.pdf
Scientific organizers
Phil Schumm (chair), University of Chicago
Lisa Barrow, Federal Reserve Bank of Chicago
Scott Long, Indiana University
Rich Williams, University of Notre Dame
Logistics organizers
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp