Last updated: 15 January 2009
2008 Summer North American Stata Users Group meeting
24–25 July 2008
Gleacher Center, University of Chicago
450 North Cityfront Plaza Drive
Chicago, IL 60611
Proceedings
Understanding statistics using simulate
Maarten Buis
Department of Social Research Methodology, Vrije Universiteit Amsterdam
Many of us, at some point, have received a comment from a member of the audience, a
reviewer, or an advisor who thinks the technique used is bad/biased/evil and
who knows of some new fancy method that solves the problem. In those cases,
you often want to know two things: 1) How big is the problem? and 2) Does that new
fancy method actually work? In this talk, I will demonstrate how to
answer these questions using the
simulate command in Stata. I will illustrate
using the following two examples: First, say we have a dependent
variable that is collected not as a continuous variable but as a series of
ranges, e.g., wage measured in categories ($0–5/hour, $6–10/hour,
etc.). How bad is it to assign each category its middle value and treat it
as a continuous variable? How much better is
intreg at dealing with
this problem? Second, various approaches are proposed if we have missing
data. The default in Stata (and most other packages) is to ignore all
observations with missing data. Official Stata also contains the
impute command, and there is the user-written
ice command by
Patrick Royston. This raises the question of which method is the best.
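The first question in the abstract (how bad is midpoint coding?) can be explored with a small Monte Carlo experiment. The sketch below is written in Python rather than Stata and is not the speaker's code. The data-generating process (a true slope of 2, normal errors, ranges of width 5) and all function names are assumptions chosen for illustration. It only compares regression on the continuous outcome with regression on midpoint-coded ranges; it does not implement an interval-regression estimator such as intreg.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def one_replication(rng, n=200, width=5.0):
    """Draw one sample; regress the continuous and the midpoint-coded outcome on x."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [5 + 2 * xi + rng.gauss(0, 3) for xi in x]   # assumed true slope: 2
    # Record y only as a range of the given width, then assign the middle value.
    y_mid = [(yi // width) * width + width / 2 for yi in y]
    return ols_slope(x, y), ols_slope(x, y_mid)

def run_simulation(reps=500, seed=12345):
    """Average both slope estimates over many replications."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    mean_true = statistics.fmean(r[0] for r in results)
    mean_mid = statistics.fmean(r[1] for r in results)
    return mean_true, mean_mid

mean_true, mean_mid = run_simulation()
print(f"mean slope, continuous y: {mean_true:.3f}")
print(f"mean slope, midpoint y:   {mean_mid:.3f}")
```

Changing the range width, the error variance, or the sample size in `one_replication` shows how the size of the problem depends on the design, which is the kind of question the talk proposes to answer by simulation.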
Additional information
buis_MLBsimulate.zip
GMM estimation in Mata
Austin Nichols
Urban Institute
I will present a brief introduction to fitting generalized
method-of-moments models in Stata, using the
optimize() function in
Mata, with applications to nonlinear instrumental-variables models.
Additional information
nichols_gmm.pdf
The effects of single mothers’ welfare participation and work decisions on children’s attainments
Hau Chyi
WISE, Xiamen University, China
Orgul Ozturk
Moore School of Business, University of South Carolina
This research examines the effects of mothers’ welfare and work
decisions on their children’s attainments by using two types of
estimation methods in Stata: 1) an instrumental-variables (IV) approach and
2) a nonlinear simultaneous-equation estimation. The estimator employs
sibling comparisons in a random-effects framework and an IV approach to
address the unobserved heterogeneity that may influence mothers’ work
and welfare decisions. We use the popular Stata command
ivreg2 to
estimate the coefficients. Because the production function of a child’s
ability can be written as a nonlinear function in a mother’s
decisions, we can also use the
nlsur command to simultaneously
estimate the production function as well as the (first-stage) IV
projections. We focus on children who were born to single mothers with 12
or fewer years of schooling. The IVs in this study are welfare use during
childhood and a mother’s expected years of work. The identification
comes from the variation in mothers’ different economic incentives
that arises from the AFDC benefit structures across the United States. The
estimates imply that, relative to no welfare participation, participating
in welfare for one to three years provides up to a 5-percentage-point gain
in a child’s Peabody Individual Achievement Test (PIAT) scores. The
negative effect of childhood welfare participation on adult earnings found
by others is not significant if one accounts for mothers’ work
decisions. At the estimated values of the model parameters, a
mother’s number of years of work contributes between $3,000 and
$7,000 (in 1996 dollars) to her child’s labor income but has no
significant effect on the child’s PIAT scores. Finally, the
number of years of schooling for the children is relatively unresponsive to
their mother’s work and welfare participation choices.
Additional information
chyi_est_afdc_short.pdf
Multivariate mixed models for meta-analysis of paired-comparison studies of two medical diagnostic tests
Ben Dwamena
University of Michigan Radiology and VA Nuclear Medicine Service, Ann Arbor, Michigan
I have previously demonstrated Stata implementation of bivariate
random-effects meta-analysis of the sensitivity and specificity of a single
binary diagnostic test by means of the midas module (Dwamena NASUG 2007;
Dwamena WCSUG 2007). In this presentation, I extend the work to
paired-comparison studies of two binary diagnostic tests. Using a dataset of
studies comparing the accuracy of positron emission tomography (PET) and
x-ray computed tomography (CT) for staging lung cancer, I compare the
fit (deviance) and complexity (BIC, AIC) and test performance estimates
(sensitivity, specificity, diagnostic odds ratios, and likelihood ratios) of
four multivariate models: 1) bivariate binomial mixed models with test type as
fixed-effect covariate; 2) bivariate binomial mixed models with test type as
random-effect covariate; 3) independent test-specific bivariate binomial
mixed models; and 4) correlated test-specific bivariate binomial mixed
models. I perform estimation with the Stata-native procedure
xtmelogit
using both the default adaptive quadrature method and its Laplacian
approximation (nip=1). I then compare results with those from the
user-written
gllamm command (written by Sophia Rabe-Hesketh, Andrew Pickles,
and Anders Skrondal).
Additional information
dwamena_snasug2008.pdf
Teaching consumer theory with maximum likelihood estimation of demand systems
Carl Nelson
Agricultural and Consumer Economics, University of Illinois at Urbana–Champaign
The quaids ado-files written by Brian Poi provide a good template for
constructing alternative ado-files for maximum likelihood estimation of
demand systems. I describe how I used the template to construct ado-files to
estimate a five-commodity almost-ideal demand system with demographic
scaling. The system is applied to USDA national food consumption survey
data. The estimation is used as an exercise in a PhD-level microtheory
course that aims to connect the empirical implications of theory with
econometric estimation. I report on how maximum likelihood estimation of
demand systems contributes to student learning of both consumer theory and
nonlinear estimation. I include a discussion of how Mata is used to recover
coefficients from maximum likelihood estimation to perform postestimation
processing like calculation of elasticities.
Additional information
nelson_snasug08.pdf
Semiparametric generalized linear models
Paul Rathouz
Department of Health Studies, University of Chicago
I propose a new class of generalized linear models. As with the existing
models, these new models are specified via a linear predictor and a link
function for the mean of response Y as a function of predictors X. However,
here, the “baseline” distribution of Y when the linear predictor is zero is
left unspecified and is estimated from the data. The response distribution when
the linear predictor differs from zero is then generated via exponential
tilting of the baseline distribution, yielding a response model that is a
member of the natural exponential family, with corresponding canonical link
and variance functions. The resulting model has a level of
flexibility similar to that of the proportional odds model. Maximum likelihood estimators
are developed for response distributions with finite support, and the new
model is studied and illustrated through simulations and example analyses
from aging and psychiatry research.
Additional information
rathouz_sug_2008.pdf
Using Stata as a computational tool in a relational database environment
Tom Mustillo
Assistant Professor of Political Science, Indiana University–Purdue University Indianapolis
Sarah Mustillo
Associate Professor of Sociology, Purdue University
Stata can be used as a companion to relational database programs to compute
and serve up statistical and nonstandard functions for public use.
This session builds upon presentations at previous North American Stata Users Group meetings
on “Translating Data between MySQL and Stata” (2004),
“Working with ODBC Data Sources in Stata” (2004), and
“Integrating Stata with Database Management Systems” (2005) by demonstrating
how a Microsoft Access database of electoral data can call Stata do-files
to compute and/or estimate alternative measures of political party
nationalization. This database uses Stata to compute Jones and Mainwaring’s
(2003) measure of “Party Nationalization” using the
egen_inequal
command and Morgenstern and Potthoff’s (2005) measure of the
“Components of Elections” using
xtmixed. More generally, where
data reside live and for broad public consumption, Stata can play a valuable
role operating behind the scenes for nontechnical users where measures of
conceptual value cannot be generated from within the database environment.
Additional information
mustillo_nasug2008.ppt
USESPSS: Processing SPSS files in Stata
Sergiy Radyakin
The World Bank
The new command
USESPSS allows users to open and process SPSS system
files in Stata for Windows.
USESPSS is a “true reader” in
that it is completely independent from any specialized conversion
software, like Stat/Transfer, and it does not require SPSS
to be installed.
USESPSS converts data files on the fly, preserving
variable labels, value labels, and missing values. Like other conversion
software,
USESPSS optimizes data storage types by looking for the most
efficient way to store SPSS data in Stata’s memory.
USESPSS is
implemented as a plugin and works in a Windows 32-bit environment (however, it
understands SPSS files originating from both Windows and Unix platforms,
compressed and not compressed). The critical portions of its code are
written in assembly language; thus, SPSS data can be used in Stata programs
without a significant loss of performance. The talk will also cover
the process of developing plugins for Stata.
Additional information
radyakin_usespss.ppt
Reshaping the World Development Indicators (WDI) for panel data and seemingly unrelated regression modeling in Stata
P. Wilner Jeanty
The Ohio State University
The World Bank’s World Development Indicators (WDI) compilation is a rich
and widely used dataset on the development of most economies in the world.
However, after obtaining the data from the World Bank’s website or the
WDI CD-ROM, users need to manage or reorganize the data in a certain way for
statistical applications. The World Bank has made great strides in rendering
WDI in several forms for download. Yet, seemingly unrelated regression
analysis, for example, cannot be performed using any of these structures.
Reorganizing the data for seemingly unrelated regression analysis as well as
renaming the series with meaningful variable names and maintaining the
series descriptors as variable labels in the reshaped dataset represent
significant data-management challenges for the inexperienced Stata user. I
will present a new Stata program,
wdireshape, that reduces data-management
time and effort to zero when the ultimate structure is to fit panel-data and
seemingly unrelated regression models, or to have a dataset with the
countries as rows and the variables for each year as columns.
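The target layout described above (countries as rows, one column per variable and year) can be illustrated with a few lines of Python. This is only a sketch of the reshaping idea, not the wdireshape program. The series codes, variable stems, and values are made up, and the function name `to_wide` is hypothetical.

```python
def to_wide(rows, names):
    """Turn one-row-per-country-and-series data into one row per country.

    rows  : list of dicts with keys "country", "series", and one key per year
    names : mapping from a series code to a short, meaningful variable stem
    """
    wide = {}
    for row in rows:
        stem = names[row["series"]]
        record = wide.setdefault(row["country"], {})
        for key, value in row.items():
            if key not in ("country", "series"):
                record[f"{stem}{key}"] = value   # for example gdp2000
    return wide

# Made-up values in the long layout: one row per country and series.
rows = [
    {"country": "A", "series": "S1", "2000": 1.0, "2001": 1.5},
    {"country": "A", "series": "S2", "2000": 10.0, "2001": 11.0},
    {"country": "B", "series": "S1", "2000": 2.0, "2001": 2.5},
    {"country": "B", "series": "S2", "2000": 20.0, "2001": 21.0},
]
names = {"S1": "gdp", "S2": "pop"}   # hypothetical series-to-stem mapping
wide = to_wide(rows, names)
for country, record in wide.items():
    print(country, record)
```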
Additional information
jeanty_nasug08.zip
Estimating the parameters of dynamic panel-data models using Stata
David Drukker
StataCorp
In this talk, I will review dynamic panel-data analysis and how to perform
it using Stata. I also cover static models with predetermined variables.
For each model discussed, I review the econometrics and
then show how to perform the estimation using Stata.
Additional information
drukker_xtdpd.pdf
Estimation of constant-CV regression models
Alan Feiveson
NASA Johnson Space Center
A typical formulation for a linear mixed model is Y = Xβ + Zu, where
β is a vector of “fixed” parameters, u is a vector of “random
effects”, and X and Z are matrices whose columns consist of design
variables and/or covariates. In some applications, the elements of Z may
depend on the unknown fixed parameters β as well as known covariates. A
common example is when an error variance is proportional to some power of
E(Y), the mean of Y. In particular, if the variance is proportional to the
square of E(Y), we have a constant-CV model. In this talk, I will give examples
of such models, including those with hierarchical structures, and show how
xtmixed can be used to estimate them and do proper inference on the
estimated parameters. I will compare the results with Bayesian estimation
in WinBUGS.
Additional information
feiveson_snasug_2008.ppt
Logistic regression by means of penalized maximum likelihood estimation in cases of separation
Joseph Coveney
Cobridge Co., Ltd.
Users of
logit or
logistic occasionally encounter instances in
which one or more predictors perfectly predict one or both outcomes (a
condition called separation), or in which some outcomes are completely
determined (quasi-complete separation). Finite maximum likelihood estimates
do not exist under conditions of separation. Exact logistic regression with
exlogistic can serve as an alternative in these circumstances but is
sometimes infeasible. In the 1990s, David Firth proposed a type of
penalization for reducing bias of maximum likelihood estimates in
generalized linear models by means of modifying the score equations.
Firth’s method has the interpretation of penalized maximum likelihood
when the canonical link function is used, such as in logistic regression.
In this decade, Georg Heinze and colleagues have explored this technique as
a solution to the problem of separation in logistic regression. I describe a Stata
implementation,
firthlogit, which maximizes the penalized log-likelihood
using
ml. I illustrate its use in model fitting and predictions, inference
with penalized likelihood-ratio tests, and construction of profile
penalized likelihood confidence intervals. I use examples
where
logit and
logistic balk or do not give finite
maximum likelihood estimates, and where exact logistic regression is
problematic because of memory requirements or degenerate conditional
distributions.
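To make the idea of modifying the score equations concrete, here is a minimal Python sketch of Firth-type penalized logistic regression with an intercept and a single predictor. It is not the firthlogit implementation, which the abstract says maximizes the penalized log likelihood using ml. Instead, it applies Fisher scoring to the modified score, and the function name and toy data are assumptions for illustration.

```python
import math

def firth_logit(x, y, iterations=50, tol=1e-10):
    """Firth-penalized logistic regression of y on an intercept and one predictor x.

    Returns (intercept, slope), found by Fisher scoring on the modified score.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        # Fisher information X'WX for the design matrix [1, x].
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det   # inverse information
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i.
        h = [wi * (v00 + 2 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Modified score: sum over i of (y_i - p_i + h_i * (0.5 - p_i)) * x_i.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Completely separated toy data: x = 0 always gives y = 0, x = 1 always gives y = 1.
x = [0] * 5 + [1] * 5
y = [0] * 5 + [1] * 5
b0, b1 = firth_logit(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

Ordinary maximum likelihood has no finite solution for these data, whereas the penalized estimates stay finite. In this saturated two-group example they coincide with the log odds obtained after adding 0.5 to each cell of the 2×2 table.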
Additional information
coveney_snasug08.pps
Finite mixture models
Partha Deb
Hunter College and the Graduate Center, CUNY
Finite mixture models provide a natural way of modeling continuous or
discrete outcomes that are observed from populations consisting of a finite
number of homogeneous subpopulations. Applications of finite mixture models
are abundant in the social and behavioral sciences, biological and
environmental sciences, engineering, and finance. Such models have a natural
representation of heterogeneity in a finite, usually small, number of latent
classes, each of which may be regarded as a type. More generally, the finite
mixture model can be shown to approximate any unknown distribution under
suitable regularity conditions. The Stata package
fmm implements a maximum
likelihood estimator for a class of finite mixture models. In this talk, I
will begin by introducing finite mixture models with a number of examples,
and then I will discuss issues of estimation, testing, and model selection. I will then
describe estimation using
fmm, calculations of predictions, marginal
effects, and posterior class probabilities, and I will illustrate these by using
examples from econometrics and finance.
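As a concrete illustration of latent classes and posterior class probabilities, the following Python sketch fits a two-component normal mixture to simulated data. It uses the EM algorithm, which is a different route to the maximum likelihood estimates than the fmm package and is chosen here only because it is short. The simulated data, starting values, and function names are assumptions.

```python
import math
import random

def normal_pdf(v, mean, sd):
    """Density of a normal distribution with the given mean and sd at v."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_two_normals(data, iterations=200):
    """Fit a two-component normal mixture by EM.

    Returns means, standard deviations, class shares, and each observation's
    posterior probability of belonging to the first class.
    """
    mu = [min(data), max(data)]          # crude starting values
    sd = [1.0, 1.0]
    share = [0.5, 0.5]
    post = []
    for _ in range(iterations):
        # E step: posterior probability that each observation is in class 0.
        post = []
        for v in data:
            a = share[0] * normal_pdf(v, mu[0], sd[0])
            b = share[1] * normal_pdf(v, mu[1], sd[1])
            post.append(a / (a + b))
        # M step: weighted means, standard deviations, and class shares.
        for k, wts in enumerate((post, [1 - p for p in post])):
            total = sum(wts)
            mu[k] = sum(w * v for w, v in zip(wts, data)) / total
            var = sum(w * (v - mu[k]) ** 2 for w, v in zip(wts, data)) / total
            sd[k] = math.sqrt(var)
            share[k] = total / len(data)
    return mu, sd, share, post

# Simulated population made of two homogeneous subpopulations ("types").
rng = random.Random(2008)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(5, 1) for _ in range(300)]
mu, sd, share, post = em_two_normals(data)
print("means:", [round(m, 2) for m in mu])
print("shares:", [round(s, 2) for s in share])
```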
Additional information
deb_fmm_slides.pdf
Inference for partial effects in nonlinear panel-data models using Stata
Jeffrey Wooldridge
Department of Economics, Michigan State University
Abstract not available.
Additional information
wooldridge.zip
Analyzing survey data using Stata 10
Roberto G. Gutierrez
StataCorp
Stata’s approach to the analysis of data from complex surveys is
unique in that it clearly separates the declaration of the design aspects
of the survey (accomplished by
svyset) from the actual analysis. Such
an arrangement is ideal because the design characteristics of the data do
not change according to the analysis being performed. Whether you are
constructing contingency tables or performing Cox regression, the sampling
weights and primary sampling units (not to mention the other design
specifications) remain constant. Stata’s treatment of survey data makes
it easy to maintain that consistency. Most of Stata’s model fitting and
other analysis commands can be applied easily to survey data, including (with
the release of Stata 10) commands for Cox regression and parametric models
for survival data in a survey setting. This talk is a tutorial on how to
make full use of Stata’s capabilities for survey data. Alternative variance
estimation is a key component of performing valid inference in light of
complex-survey designs, and I will discuss several variance-estimation
options. That discussion will include modern computationally intensive
methods such as balanced repeated replication, the jackknife, and the
bootstrap, which are made feasible with the advent of better computer
technology. For these three methods, variance estimation can be done
directly or by using a series of replication weights.
Additional information
gutierrez_survey.pdf
Survey bootstrap and bootstrap weights
Stas Kolenikov
Department of Statistics, University of Missouri–Columbia
In this presentation, I will review the bootstrap for complex surveys with
designs featuring stratification, clustering, and unequal probability
weights. I will present the Stata module
bsweights, which creates the
bootstrap weights for designs specified through and supported by
svy.
I will also provide simple demonstrations highlighting the use of the
procedure and its syntax. I will discuss various tuning parameters and
their impact on the performance of the procedure, and I will give arguments
for the bootstrap by the method of weights in nonsurvey settings.
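The idea of bootstrap weights for a stratified, clustered design can be sketched in Python as follows. The scheme shown is the rescaled bootstrap of Rao and Wu with n − 1 primary sampling units (PSUs) resampled per stratum, which is one common choice. The abstract does not say which schemes or defaults bsweights uses, so this should not be read as a description of that module. The design and all names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_multipliers(strata, rng):
    """One replicate of Rao-Wu rescaled bootstrap multipliers.

    strata maps a stratum label to the list of PSU identifiers it contains.
    Returns {(stratum, psu): multiplier}; multiplying each unit's sampling
    weight by the multiplier of its PSU gives one set of bootstrap weights.
    """
    multipliers = {}
    for stratum, psus in strata.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        # Resample n - 1 PSUs with replacement within the stratum.
        draws = Counter(rng.choices(psus, k=n - 1))
        for psu in psus:
            # PSUs that were not drawn get a multiplier of zero.
            multipliers[(stratum, psu)] = draws[psu] * n / (n - 1)
    return multipliers

# Hypothetical design: two strata with four and three PSUs.
design = {"north": ["a", "b", "c", "d"], "south": ["e", "f", "g"]}
rng = random.Random(42)
replicates = [bootstrap_multipliers(design, rng) for _ in range(100)]
print(replicates[0])
```

Within each stratum the multipliers of a replicate sum to the number of PSUs. An estimate recomputed with each replicate's weights then yields a bootstrap variance from the spread across replicates.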
Additional information
kolenikov_snasug08.pdf
kolenikov_bsw-example.do
Analyzing spatial autoregressive models in Stata
David Drukker
StataCorp
In this talk, I will provide a quick introduction to estimators for the
parameters of spatial-autoregressive models and a quick introduction to a
suite of user-written Stata commands for managing spatial data and parameter
estimation.
Additional information
drukker_spatial.pdf
Scientific organizers
Phil Schumm (chair), University of Chicago
Scott Long, Indiana University
Pravin Trivedi, Indiana University
Richard Williams, University of Notre Dame
Logistics organizers
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp