Last updated: 15 January 2009
 2008 Summer North American Stata Users Group meeting 
 24–25 July 2008 
  
  Gleacher Center, University of Chicago
  450 North Cityfront Plaza Drive 
  Chicago, IL 60611
Proceedings
Understanding statistics using simulate
Maarten Buis
Department of Social Research Methodology, Vrije Universiteit Amsterdam
  Many of us, at some point, have received a comment from a member of the audience, a
  reviewer, or an advisor who thinks the technique used is bad/biased/evil and
  who knows of some new fancy method that solves the problem. In those cases,
  you often want to know two things: 1) How big is the problem? and 2) Does that new
  fancy method actually work? In this talk, I will demonstrate how to
  answer these questions using the simulate command in Stata. I will illustrate
  using the following two examples: First, say we have a dependent
  variable that is collected not as a continuous variable but as a series of
  ranges, e.g., wage measured in categories ($0–5/hour, $6–10/hour,
  etc.). How bad is it to assign each category its middle value and treat it
  as a continuous variable? How much better is intreg at dealing with
  this problem?  Second, various approaches are proposed if we have missing
  data. The default in Stata (and most other packages) is to ignore all
  observations with missing data. Official Stata also contains the
  impute command, and there is the user-written ice command by
  Patrick Royston. This raises the question of which method is the best.
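  The first question above (how bad is it to assign each range its middle
  value?) can be explored with a small Monte Carlo experiment. The sketch
  below is written in Python rather than with Stata's simulate command, and
  everything in it is an assumption made for illustration: the linear
  data-generating process, the range width of 5, the sample size, and the
  function names. It only compares least squares on the continuous outcome
  with least squares on the midpoint-coded outcome; it does not implement
  interval regression.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def midpoint(y, width=5.0):
    """Replace y by the middle of the width-wide range it falls in."""
    return (y // width) * width + width / 2.0


def simulate(reps=200, n=500, true_slope=2.0, seed=12345):
    """Mean estimated slope over reps: (continuous y, midpoint-coded y)."""
    rng = random.Random(seed)
    total_cont = 0.0
    total_mid = 0.0
    for _ in range(reps):
        # Assumed data-generating process: y = 10 + slope * x + noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [10.0 + true_slope * xi + rng.gauss(0.0, 3.0) for xi in x]
        y_mid = [midpoint(yi) for yi in y]
        total_cont += ols_slope(x, y)
        total_mid += ols_slope(x, y_mid)
    return total_cont / reps, total_mid / reps


mean_cont, mean_mid = simulate()
print("mean slope, continuous outcome:     ", round(mean_cont, 3))
print("mean slope, midpoint-coded outcome: ", round(mean_mid, 3))
```

  How far the second number drifts from the true slope depends on the assumed
  range width and error distribution, so the experiment is worth repeating
  under settings that resemble the data at hand.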
  
   
Additional information
   buis_MLBsimulate.zip
GMM estimation in Mata
Austin Nichols
Urban Institute
  I will present a brief introduction to fitting generalized
  method-of-moments models in Stata, using the optimize() function in
  Mata, with applications to nonlinear instrumental-variables models.
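  As a rough illustration of a nonlinear instrumental-variables moment
  condition, the Python sketch below estimates one parameter b in
  y = exp(b*x) + u from the single moment E[z(y - exp(b*x))] = 0. It is not
  the Mata code from the talk. The simulated data, the bracket used for the
  root search, and all function names are assumptions. With one moment and one
  parameter, the GMM objective is minimized where the sample moment is zero,
  so no weighting matrix is needed and a bisection search suffices.

```python
import math
import random


def make_data(n=5000, b_true=0.5, seed=2008):
    """Simulate (y, x, z) with x endogenous and z a valid instrument."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                    # instrument
        v = rng.gauss(0.0, 1.0)                    # shock shared by x and u
        x = z + v                                  # endogenous regressor
        u = 0.5 * v + 0.5 * rng.gauss(0.0, 1.0)    # correlated with x, not z
        y = math.exp(b_true * x) + u
        rows.append((y, x, z))
    return rows


def moment(b, rows, use_instrument=True):
    """Sample moment mean(w * (y - exp(b*x))) with w = z or w = x."""
    total = 0.0
    for y, x, z in rows:
        w = z if use_instrument else x
        total += w * (y - math.exp(b * x))
    return total / len(rows)


def solve(rows, use_instrument=True, lo=0.0, hi=1.5, tol=1e-10):
    """Bisection for a root of the sample moment on the bracket [lo, hi]."""
    f_lo = moment(lo, rows, use_instrument)
    f_hi = moment(hi, rows, use_instrument)
    if f_lo * f_hi > 0:
        raise ValueError("moment does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = moment(mid, rows, use_instrument)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


rows = make_data()
b_gmm = solve(rows, use_instrument=True)     # uses z in the moment condition
b_naive = solve(rows, use_instrument=False)  # uses x, ignoring endogeneity
print("instrumented estimate:", round(b_gmm, 3))
print("naive estimate:       ", round(b_naive, 3))
```

  An overidentified model (more moments than parameters) would instead
  require minimizing a weighted quadratic form in the moments, which is where
  a general-purpose optimizer becomes necessary.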
  
   
Additional information
   nichols_gmm.pdf
The effects of single mothers’ welfare participation and work decisions on children’s attainments
Hau Chyi
WISE, Xiamen University, China
Orgul Ozturk
Moore School of Business, University of South Carolina
   This research examines the effects of mothers’ welfare and work
   decisions on their children’s attainments by using two types of
   estimation methods in Stata: 1) an instrumental-variables (IV) approach and
   2) a nonlinear simultaneous-equation estimation. The estimator employs
   sibling comparisons in a random-effects framework and an IV approach to
   address the unobserved heterogeneity that may influence mothers’ work
   and welfare decisions. We use the popular Stata command ivreg2 to
   estimate the coefficients. Because the production function of a child’s
   ability can be written as a nonlinear function of a mother’s
   decisions, we can also use the nlsur command to simultaneously
   estimate the production function as well as the (first-stage) IV
   projections. We focus on children who were born to single mothers with 12
   or fewer years of schooling. The IVs in this study are welfare use during
   childhood and a mother’s expected years of work. The identification
   comes from the variation in mothers’ different economic incentives
   that arises from the AFDC benefit structures across the United States.  The
   estimates imply that, relative to no welfare participation, participating
   in welfare for one to three years provides up to a 5-percentage-point gain
   in a child’s Picture Individual Achievement Test (PIAT) scores. The
   negative effect of childhood welfare participation on adult earnings found
   by others is not significant if one accounts for mothers’ work
   decisions. At the estimated values of the model parameters, a
   mother’s number of years of work contributes between $3,000 and
   $7,000 (in 1996 dollars) to her child’s labor income but has no
   significant effect on the child’s PIAT test scores.  Finally, the
   number of years of schooling for the children is relatively unresponsive to
   their mother’s work and welfare participation choices.
  
   
Additional information
   chyi_est_afdc_short.pdf
Multivariate mixed models for meta-analysis of paired-comparison studies of two medical diagnostic tests
Ben Dwamena
University of Michigan Radiology and VA Nuclear Medicine Service, Ann Arbor, Michigan
  I have previously demonstrated a Stata implementation of bivariate
  random-effects meta-analysis of the sensitivity and specificity of a single
  binary diagnostic test by means of the midas module (Dwamena NASUG 2007;
  Dwamena WCSUG 2007). In this presentation, I extend the work to
  paired-comparison studies of two binary diagnostic tests. Using a dataset of
  studies comparing the accuracy of positron emission tomography (PET) and
  x-ray computed tomography (CT) for staging lung cancer, I compare the
  fit (deviance) and complexity (BIC, AIC) and test performance estimates
  (sensitivity, specificity, diagnostic odds ratios, and likelihood ratios) of
  four multivariate models: 1) bivariate binomial mixed models with test type as
  a fixed-effect covariate; 2) bivariate binomial mixed models with test type as
  a random-effect covariate; 3) independent test-specific bivariate binomial
  mixed models; and 4) correlated test-specific bivariate binomial mixed
  models. I perform estimation with the Stata-native procedure xtmelogit
  using both the default adaptive quadrature method and its Laplacian
  approximation (nip=1). I then compare results with those from the
  user-written gllamm command (written by Sophia Rabe-Hesketh, Andrew Pickles,
  and Anders Skrondal).
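  For readers less familiar with the test performance measures named above,
  the Python sketch below computes them for a single 2x2 table using the
  standard definitions. The counts are made up and the function name is
  hypothetical. It is per-study arithmetic only, not the mixed-model pooling
  described in the abstract, and it assumes no cell is zero.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Accuracy measures for one study from its 2x2 table of counts."""
    sens = tp / (tp + fn)            # sensitivity: true-positive rate
    spec = tn / (tn + fp)            # specificity: true-negative rate
    lr_pos = sens / (1.0 - spec)     # positive likelihood ratio
    lr_neg = (1.0 - sens) / spec     # negative likelihood ratio
    dor = lr_pos / lr_neg            # diagnostic odds ratio
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical study: 100 diseased and 100 nondiseased patients.
summary = diagnostic_summary(tp=90, fn=10, fp=20, tn=80)
for name, value in summary.items():
    print(name, round(value, 3))
```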
  
   
Additional information
   dwamena_snasug2008.pdf
Teaching consumer theory with maximum likelihood estimation of demand systems
Carl Nelson
Agricultural and Consumer Economics, University of Illinois at Urbana–Champaign
  The quaids ado-files written by Brian Poi provide a good template for
  constructing alternative ado-files for maximum likelihood estimation of
  demand systems. I describe how I used the template to construct ado-files to
  estimate a five-commodity almost-ideal demand system with demographic
  scaling. The system is applied to USDA national food consumption survey
  data. The estimation is used as an exercise in a PhD-level microtheory
  course that aims to connect the empirical implications of theory with
  econometric estimation. I report on how maximum likelihood estimation of
  demand systems contributes to student learning of both consumer theory and
  nonlinear estimation. I include a discussion of how Mata is used to recover
  coefficients from maximum likelihood estimation to perform postestimation
  processing like calculation of elasticities.
  
   Additional information
   nelson_snasug08.pdf
Semiparametric generalized linear models
Paul Rathouz
Department of Health Studies, University of Chicago
  I propose a new class of generalized linear models. As with the existing
  models, these new models are specified via a linear predictor and a link
  function for the mean of response Y as a function of predictors X. However,
  here, the “baseline” distribution of Y when the linear predictor is zero is
  left unspecified and is estimated from the data. The response distribution when
  the linear predictor differs from zero is then generated via exponential
  tilting of the baseline distribution, yielding a response model that is a
  member of the natural exponential family, with corresponding canonical link
  and variance functions. The resulting model has a level of
  flexibility similar to that of the proportional odds model. Maximum likelihood estimators
  are developed for response distributions with finite support, and the new
  model is studied and illustrated through simulations and example analyses
  from aging and psychiatry research.
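  Exponential tilting on a finite support can be sketched in a few lines. In
  the Python example below, a baseline probability mass function f0(y) is
  reweighted in proportion to f0(y)*exp(theta*y), and theta is chosen by
  bisection so that the tilted distribution has a requested mean. The support,
  the baseline probabilities, the target mean, and the function names are all
  assumptions for illustration. The sketch treats the baseline as known,
  whereas the proposed models estimate it from the data.

```python
import math


def tilt(support, baseline, theta):
    """Tilted pmf proportional to baseline(y) * exp(theta * y)."""
    logs = [math.log(p) + theta * y for y, p in zip(support, baseline)]
    top = max(logs)                      # subtract the max to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def pmf_mean(support, probs):
    """Mean of a distribution on a finite support."""
    return sum(y * p for y, p in zip(support, probs))


def theta_for_mean(support, baseline, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Find theta such that the tilted distribution has mean `target`.

    The tilted mean increases with theta, so bisection applies. The target
    must lie strictly inside the range of the support, and the answer is
    assumed to lie within [lo, hi].
    """
    if not (min(support) < target < max(support)):
        raise ValueError("target mean must lie inside the support range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pmf_mean(support, tilt(support, baseline, mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


support = [0, 1, 2, 3, 4]
baseline = [0.40, 0.30, 0.15, 0.10, 0.05]   # hypothetical baseline pmf
theta = theta_for_mean(support, baseline, target=2.5)
tilted = tilt(support, baseline, theta)
print("theta:", round(theta, 4))
print("tilted pmf:", [round(p, 4) for p in tilted])
```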
  
   Additional information
   rathouz_sug_2008.pdf
Using Stata as a computational tool in a relational database environment
Tom Mustillo
Assistant Professor of Political Science, Indiana University–Purdue University Indianapolis
Sarah Mustillo
Associate Professor of Sociology, Purdue University
  Stata can be used as a companion to relational database programs to compute
  and serve up statistical and nonstandard functions for public use.
  This session builds upon presentations at previous North American Stata Users Group meetings
  on “Translating Data between MySQL and Stata” (2004),
  “Working with ODBC Data Sources in Stata” (2004), and
  “Integrating Stata with Database Management Systems” (2005) by demonstrating
  how a Microsoft Access database of electoral data can call Stata do-files
  to compute and/or estimate alternative measures of political party
  nationalization. This database uses Stata to compute Jones and Mainwaring’s
  (2003) measure of “Party Nationalization” using the egen_inequal
  command and Morgenstern and Potthoff’s (2005) measure of the
  “Components of Elections” using xtmixed.  More generally, where
  data reside live and for broad public consumption, Stata can play a valuable
  role operating behind the scenes for nontechnical users where measures of
  conceptual value cannot be generated from within the database environment.
  
   
Additional information
   mustillo_nasug2008.ppt
USESPSS: Processing SPSS files in Stata
Sergiy Radyakin
The World Bank
  The new command USESPSS allows users to open and process SPSS system
  files in Stata for Windows. USESPSS is a “true reader” in
  that it is completely independent from any specialized conversion
  software, like Stat/Transfer, and it does not require SPSS
  to be installed. USESPSS converts data files on the fly, preserving
  variable labels, value labels, and missing values. Like other conversion
  software, USESPSS optimizes data storage types by looking for the most
  efficient way to store SPSS data in Stata’s memory. USESPSS is
  implemented as a plugin and works in a Windows 32-bit environment (however, it
  understands SPSS files originating from both Windows and Unix platforms,
  compressed and not compressed). The critical portions of its code are
  written in assembly language; thus, SPSS data can be used in Stata programs
  without a significant loss of performance. Part of the talk will also cover
  the process of developing plugins for Stata.
  
   
Additional information
   radyakin_usespss.ppt
Reshaping the World Development Indicators (WDI) for panel data and seemingly unrelated regression modeling in Stata
P. Wilner Jeanty
The Ohio State University
  The World Bank’s World Development Indicators (WDI) compilation is a rich
  and widely used dataset about development of most economies in the world.
  However, after obtaining the data from the World Bank’s website or the
  WDI CD-ROM, users need to manage or reorganize the data in a certain way for
  statistical applications. The World Bank has made great strides in rendering
  WDI in several forms for download. Yet, seemingly unrelated regression
  analysis, for example, cannot be performed using any of these structures.
  Reorganizing the data for seemingly unrelated regression analysis as well as
  renaming the series with meaningful variable names and maintaining the
  series descriptors as variable labels in the reshaped dataset represent
  significant data-management challenges for the inexperienced Stata user. I
  will present a new Stata program, wdireshape, that reduces data-management
  time and effort to zero when the ultimate goal is to fit panel-data and
  seemingly unrelated regression models, or to have a dataset with the
  countries as rows and the variables for each year as columns.
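  The last layout mentioned above (countries as rows, one column per series
  and year) can be illustrated with a small Python sketch. It is not how
  wdireshape works internally. The record layout, the series codes, and the
  values are hypothetical and stand in for whatever a WDI download contains.

```python
def reshape_wide(records):
    """Turn (country, series, year, value) records into one row per country.

    Each row is a dict whose keys combine the series code and the year,
    for example "gdp2000".
    """
    wide = {}
    for country, series, year, value in records:
        wide.setdefault(country, {})[f"{series}{year}"] = value
    return wide


# Hypothetical long-format records.
records = [
    ("Haiti", "gdp", 2000, 1.0),
    ("Haiti", "gdp", 2001, 1.1),
    ("Haiti", "pop", 2000, 8.5),
    ("Chile", "gdp", 2000, 7.8),
    ("Chile", "pop", 2000, 15.3),
]
wide = reshape_wide(records)
for country, row in wide.items():
    print(country, row)
```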
  
   
Additional information
   jeanty_nasug08.zip
Estimating the parameters of dynamic panel-data models using Stata
David Drukker
StataCorp
  In this talk, I will review dynamic panel-data analysis and how to perform
  it using Stata.  I also cover static models with predetermined variables.
  For each model discussed, I review the econometrics and
  then show how to perform the estimation using Stata.
  
   Additional information
   drukker_xtdpd.pdf
Estimation of constant-CV regression models
Alan Feiveson
NASA Johnson Space Center
   A typical formulation for a linear mixed model is Y = Xβ + Zu, where
   β is a vector of “fixed” parameters, u is a vector of “random
   effects”, and X and Z are matrices whose columns consist of design
   variables and/or covariates. In some applications, the elements of Z may
   depend on the unknown fixed parameters β as well as known covariates. A
   common example is when an error variance is proportional to some power of
   E(Y), the mean of Y.  In particular, if the variance is proportional to the
   square of E(Y), we have a constant-CV model. In this talk, I will give examples
   of such models, including those with hierarchical structures, and show how
   xtmixed can be used to estimate them and do proper inference on the
   estimated parameters. I will compare the results with Bayesian estimation
   under WinBUGS.
  
   
Additional information
   feiveson_snasug_2008.ppt
Logistic regression by means of penalized maximum likelihood estimation in cases of separation
Joseph Coveney
Cobridge Co., Ltd.
   Users of logit or logistic occasionally encounter instances in
   which one or more predictors perfectly predict one or both outcomes (a
   condition called separation), or in which some outcomes are completely
   determined (quasi-complete separation). Finite maximum likelihood estimates
   do not exist under conditions of separation. Exact logistic regression with
   exlogistic can serve as an alternative in these circumstances but is
   sometimes infeasible. In the 1990s, David Firth proposed a type of
   penalization for reducing bias of maximum likelihood estimates in
   generalized linear models by means of modifying the score equations.
   Firth’s method has the interpretation of penalized maximum likelihood
   when the canonical link function is used, such as in logistic regression.
   In this decade, Georg Heinze and colleagues have explored this technique as
   a solution to the problem of separation in logistic regression. I describe a Stata
   implementation, firthlogit, which maximizes the penalized log-likelihood
   using ml. I illustrate its use in model fitting and predictions, inference
   with penalized likelihood-ratio tests, and construction of profile
   penalized likelihood confidence intervals. I use examples
   where logit and logistic balk or do not give finite
   maximum likelihood estimates, and where exact logistic regression is
   problematic because of memory requirements or degenerate conditional
   distributions.
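   To show why penalization yields finite estimates under separation, the
   Python sketch below fits a logistic model with an intercept and one
   predictor by solving Firth's modified score equations with Fisher-scoring
   steps. It is not the firthlogit implementation, which the abstract says
   maximizes the penalized log-likelihood using ml. The data and the function
   name are hypothetical, and the code handles exactly one predictor. In the
   example, every observation with x = 1 has y = 1, so the ordinary maximum
   likelihood slope would be infinite.

```python
import math


def firth_logit(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic fit of y on an intercept and one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design [1, x], as a 2x2 matrix.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
        h = [wi * (c - 2.0 * b * xi + a * xi * xi) / det
             for wi, xi in zip(w, x)]
        # Modified score: the usual residual plus h * (1/2 - p).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: inverse information times modified score.
        s0 = (c * u0 - b * u1) / det
        s1 = (-b * u0 + a * u1) / det
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Hypothetical data: 4 of 10 successes when x = 0, 10 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 4 + [0] * 6 + [1] * 10
b0, b1 = firth_logit(x, y)
print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

   For a single binary predictor, the penalized estimates coincide with the
   logits obtained after adding 0.5 to each cell of the 2x2 table, which
   gives a convenient check on the iteration.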
  
   
Additional information
   coveney_snasug08.pps
Finite mixture models
Partha Deb
Hunter College and the Graduate Center, CUNY
  Finite mixture models provide a natural way of modeling continuous or
  discrete outcomes that are observed from populations consisting of a finite
  number of homogeneous subpopulations. Applications of finite mixture models
  are abundant in the social and behavioral sciences, biological and
  environmental sciences, engineering, and finance. Such models have a natural
  representation of heterogeneity in a finite, usually small, number of latent
  classes, each of which may be regarded as a type. More generally, the finite
  mixture model can be shown to approximate any unknown distribution under
  suitable regularity conditions. The Stata package fmm implements a maximum
  likelihood estimator for a class of finite mixture models. In this talk, I
  will begin by introducing finite mixture models with a number of examples,
  and then I will discuss issues of estimation, testing, and model selection. I will then
  describe estimation using fmm, calculations of predictions, marginal
  effects, and posterior class probabilities, and I will illustrate these by using
  examples from econometrics and finance.
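  Posterior class probabilities follow from Bayes' rule once the component
  parameters are known. The Python sketch below computes them for a mixture of
  normal densities. The mixing weights, means, and standard deviations are
  made-up values standing in for estimates, and the function names are
  hypothetical. It does not estimate the mixture and is not the fmm package's
  code.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_probs(y, weights, mus, sigmas):
    """Probability that y came from each latent class, by Bayes' rule."""
    joint = [w * normal_pdf(y, m, s)
             for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class mixture.
weights = [0.5, 0.5]
mus = [-1.0, 1.0]
sigmas = [1.0, 1.0]
for y in (-2.0, 0.0, 3.0):
    probs = posterior_probs(y, weights, mus, sigmas)
    print(y, [round(p, 4) for p in probs])
```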
  
   
Additional information
   deb_fmm_slides.pdf
Inference for partial effects in nonlinear panel-data models using Stata
Jeffrey Wooldridge
Department of Economics, Michigan State University
  Abstract not available.
  
   Additional information
   wooldridge.zip
Analyzing survey data using Stata 10
Roberto G. Gutierrez
StataCorp
  Stata’s approach to the analysis of data from complex surveys is
  unique in that it clearly separates the declaration of the design aspects
  of the survey (accomplished by svyset) from the actual analysis.  Such
  an arrangement is ideal because the design characteristics of the data do
  not change according to the analysis being performed.  Whether you are
  constructing contingency tables or performing Cox regression, the sampling
  weights and primary sampling units (not to mention the other design
  specifications) remain constant. Stata’s treatment of survey data makes
  it easy to maintain that consistency.  Most of Stata’s model fitting and
  other analysis commands can be applied easily to survey data, including (with
  the release of Stata 10) commands for Cox regression and parametric models
  for survival data in a survey setting.  This talk is a tutorial on how to
  make full use of Stata’s capabilities for survey data.  Alternative variance
  estimation is a key component of performing valid inference in light of
  complex-survey designs, and I will discuss several variance-estimation
  options.  That discussion will include modern computationally intensive
  methods such as balanced repeated replication, the jackknife, and the
  bootstrap, which are made feasible with the advent of better computer
  technology.  For these three methods, variance estimation can be done
  directly or by using a series of replication weights.
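  The idea behind replication-based variance estimation can be seen in its
  simplest form with the delete-one jackknife for a sample mean. The Python
  sketch below assumes a simple random sample with no strata, clusters, or
  weights, so it is far simpler than the survey estimators discussed in the
  talk. The data and function names are hypothetical. For the mean, the
  jackknife variance reproduces the textbook formula s^2/n exactly, which the
  example checks.

```python
def jackknife_variance_of_mean(values):
    """Delete-one jackknife variance estimate for the sample mean."""
    n = len(values)
    total = sum(values)
    # Replicate estimates: the mean with one observation left out.
    replicates = [(total - v) / (n - 1) for v in values]
    centre = sum(replicates) / n
    return (n - 1) / n * sum((t - centre) ** 2 for t in replicates)


def direct_variance_of_mean(values):
    """Textbook estimate s^2 / n for comparison."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return s2 / n


values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical sample
print("jackknife:", jackknife_variance_of_mean(values))
print("direct:   ", direct_variance_of_mean(values))
```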
  
   
Additional information
   gutierrez_survey.pdf
Survey bootstrap and bootstrap weights
Stas Kolenikov
Department of Statistics, University of Missouri–Columbia
   In this presentation, I will review the bootstrap for complex surveys with
   designs featuring stratification, clustering, and unequal probability
   weights. I will present the Stata module bsweights, which creates the
   bootstrap weights for designs specified through and supported by svy.
   I will also provide simple demonstrations highlighting the use of the
   procedure and its syntax. I will discuss various tuning parameters and
   their impact on the performance of the procedure, and I will give arguments
   for the bootstrap by the method of weights in nonsurvey settings.
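   One common way to build bootstrap weights for a stratified, clustered
   design is the rescaling bootstrap of Rao and Wu. Within each stratum with
   n_h primary sampling units (PSUs), n_h - 1 PSUs are drawn with replacement,
   and each PSU's weight is multiplied by n_h/(n_h - 1) times the number of
   times it was drawn. The Python sketch below generates one such set of
   multipliers. Whether bsweights uses exactly this scheme or these defaults
   is not stated here, so treat it as an assumption. The strata, PSU
   identifiers, and function name are hypothetical.

```python
import random
from collections import Counter


def rescaled_bootstrap_multipliers(strata, rng):
    """One replicate of weight multipliers, keyed by (stratum, psu).

    strata maps each stratum label to the list of its PSU identifiers.
    Each stratum needs at least two PSUs.
    """
    multipliers = {}
    for h, psus in strata.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # Draw n_h - 1 PSUs with replacement and count the hits per PSU.
        hits = Counter(rng.choice(psus) for _ in range(n_h - 1))
        for psu in psus:
            multipliers[(h, psu)] = n_h / (n_h - 1) * hits[psu]
    return multipliers


rng = random.Random(1)
strata = {"A": [1, 2, 3, 4], "B": [5, 6, 7]}   # hypothetical design
mult = rescaled_bootstrap_multipliers(strata, rng)
for key in sorted(mult):
    print(key, round(mult[key], 3))
```

   Multiplying the original sampling weights by these factors gives one column
   of replicate weights, and repeating the draw gives as many columns as
   desired. Within each stratum the multipliers sum to n_h, so the stratum's
   total weight is preserved on average.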
  
   
Additional information
   kolenikov_snasug08.pdf
   kolenikov_bsw-example.do
Analyzing spatial autoregressive models in Stata
David Drukker
StataCorp
  In this talk, I will provide a quick introduction to estimators for the
  parameters of spatial-autoregressive models and a quick introduction to a
  suite of user-written Stata commands for managing spatial data and parameter
  estimation.
  
   Additional information
   drukker_spatial.pdf
Scientific organizers
Phil Schumm (chair), University of Chicago
Scott Long, Indiana University
Pravin Trivedi, Indiana University
Richard Williams, University of Notre Dame
Logistics organizers
Chris Farrar, StataCorp 
Gretchen Farrar, StataCorp