2008 Summer North American Stata Users Group meeting

Last updated: 15 January 2009

24–25 July 2008

Gleacher Center, University of Chicago
450 North Cityfront Plaza Drive
Chicago, IL 60611

Proceedings

Understanding statistics using simulate

Maarten Buis

Department of Social Research Methodology, Vrije Universiteit Amsterdam

Many of us, at some point, have received a comment from a member of the audience, a reviewer, or an advisor who thinks the technique used is bad/biased/evil and who knows of some new fancy method that solves the problem. In those cases, you often want to know two things: 1) How big is the problem? and 2) Does that new fancy method actually work? In this talk, I will demonstrate how to answer these questions using the simulate command in Stata. I will illustrate using the following two examples: First, say we have a dependent variable that is collected not as a continuous variable but as a series of ranges, e.g., wage measured in categories ($0–5/hour, $6–10/hour, etc.). How bad is it to assign each category its middle value and treat it as a continuous variable? How much better is intreg at dealing with this problem? Second, various approaches are proposed if we have missing data. The default in Stata (and most other packages) is to ignore all observations with missing data. Official Stata also contains the impute command, and there is the user-written ice command by Patrick Royston. This raises the question of which method is the best.
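The first question can be sketched as a small Monte Carlo experiment. The talk itself uses Stata's simulate command; the block below is only a Python illustration of the same idea. The data-generating process (a linear equation with slope 2, normal errors, and categories 5 units wide), the function names, and the seed are all assumptions made for this sketch. It compares the OLS slope from the exact outcome with the slope obtained after replacing each outcome by the midpoint of its category, and it does not attempt to reproduce intreg.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def midpoint(y, width=5.0):
    """Replace y by the middle of its category [k*width, (k+1)*width)."""
    return (y // width) * width + width / 2


def one_replication(rng, n=200, beta=2.0, width=5.0):
    """Draw one hypothetical dataset; return (slope on exact y, slope on midpoints)."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [5.0 + beta * xi + rng.gauss(0, 3) for xi in x]
    y_mid = [midpoint(yi, width) for yi in y]
    return ols_slope(x, y), ols_slope(x, y_mid)


def monte_carlo(reps=500, seed=12345, **kwargs):
    """Average both slope estimates over many simulated datasets."""
    rng = random.Random(seed)
    results = [one_replication(rng, **kwargs) for _ in range(reps)]
    exact = statistics.fmean(r[0] for r in results)
    mid = statistics.fmean(r[1] for r in results)
    return exact, mid


if __name__ == "__main__":
    exact, mid = monte_carlo()
    print("mean slope, exact outcome:   ", round(exact, 3))
    print("mean slope, midpoint outcome:", round(mid, 3))
```

Comparing the two averages with the true slope gives a direct estimate of the bias from midpoint assignment under this particular design; other category widths or error distributions may behave differently.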

Additional information
buis_MLBsimulate.zip

GMM estimation in Mata

Austin Nichols

Urban Institute

I will present a brief introduction to fitting generalized method-of-moments models in Stata, using the optimize() function in Mata, with applications to nonlinear instrumental-variables models.

Additional information
nichols_gmm.pdf

The effects of single mothers’ welfare participation and work decisions on children’s attainments

Hau Chyi

WISE, Xiamen University, China

Orgul Ozturk

Moore School of Business, University of South Carolina

This research examines the effects of mothers’ welfare and work decisions on their children’s attainments by using two types of estimation methods in Stata: 1) an instrumental-variables (IV) approach and 2) a nonlinear simultaneous-equation estimation. The estimator employs sibling comparisons in a random-effects framework and an IV approach to address the unobserved heterogeneity that may influence mothers’ work and welfare decisions. We use the popular Stata command ivreg2 to estimate the coefficients. Because the production function of a child’s ability can be written as a nonlinear function in a mother’s decisions, we can also use the nlsur command to simultaneously estimate the production function as well as the (first-stage) IV projections. We focus on children who were born to single mothers with 12 or fewer years of schooling. The IVs in this study are welfare use during childhood and a mother’s expected years of work. The identification comes from the variation in mothers’ different economic incentives that arises from the AFDC benefit structures across the United States. The estimates imply that, relative to no welfare participation, participating in welfare for one to three years provides up to a 5-percentage-point gain in a child’s Peabody Individual Achievement Test (PIAT) scores. The negative effect of childhood welfare participation on adult earnings found by others is not significant if one accounts for mothers’ work decisions. At the estimated values of the model parameters, a mother’s number of years of work contributes between $3,000 and $7,000 (in 1996 dollars) to her child’s labor income but has no significant effect on the child’s PIAT scores. Finally, the number of years of schooling for the children is relatively unresponsive to their mother’s work and welfare participation choices.

Additional information
chyi_est_afdc_short.pdf

Multivariate mixed models for meta-analysis of paired-comparison studies of two medical diagnostic tests

Ben Dwamena

University of Michigan Radiology and VA Nuclear Medicine Service, Ann Arbor, Michigan

I have previously demonstrated a Stata implementation of bivariate random-effects meta-analysis of the sensitivity and specificity of a single binary diagnostic test by means of the midas module (Dwamena NASUG 2007; Dwamena WCSUG 2007). In this presentation, I extend the work to paired-comparison studies of two binary diagnostic tests. Using a dataset of studies comparing the accuracy of positron emission tomography (PET) and x-ray computed tomography (CT) for staging lung cancer, I compare the fit (deviance), complexity (BIC, AIC), and test performance estimates (sensitivity, specificity, diagnostic odds ratios, and likelihood ratios) of four multivariate models: 1) bivariate binomial mixed models with test type as fixed-effect covariate; 2) bivariate binomial mixed models with test type as random-effect covariate; 3) independent test-specific bivariate binomial mixed models; and 4) correlated test-specific bivariate binomial mixed models. I perform estimation with the Stata-native procedure xtmelogit using both the default adaptive quadrature method and its Laplacian approximation (nip=1). I then compare results with those from the user-written gllamm command (written by Sophia Rabe-Hesketh, Andrew Pickles, and Anders Skrondal).

Additional information
dwamena_snasug2008.pdf

Teaching consumer theory with maximum likelihood estimation of demand systems

Carl Nelson

Agricultural and Consumer Economics, University of Illinois at Urbana–Champaign

The quaids ado-files written by Brian Poi provide a good template for constructing alternative ado-files for maximum likelihood estimation of demand systems. I describe how I used the template to construct ado-files to estimate a five-commodity almost-ideal demand system with demographic scaling. The system is applied to USDA national food consumption survey data. The estimation is used as an exercise in a PhD-level microtheory course that aims to connect the empirical implications of theory with econometric estimation. I report on how maximum likelihood estimation of demand systems contributes to student learning of both consumer theory and nonlinear estimation. I include a discussion of how Mata is used to recover coefficients from maximum likelihood estimation to perform postestimation processing like calculation of elasticities.

Additional information
nelson_snasug08.pdf

Semiparametric generalized linear models

Paul Rathouz

Department of Health Studies, University of Chicago

I propose a new class of generalized linear models. As with the existing models, these new models are specified via a linear predictor and a link function for the mean of response Y as a function of predictors X. Here, however, the “baseline” distribution of Y when the linear predictor is zero is left unspecified and is estimated from the data. The response distribution when the linear predictor differs from zero is then generated via exponential tilting of the baseline distribution, yielding a response model that is a member of the natural exponential family, with corresponding canonical link and variance functions. The resulting model has a level of flexibility similar to that of the proportional odds model. Maximum likelihood estimators are developed for response distributions with finite support, and the new model is studied and illustrated through simulations and example analyses from aging and psychiatry research.
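The exponential-tilting step can be illustrated on a response with finite support. The Python sketch below is not taken from the talk: it treats the baseline probabilities as known (in the proposed model they are estimated from the data), leaves out the link function, and uses function names and a bisection solver chosen only for this example. Given a baseline distribution and a target mean, it finds the tilting parameter theta such that the distribution with probabilities proportional to baseline(y) * exp(theta * y) has that mean.

```python
import math


def tilt(support, baseline, theta):
    """Tilted pmf with p(y) proportional to baseline(y) * exp(theta * y)."""
    exponents = [theta * y for y in support]
    shift = max(exponents)  # subtract the largest exponent for numerical stability
    weights = [p * math.exp(e - shift) for p, e in zip(baseline, exponents)]
    total = sum(weights)
    return [w / total for w in weights]


def pmf_mean(support, probs):
    """Mean of a discrete distribution."""
    return sum(y * p for y, p in zip(support, probs))


def solve_theta(support, baseline, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Find theta by bisection; the tilted mean is increasing in theta."""
    if not min(support) < target_mean < max(support):
        raise ValueError("target mean must lie strictly inside the support")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pmf_mean(support, tilt(support, baseline, mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    support = [0, 1, 2, 3, 4]
    baseline = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical baseline with mean 2
    theta = solve_theta(support, baseline, target_mean=3.0)
    print(theta, tilt(support, baseline, theta))
```

Because theta = 0 returns the baseline itself, the baseline plays the role of the response distribution at the reference value of the linear predictor.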

Additional information
rathouz_sug_2008.pdf

Using Stata as a computational tool in a relational database environment

Tom Mustillo

Assistant Professor of Political Science, Indiana University–Purdue University Indianapolis

Sarah Mustillo

Associate Professor of Sociology, Purdue University

Stata can be used as a companion to relational database programs to compute and serve up statistical and nonstandard functions for public use. This session builds upon previous North American Stata Users Group meeting presentations on “Translating Data between MySQL and Stata” (2004), “Working with ODBC Data Sources in Stata” (2004), and “Integrating Stata with Database Management Systems” (2005) by demonstrating how a Microsoft Access database of electoral data can call Stata do-files to compute and/or estimate alternative measures of political party nationalization. This database uses Stata to compute Jones and Mainwaring’s (2003) measure of “Party Nationalization” using the egen_inequal command and Morgenstern and Potthoff’s (2005) measure of the “Components of Elections” using xtmixed. More generally, where data reside live and for broad public consumption, Stata can play a valuable role operating behind the scenes for nontechnical users where measures of conceptual value cannot be generated from within the database environment.

Additional information
mustillo_nasug2008.ppt

USESPSS: Processing SPSS files in Stata

Sergiy Radyakin

The World Bank

The new command USESPSS allows users to open and process SPSS system files in Stata for Windows. USESPSS is a “true reader” in that it is completely independent of any specialized conversion software, such as Stat/Transfer, and it does not require SPSS to be installed. USESPSS converts data files on the fly, preserving variable labels, value labels, and missing values. Like other conversion software, USESPSS optimizes data storage types by looking for the most efficient way to store SPSS data in Stata’s memory. USESPSS is implemented as a plugin and works in a Windows 32-bit environment (however, it understands SPSS files originating from both Windows and Unix platforms, whether compressed or uncompressed). The critical portions of its code are written in assembly language; thus, SPSS data can be used in Stata programs without a significant loss of performance. Part of the talk will also cover the process of developing plugins for Stata.

Additional information
radyakin_usespss.ppt

Reshaping the World Development Indicators (WDI) for panel data and seemingly unrelated regression modeling in Stata

P. Wilner Jeanty

The Ohio State University

The World Bank’s World Development Indicators (WDI) compilation is a rich and widely used dataset on the development of most economies in the world. However, after obtaining the data from the World Bank’s website or the WDI CD-ROM, users need to manage or reorganize the data in a certain way for statistical applications. The World Bank has made great strides in rendering WDI in several forms for download. Yet seemingly unrelated regression analysis, for example, cannot be performed using any of these structures. Reorganizing the data for seemingly unrelated regression analysis, renaming the series with meaningful variable names, and maintaining the series descriptors as variable labels in the reshaped dataset represent significant data-management challenges for the inexperienced Stata user. I will present a new Stata program, wdireshape, that reduces data-management time and effort to zero when the ultimate structure is to fit panel-data and seemingly unrelated regression models, or to have a dataset with the countries as rows and the variables for each year as columns.

Additional information
jeanty_nasug08.zip

Estimating the parameters of dynamic panel-data models using Stata

David Drukker

StataCorp

In this talk, I will review dynamic panel-data analysis and how to perform it using Stata. I also cover static models with predetermined variables. For each model discussed, I review the econometrics and then show how to perform the estimation using Stata.

Additional information
drukker_xtdpd.pdf

Estimation of constant-CV regression models

Alan Feiveson

NASA Johnson Space Center

A typical formulation for a linear mixed model is Y = Xβ + Zu, where β is a vector of “fixed” parameters, u is a vector of “random effects”, and X and Z are matrices whose columns consist of design variables and/or covariates. In some applications, the elements of Z may depend on the unknown fixed parameters β as well as known covariates. A common example is when an error variance is proportional to some power of E(Y), the mean of Y. In particular, if the variance is proportional to the square of E(Y), we have a constant-CV model. In this talk, I will give examples of such models, including those with hierarchical structures, and show how xtmixed can be used to estimate them and do proper inference on the estimated parameters. I will compare the results with Bayesian estimation under WinBUGS.
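A short worked equation shows why the squared-mean case is called constant-CV. The symbols below (the mean of observation i, a proportionality constant, and a scalar random term) are notation introduced for this sketch rather than taken from the talk, and the second line is only one assumed way of writing such a model in mixed-model form.

```latex
% If the variance is proportional to the squared mean, the coefficient of
% variation (standard deviation divided by mean) does not depend on the mean:
\operatorname{Var}(Y_i) = \sigma^2 \mu_i^2
\quad\Longrightarrow\quad
\mathrm{CV}(Y_i) = \frac{\sqrt{\operatorname{Var}(Y_i)}}{\mu_i} = \sigma,
\qquad \mu_i = \mathrm{E}(Y_i) > 0 .

% One assumed representation: the random term is scaled by the mean itself,
% so the multiplier of u_i depends on the unknown fixed parameters.
Y_i = \mu_i + \mu_i u_i, \qquad \mu_i = x_i^{\top}\beta, \qquad
\mathrm{E}(u_i) = 0, \quad \operatorname{Var}(u_i) = \sigma^2 .
```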

Additional information
feiveson_snasug_2008.ppt

Logistic regression by means of penalized maximum likelihood estimation in cases of separation

Joseph Coveney

Cobridge Co., Ltd.

Users of logit or logistic occasionally encounter instances in which one or more predictors perfectly predict one or both outcomes (a condition called separation), or in which some outcomes are completely determined (quasi-complete separation). Finite maximum likelihood estimates do not exist under conditions of separation. Exact logistic regression with exlogistic can serve as an alternative in these circumstances but is sometimes infeasible. In the 1990s, David Firth proposed a type of penalization for reducing bias of maximum likelihood estimates in generalized linear models by means of modifying the score equations. Firth’s method has the interpretation of penalized maximum likelihood when the canonical link function is used, such as in logistic regression. In this decade, Georg Heinze and colleagues have explored this technique as a solution to the problem of separation in logistic regression. I describe a Stata implementation, firthlogit, which maximizes the penalized log-likelihood using ml. I illustrate its use in model fitting and predictions, inference with penalized likelihood-ratio tests, and construction of profile penalized likelihood confidence intervals. I use examples where logit and logistic balk or do not give finite maximum likelihood estimates, and where exact logistic regression is problematic because of memory requirements or degenerate conditional distributions.
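As a rough illustration of the penalization, the block below applies Firth's modified score to a logistic model with an intercept and a single predictor. It is a Python sketch written for this page, not the firthlogit implementation (which, per the abstract, maximizes the penalized log likelihood using ml). The function name, the Fisher-scoring iteration, and the toy data are assumptions. The modified score adds h_i * (1/2 - p_i) to each residual y_i - p_i, where p_i is the fitted probability and h_i is the leverage of observation i.

```python
import math


def firth_logit(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression of y on (1, x); returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]

        # Fisher information X'WX for the design columns (1, x), and its inverse.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det

        # Leverages h_i = w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, x_i).
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]

        # Firth's modified score: residuals shifted by h_i * (1/2 - p_i).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))

        # Scoring step: beta <- beta + (X'WX)^{-1} U*.
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical data: every observation with x = 1 has y = 1, so ordinary
    # maximum likelihood has no finite slope estimate.
    x = [0] * 5 + [1] * 5
    y = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
    print(firth_logit(x, y))
```

With a single binary predictor, the penalized fit should coincide with adding one half to each cell of the 2 x 2 table, which is one way to see why the estimates stay finite under separation.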

Additional information
coveney_snasug08.pps

Finite mixture models

Partha Deb

Hunter College and the Graduate Center, CUNY

Finite mixture models provide a natural way of modeling continuous or discrete outcomes that are observed from populations consisting of a finite number of homogeneous subpopulations. Applications of finite mixture models are abundant in the social and behavioral sciences, biological and environmental sciences, engineering, and finance. Such models have a natural representation of heterogeneity in a finite, usually small, number of latent classes, each of which may be regarded as a type. More generally, the finite mixture model can be shown to approximate any unknown distribution under suitable regularity conditions. The Stata package fmm implements a maximum likelihood estimator for a class of finite mixture models. In this talk, I will begin by introducing finite mixture models with a number of examples, and then I will discuss issues of estimation, testing, and model selection. I will then describe estimation using fmm, calculations of predictions, marginal effects, and posterior class probabilities, and I will illustrate these by using examples from econometrics and finance.
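The posterior class probabilities mentioned above follow from Bayes' rule once the mixture parameters are available. The Python sketch below assumes normal component densities and treats the mixing weights, means, and standard deviations as known; in practice they would come from the maximum likelihood fit, and the fmm package is not limited to normal components. All function names here are illustrative.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(y, weights, mus, sigmas):
    """Finite mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def posterior_class_probs(y, weights, mus, sigmas):
    """P(class k | y) = weight_k * f_k(y) / sum_j weight_j * f_j(y)."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)  # equals mixture_density(y, ...); zero only if all terms underflow
    return [j / total for j in joint]


if __name__ == "__main__":
    # Hypothetical two-class mixture.
    weights, mus, sigmas = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
    for y in (0.0, 1.0, 2.0):
        print(y, posterior_class_probs(y, weights, mus, sigmas))
```

An observation is typically assigned to the latent class with the largest posterior probability, although the probabilities themselves are often the more informative summary.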

Additional information
deb_fmm_slides.pdf

Inference for partial effects in nonlinear panel-data models using Stata

Jeffrey Wooldridge

Department of Economics, Michigan State University

Abstract not available.

Additional information
wooldridge.zip

Analyzing survey data using Stata 10

Roberto G. Gutierrez

StataCorp

Stata’s approach to the analysis of data from complex surveys is unique in that it clearly separates the declaration of the design aspects of the survey (accomplished by svyset) from the actual analysis. Such an arrangement is ideal because the design characteristics of the data do not change according to the analysis being performed. Whether you are constructing contingency tables or performing Cox regression, the sampling weights and primary sampling units (not to mention the other design specifications) remain constant. Stata’s treatment of survey data makes it easy to maintain that consistency. Most of Stata’s model fitting and other analysis commands can be applied easily to survey data, including (with the release of Stata 10) commands for Cox regression and parametric models for survival data in a survey setting. This talk is a tutorial on how to make full use of Stata’s capabilities for survey data. Alternative variance estimation is a key component of performing valid inference in light of complex-survey designs, and I will discuss several variance-estimation options. That discussion will include modern computationally intensive methods such as balanced and repeated replication, the jackknife, and the bootstrap, which are made feasible with the advent of better computer technology. For these three methods, variance estimation can be done directly or by using a series of replication weights.

Additional information
gutierrez_survey.pdf

Survey bootstrap and bootstrap weights

Stas Kolenikov

Department of Statistics, University of Missouri–Columbia

In this presentation, I will review the bootstrap for complex surveys with designs featuring stratification, clustering, and unequal probability weights. I will present the Stata module bsweights, which creates the bootstrap weights for designs specified through and supported by svy. I will also provide simple demonstrations highlighting the use of the procedure and its syntax. I will discuss various tuning parameters and their impact on the performance of the procedure, and I will give arguments for the bootstrap by the method of weights in nonsurvey settings.
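One common way to build bootstrap weights for a stratified, clustered design is the Rao-Wu rescaling bootstrap: within each stratum with n_h primary sampling units (PSUs), draw n_h - 1 PSUs with replacement and multiply each original weight by n_h / (n_h - 1) times the number of times its PSU was drawn. The Python sketch below implements that scheme for illustration only; whether it matches the defaults or tuning parameters of bsweights is an assumption, and the function and argument names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def bootstrap_weights(strata, psus, weights, reps=100, seed=2008):
    """Return `reps` lists of replicate weights, one weight per observation."""
    rng = random.Random(seed)

    # Collect the distinct PSUs observed in each stratum.
    psus_by_stratum = defaultdict(set)
    for h, i in zip(strata, psus):
        psus_by_stratum[h].add(i)
    psu_lists = {h: sorted(s) for h, s in psus_by_stratum.items()}
    for h, lst in psu_lists.items():
        if len(lst) < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")

    replicates = []
    for _ in range(reps):
        factor = {}
        for h, lst in psu_lists.items():
            n_h = len(lst)
            # Draw n_h - 1 PSUs with replacement and count the hits per PSU.
            draws = Counter(rng.choices(lst, k=n_h - 1))
            for i in lst:
                factor[(h, i)] = n_h / (n_h - 1) * draws[i]
        replicates.append(
            [w * factor[(h, i)] for h, i, w in zip(strata, psus, weights)]
        )
    return replicates


if __name__ == "__main__":
    # Hypothetical design: two strata, one observation per PSU.
    strata = [1, 1, 1, 2, 2, 2, 2]
    psus = [1, 2, 3, 1, 2, 3, 4]
    weights = [10, 10, 10, 5, 5, 5, 5]
    for rep in bootstrap_weights(strata, psus, weights, reps=3):
        print(rep)
```

Re-estimating the statistic of interest with each set of replicate weights and taking the variability across replicates then gives the bootstrap variance estimate.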

Additional information
kolenikov_snasug08.pdf
kolenikov_bsw-example.do

Analyzing spatial autoregressive models in Stata

David Drukker

StataCorp

In this talk, I will provide a quick introduction to estimators for the parameters of spatial-autoregressive models and a quick introduction to a suite of user-written Stata commands for managing spatial data and parameter estimation.

Additional information
drukker_spatial.pdf

Scientific organizers

Phil Schumm (chair), University of Chicago
Scott Long, Indiana University
Pravin Trivedi, Indiana University
Richard Williams, University of Notre Dame

Logistics organizers

Chris Farrar, StataCorp
Gretchen Farrar, StataCorp