Last updated: 16 August 2011
2011 Stata Conference Chicago
14–15 July 2011
Gleacher Center
The University of Chicago Booth School of Business
450 North Cityfront Plaza Drive
Chicago, IL 60611
Proceedings
Tricks with Hicks: Stata gmm code for nonlinear GMM
Carl Nelson
University of Illinois–Urbana–Champaign
In a June 2009
American Economic Review article entitled
“Tricks with Hicks: The EASI demand system”, Arthur Lewbel and
Krishna Pendakur proposed the exact affine Stone index demand system. This
system allows Engel curve behavior of rank higher than 3, demographics, and
unobserved heterogeneity in tastes. The
American Economic Review web supplement for the article
provides Stata code to estimate linear and iterative linear versions of
the model. But the full nonlinear system instrumental variable estimates
were obtained with TSP econometric software, using its
frml command to obtain
nonlinear three-stage least-squares estimates. I present Stata code to estimate the nonlinear
exact affine Stone index demand system using the Stata
gmm command. This is an example of the
important estimation extensions that have been made possible by the
introduction of the
gmm command.
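For readers new to the command, a minimal sketch of the interface follows. This is a hypothetical single-equation example, not the EASI system itself; the variables y, x, z1, z2 and parameters {b0}, {b1} are illustrative.

```stata
* Hypothetical nonlinear GMM example (not the EASI system):
* fit an exponential conditional mean with instruments z1 and z2.
gmm (y - exp({b0} + {b1}*x)), instruments(z1 z2)

* Multiple-equation systems follow the same pattern:
* gmm (eq1: <residual 1>) (eq2: <residual 2>), instruments(...)
```

The EASI system stacks many such share-equation residuals, but each equation is written as a moment expression in exactly this way.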
Additional information
chi11_nelson.pdf
engel.png
lewbelpendakur09_20.pdf
xtmixed and denominator degrees of freedom: Myth or magic
Phil Ender
UCLA Statistical Consulting Group
I review issues and controversy surrounding
F-ratio denominator degrees
of freedom in linear mixed models. I will look at the
history of denominator degrees of freedom and survey their use in
various statistical packages.
Additional information
chi11_ender.pdf
Using the margins command to estimate and interpret adjusted predictions
and marginal effects
Richard Williams
University of Notre Dame
As Long and Freese show, it can often be helpful to compute
predicted and expected values for hypothetical or prototypical cases. Stata 11
introduced new tools—factor variables and the
margins
command—for making such calculations. These can do many of the things
that were previously done by Stata’s own
adjust and
mfx
commands, as well as Long and Freese’s
spost9 commands like
prvalue. Unfortunately, the complexity of the
margins syntax, the
daunting 50-page reference manual entry that describes it, and a lack of
understanding about what
margins offers over older commands may have
dissuaded researchers from using it. This paper therefore shows how
margins can easily replicate analyses done by older commands. It
demonstrates how
margins provides a superior means for dealing with
interdependent variables (for example,
X and
X2;
X1,
X2, and
X1 ×
X2; multiple dummies created from a
single categorical variable), and is also superior for data that are
svyset. The paper explains how the new
asobserved option works
and the substantive reasons for preferring it over the
atmeans
approach used by older commands. The paper primarily focuses on the
computation of adjusted predictions, but also shows how
margins has
the same advantages for computing marginal effects.
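As a flavor of the calls discussed, here is a minimal sketch using Stata's shipped auto dataset; the model itself is purely illustrative.

```stata
* Minimal sketch using Stata's shipped auto data.
sysuse auto, clear
logit foreign c.weight##c.weight i.rep78    // factor-variable notation

margins rep78, atmeans     // adjusted predictions at the means of covariates
margins rep78              // averaged over observed values (asobserved, the default)
margins, dydx(weight)      // average marginal effect, handling weight^2 correctly
```

Because the squared term is entered with factor-variable notation, margins knows that weight and weight squared move together, which is the interdependence problem described above.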
Additional information
chi11_williams.pptx
Using margins to test for group differences in growth
trajectories in generalized linear mixed models
Sarah Mustillo (with L.R. Landerman and K.C. Land)
Purdue University, Duke University School of Medicine, and Duke University
To test for group differences in growth trajectories in mixed (fixed and
random-effects) models, researchers frequently interpret the coefficient of
group-by-time product terms. While this practice is straightforward in
linear mixed models, testing for group differences in generalized linear
mixed models is more complex. Using both an empirical example and simulated
data, we show that the coefficients of group-by-time product terms in mixed
logistic and Poisson models estimate the multiplicative change with respect
to the baseline rates, while researchers often are more interested in
differences in the predicted rate of change between groups. The latter can
be obtained by using the
margins command in Stata. This may be
especially desirable when the mean of the outcome variable is low and
marginal change differs from multiplicative change. We propose and
illustrate the use of
margins to interpret group differences in rates
of change over time following estimation with generalized linear models.
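The kind of comparison described might be sketched as follows for a single-level logit; variable names (y, group, time) are hypothetical, and the same margins calls apply after mixed-model estimation.

```stata
* Sketch: compare predicted rates of change between groups rather than
* interpreting the multiplicative interaction coefficient directly.
logit y i.group##c.time
margins group, dydx(time)              // predicted rate of change in each group
margins group, dydx(time) pwcompare    // pairwise group differences (Stata 12)
```

The dydx(time) results are on the probability scale, so they can differ sharply from the odds-ratio interpretation of the interaction term when the outcome mean is low.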
Additional information
chi11_mustillo.pptx
Graphics tips for all
Nicholas J. Cox
Durham University, United Kingdom
Stata’s graphics were completely rewritten for Stata 8, with further
key additions in later versions. Its official commands have, as usual, been
supplemented by a variety of user-written programs. The resulting variety
presents even experienced users with a system that undeniably is large,
often appears complicated, and sometimes seems confusing. In this talk, I
provide a personal digest of graphics strategy and tactics for Stata users;
I emphasize details large and small that, in my view, deserve to be known by
all.
Additional information
chi11_cox.zip
Stata as a data-entry management tool
Ryan Knight
Innovations for Poverty Action
It is increasingly common for social scientists to be involved in primary
data collection, whether through the administration of unique survey
instruments or the execution of field experiments. Novel datasets present
novel challenges for researchers, who may find themselves responsible for
ensuring that any information collected is entered into the computer
accurately. This presentation discusses why and how one might use Stata as a
tool for data-entry management and introduces three new user-written
commands that streamline the data-entry process. The commands are:
cfout, which is an extension of the
cf command that outputs a user-friendly
list of all discrepancies between two datasets (for example, the first and second
entry of a double-entered dataset);
readreplace, which makes many
replacements to a dataset, based on a corrected list of the discrepancies
generated by
cfout; and
mergeall, which merges many files without
loss of information due to string and numeric differences. This suite of
commands can help reduce the cost and increase the accuracy of primary
data collection, and it extends Stata’s data-management capabilities to
include the management of data entry.
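The underlying double-entry comparison can be illustrated with official Stata's cf command, which cfout extends; the file names below are hypothetical.

```stata
* Compare first and second entry of a double-entered dataset using
* official -cf-; -cfout- extends this with a user-friendly discrepancy list.
* Both files must contain the same observations in the same order.
use firstentry, clear
cf _all using secondentry, all verbose
```

cf reports, variable by variable, which observations differ between the dataset in memory and the one on disk, which is the raw material for the corrected list that readreplace applies.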
Additional information
chi11_knight.pptx
Universal and mass customization of tables in Stata
Roy Wada
University of Illinois–Chicago
There is a strong demand for a systematic and uniform approach to
table-making, yet such an approach is widely believed to be infeasible
or unavailable in Stata. There is also an impression that tabulation tables
are inherently different from summary tables or regression tables. This
presentation shows that it is possible to design a programmatic, universal
solution once the similarities between the apparently different types of
tables are understood. The universal approach to table-making is implemented
in the latest version of
outreg2. Thus a mass customization of
various types of tables, including cross-tabulations and stub-and-banner
types of tables, can be readily produced in Stata.
Additional information
chi11_wada.pptx
Fractional response models with endogenous explanatory variables and heterogeneity
Jeffrey M. Wooldridge
Michigan State University
In this talk, I will discuss ways of using Stata to fit fractional
response models when explanatory variables are not exogenous. Two questions
are of primary concern: First, how does one account for endogenous
explanatory variables, both continuous and discrete, when the response
variable is fractional and may take values at the corners? Second, how can
we incorporate unobserved heterogeneity in panel-data fractional models when
the panel might be unbalanced? I will draw on Papke and Wooldridge (2008,
Journal of Econometrics 145: 121–133) and two unpublished
papers of mine, “Quasi-maximum likelihood estimation and testing for
nonlinear models with endogenous explanatory variables” and
“Correlated random effects models with unbalanced panels”. One
practically important conclusion is that by expanding the scope of existing
Stata commands to allow fractional responses—in particular, the
ivprobit,
biprobit,
hetprob, and (user-written)
gllamm commands—flexible fractional response models can easily
be fit.
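Under exogeneity, the baseline fractional model of Papke and Wooldridge (1996) is already available through glm; the extensions discussed in the talk build on this quasi-MLE. Variable names below are hypothetical.

```stata
* Fractional logit for a response y in [0,1], possibly taking corner values.
* Quasi-maximum likelihood with robust standard errors
* (Papke and Wooldridge 1996).
glm y x1 x2, family(binomial) link(logit) vce(robust)
margins, dydx(*)    // average partial effects on the fractional response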
Additional information
chi11_wooldridge.pdf
Causal inference for binary regression with observational data
Austin Nichols
Urban Institute
Special problems arise when trying to do causal inference for binary
regression with observational data; we will examine some of these problems
and critically examine several common and not-so-common solutions.
Additional information
chi11_nichols.pdf
Estimating the parameters of simultaneous-equations models with the sem command in Stata 12
David M. Drukker
StataCorp
In this talk, I introduce Stata 12’s new
sem command for
estimating the parameters of
simultaneous-equations models. Some of the considered models
include unobserved factors. Estimation methods include maximum likelihood
and the generalized method of moments.
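For orientation, the command's path-style syntax looks like this; the observed variable names are hypothetical, and a capitalized name such as L denotes a latent variable by sem's default convention.

```stata
* Minimal -sem- sketch (Stata 12): one latent factor L measured by
* m1-m3, with a structural equation for observed y.
sem (L -> m1 m2 m3) (y <- L x)
```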
Additional information
chi11_drukker_sem.pdf
Calculating bronchiolitis severity using ordinal regression with a new function in Stata
Carl Mitchell (with Paul Walsh)
Kern County Medical Center Department of Emergency Medicine/UCLA
A new command has been developed implementing a previously validated tool
for describing bronchiolitis severity. Bronchiolitis is one of the most
common causes of hospital admission for infants and it is widely studied.
This command classifies predicted severity of illness using an ordinal
regression model. Optionally, the user can obtain the predicted probability of
hospital admission and the probability of an infant falling into a
severity-of-illness classification different from the one predicted.
Additional information
chi11_mitchell.pdf
Teaching statistics with Stata in emergency medicine (EM) journal club
Muhammad Waseem
Lincoln Medical and Mental Health Center
Residency training is an important period when a physician learns and
acquires the necessary skills of searching for, evaluating, and applying medical
knowledge. The journal club is an academic event and an important forum for
this purpose. The objective of the journal club is to learn and develop a
skill to find, appraise, and implement practice-changing advancements in the
medical literature. We report our experience using Stata in the journal club
to teach emergency medicine residents statistics in addition to critical appraisal
skills. To understand and utilize the current literature effectively, an
understanding of basic statistical methods is essential. We introduced Stata
while discussing the methods and results section of an article in the
journal club to teach application of some common statistical tests.
Published studies were selected to illustrate and provide insight into
commonly used statistical concepts. We noted that improved understanding of
statistics resulted in increased interest and enthusiasm of residents to
participate in journal club. Integrating a statistical software program such
as Stata into journal club can serve as an important tool to enhance learning.
Further studies should be conducted to fully utilize these
opportunities for enhanced learning of in-training physicians.
Additional information
chi11_waseem.pptx
Use of cure fraction models for the survival analysis of uterine cancer patients
Noori Akhtar-Danesh (with Alice Lytwyn and Laurie Elit)
McMaster University
In population-based cancer studies, a cure fraction model
classifies patients into those who survive the cancer and those who
encounter excess mortality risk compared with the general population
(2007,
Stata Journal 7: 1–25). In
this presentation, we report the proportion cured and the relative survival
pattern for patients diagnosed with uterine cancer in Canada over the period
of 1992–2005. We used a nonmixture cure fraction model to estimate
the cure fraction rate and the relative survival among “uncured”
patients (2007,
Stata Journal 7: 1–25). Then we predicted the cure fraction rate and median survival
for each age group based on the year of diagnosis. Relative
survival and cure fraction rate decreased with age but increased gradually
over time. Relative survival for Eastern Canada and Ontario was lower
than in the other regions. The same applies to the comparison of
cure fraction rates between the geographical regions. This is
the first study using a cure fraction model for analysis of uterine cancer.
Although there are some limitations attached to this model, it is flexible
enough to be used with different parametric distributions and to include
different link functions for relative survival analysis.
Additional information
chi11_akhtar_danesh.ppt
Using Mata to import Illumina SNP chip data for genome-wide association studies
Chuck Huber (with Michael Hallman, Victoria Friedel,
Melissa Richard, and Huandong Sun)
Texas A&M Health Science Center School of Rural
Public Health and University of Texas School of Public Health
Modern genetic genome-wide association studies typically rely on
single nucleotide polymorphism (SNP) chip technology to determine hundreds
of thousands of genotypes for an individual sample. Once these genotypes are
ascertained, each SNP (alone or in combination) is tested for association
with outcomes of interest such as disease status or severity. Project Heartbeat!
was a longitudinal study conducted in the 1990s that explored changes in
lipids and hormones and morphological changes in children from age 8–18
years. A genome-wide association study is currently being conducted to look
for SNPs that are associated with these developmental changes. While there
are specialty programs available for the analysis of hundreds of thousands
of SNPs, they are not capable of modeling longitudinal data. Stata is
well-equipped for modeling longitudinal data but cannot load hundreds of
thousands of variables into memory simultaneously. This talk will briefly
describe the use of Mata to import hundreds of thousands of SNPs from the
Illumina SNP chip platform and how to load those data into Stata for
longitudinal modeling.
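The pattern might be sketched as below. The file name, delimiter, and layout are assumptions (a whitespace-delimited text file with SNP names in a header row), and the columns kept are purely illustrative.

```stata
* Sketch: read a large delimited genotype file in Mata, keeping only a
* subset of columns, then load that subset into Stata.
clear
mata:
    fh = fopen("snps.txt", "r")
    header = tokens(fget(fh))       // SNP names from the header row
    keep = 1..10                    // columns to load (illustrative)
    X = J(0, cols(keep), .)
    while ((line = fget(fh)) != J(0, 0, "")) {
        row = strtoreal(tokens(line))
        X = X \ row[keep]
    }
    fclose(fh)
    st_addobs(rows(X))
    idx = st_addvar("double", header[keep])   // assumes valid Stata names
    st_store(., idx, X)
end
```

Because Mata never creates Stata variables for the discarded columns, memory use is bounded by the subset actually needed for modeling.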
Additional information
chi11_huber.pptx
Graphics tricks for models
Bill Rising
StataCorp
Visualizing interactions and response surfaces can be difficult. In this
talk, I will show how to do the former by graphing adjusted means and the
latter by showing how to roll together contour plots. I will demonstrate
this for both linear and nonlinear models.
Additional information
chi11_rising.pdf
chi11_rising_files.zip
Malmquist productivity analysis using DEA frontier in Stata
Choonjoo Lee
Korea National Defense University
In this presentation, the author presents a procedure and an illustrative
application of a user-written Malmquist productivity analysis (MPA) using
data envelopment analysis (DEA) frontier in Stata. MPA measures the
productivity changes for units between time periods. MPA has been used
widely for assessing the productivity changes of public and private sectors,
such as banks, airlines, hospitals, universities, defense firms, and
manufacturers, when panel data are available. The MPA using the DEA frontier
in Stata will allow Stata users to conduct not only the stochastic approach
to productivity analysis, using stochastic-frontier analysis, but also the
nonstochastic approach using the DEA frontier, also suggested by the author. The user-written
MPA approach in Stata will provide some possible future extensions of Stata
programming in productivity analysis.
Additional information
chi11_lee.ppt
chi11_lee_files.zip
An interpretation and implementation of the Theil–Goldberger
“mixed” estimator
Christopher Baum
Boston College and DIW Berlin
In the early 1960s, Theil and Goldberger proposed a
generalized least-squares approach to “mixing” sample
information and prior beliefs about the coefficients of a regression
equation. Their “mixed” estimator may be considered as a
stochastic version of constrained least squares (Stata’s
cnsreg). Although based on frequentist statistics, the Theil–Goldberger estimator
is identical to that used in a Bayesian estimation approach when an
informative prior density is employed. It may also be
viewed as a one-shot application of the Kalman filter,
providing an updating equation for point and interval coefficients based on
prior and sample information. I discuss the
motivation for the estimator and my implementation in Stata code,
tgmixed, and give illustrations of how it might be usefully employed.
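In the standard textbook statement, with sample information $y = X\beta + \varepsilon$, $\mathrm{Var}(\varepsilon) = \sigma^2 I$, and stochastic prior restrictions $r = R\beta + v$, $\mathrm{Var}(v) = \Psi$, the mixed estimator is the GLS solution of the stacked system:

$$
\hat{\beta} = \left( \frac{X'X}{\sigma^2} + R'\Psi^{-1}R \right)^{-1}
              \left( \frac{X'y}{\sigma^2} + R'\Psi^{-1}r \right)
$$

This is the classical form; the exact scaling used inside tgmixed may differ in implementation detail.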
Additional information
chi11_baum.pdf
Multilevel regression and poststratification in Stata
Maurizio Pisati (with Valeria Glorioso)
University of Milano–Bicocca and Harvard School of Public Health
Sometimes, social scientists are interested in determining whether, and to
what extent, the distribution of a given variable of interest
Y
varies across the categories of a second variable
D. When the number of
valid observations within one or more categories of
D is small or the
collected data are affected by selection bias, relatively accurate estimates
of E(Y|D) can be obtained by using a proper combination of multilevel
regression modeling and poststratification, known as the MRP
approach (Gelman and Little 1997,
Survey Methodology 23: 127–135; Gelman and Bafumi 2004,
Political Analysis 12: 375–385; and Lax and Phillips 2009,
American Journal of Political Science 53: 107–121). The purpose of this talk is to illustrate the main features
and applications of
mrp, a new user-written program that implements
the multilevel regression modeling and poststratification approach in Stata.
Additional information
chi11_pisati.pdf
Mata, the missing manual
William W. Gould
StataCorp
Mata is Stata’s matrix programming language. StataCorp provides
detailed documentation on it, but so far has failed to give users—and
especially users who add new features to Stata—any guidance on when
and how to use the language. In this talk, I provide what has been missing.
In practical ways, I show how to include Mata code in Stata ado-files,
reveal when to include Mata code and when not to, and provide an
introduction to the broad concepts of Mata—the concepts that will make the
Mata Reference Manual approachable.
Additional information
chi11_gould.pdf
Stata Graph Library for network analysis
Hirotaka Miura
Federal Reserve Bank of San Francisco
Network analysis is a multidisciplinary research method that is fast
becoming a popular and exciting field of study. Though a number of
statistical programs possess sophisticated packages for analyzing networks,
similar capabilities have yet to be made available in Stata. In an effort to
motivate the use of Stata for network analysis, I designed in Mata the Stata
Graph Library (SGL), which consists of algorithms that construct matrix
representations of networks, compute centrality measures, and calculate
clustering coefficients. Performance tests conducted between C++ and SGL
implementations indicate gross inefficiencies in the current SGL routines, making
SGL impractical for large networks. The obstacles are,
however, welcome challenges in the effort to spread the use of Stata as an
instrument for analyzing networks, and future developments will focus on
addressing computational time complexities as well as integrating additional
capabilities into SGL.
Additional information
chi11_miura.pdf
chi11_miura_SGL_version_1.1.2.zip
Filtering and decomposing time series in Stata 12
David M. Drukker
StataCorp
In this talk, I introduce new methods in Stata 12 for filtering and
decomposing time series and I show how to implement them. I
provide an underlying framework for understanding and comparing the
different methods. I also present a framework for interpreting the
parameters.
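For example, the new tsfilter command applies several of these filters with a common syntax; the variable names below are hypothetical.

```stata
* Sketch of Stata 12's -tsfilter- on a tsset quarterly series y.
* 1600 is the conventional smoothing value for quarterly data.
tsset quarter
tsfilter hp y_hp = y, smooth(1600)   // Hodrick-Prescott cyclical component
tsfilter bk y_bk = y                 // Baxter-King band-pass alternative
```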
Additional information
chi11_drukker_filter.pdf
Scientific organizers
Phil Schumm (chair), University of Chicago
Lisa Barrow, Federal Reserve Bank of Chicago
Scott Long, Indiana University
Rich Williams, University of Notre Dame
Logistics organizers
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp