9:15–10:15 | Recent developments in the fitting and assessment of flexible parametric survival models
Abstract:
Flexible parametric survival models are an alternative to the Cox proportional hazards model and to standard parametric models for survival (time-to-event) data. They are flexible in that spline functions are used to model the baseline hazard and potentially complex time-dependent effects. I will give a brief overview of the models and their advantages over the Cox model but will concentrate on some recent developments. These include the motivation for developing a new command to fit the models (stpm3), which makes it much simpler to fit complex models with nonlinear functions, nonproportional hazards, and interactions and which simplifies and extends postestimation predictions, particularly marginal (standardized) predictions. I will also describe some new postestimation tools that help in the evaluation of model fit and in the validation of prognostic models.
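As a flavor of the workflow, a minimal sketch follows; the dataset, spline terms, and options shown are illustrative assumptions rather than the only possibilities (see help stpm3 once installed):

    . ssc install stpm3                     // community-contributed (SSC)
    . webuse brcancer, clear                // German breast cancer study data
    . stset rectime, failure(censrec = 1)
    . * 4-df spline for the baseline, a natural-spline (3 df) effect of age,
    . * and a proportional effect of hormonal therapy:
    . stpm3 @ns(x1, df(3)) i.hormon, scale(lncumhazard) df(4)
    . * Marginal (standardized) survival predictions are then available
    . * through predict; see help stpm3 postestimation for the syntax.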
Additional information:
Paul Lambert
University of Leicester, UK, and Karolinska Institutet, Sweden
|
10:45–11:15 | cfbinout and xtdhazard: Control-function estimation of binary-outcome models and the discrete-time hazard model
Abstract:
We introduce the new community-contributed Stata commands cfbinout and xtdhazard. The former generalizes ivprobit, twostep by allowing discrete endogenous regressors and link functions other than the normal (probit) link, specifically logit and cloglog. In terms of the underlying econometric theory, cfbinout is guided by Wooldridge (2015); in terms of the implementation in Stata and Mata, it follows Terza (2017). xtdhazard is essentially a wrapper for either cfbinout or ivregress 2sls. When calling ivregress 2sls, xtdhazard implements the linear first-differences (or higher-order differences) instrumental-variables estimator suggested by Farbmacher and Tauchmann (2023) for dealing with time-invariant unobserved heterogeneity in the discrete-time hazard model. When calling cfbinout, xtdhazard implements, depending on the specified link function, several nonlinear counterparts of this estimator that are briefly discussed in the online supplement to Farbmacher and Tauchmann (2023). Using xtdhazard rather than calling ivregress 2sls, ivprobit, twostep, or cfbinout directly simplifies the implementation of these estimators, because generating the numerous required instruments can be cumbersome, especially when using factor-variable syntax. In addition, xtdhazard performs several checks that may prevent ivregress 2sls and ivprobit, twostep from failing and reports issues such as perfect first-stage predictions. An (extended) replication of Cantoni (2012) illustrates the use of cfbinout and xtdhazard in applied empirical work.
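To fix ideas, the two-step control-function logic that cfbinout packages can be hand-rolled with built-in commands. This is a minimal sketch with hypothetical variables y, x, w, and instrument z; note that the second-step standard errors would still need to be corrected, for example, by bootstrapping both stages:

    . * Stage 1: reduced form for the endogenous regressor
    . regress x z w
    . predict double vhat, residuals
    . * Stage 2: binary-outcome model including the first-stage residual
    . logit y x w vhat
    . * A significant coefficient on vhat points to endogeneity of x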
References:
Cantoni, D. (2012). Adopting a new religion: The case of Protestantism in 16th century Germany. The Economic Journal 122: 502–531.
Farbmacher, H., and H. Tauchmann. (2023). Linear fixed-effects estimation with nonrepeated outcomes. Econometric Reviews 42: 635–654.
Terza, J. (2017). Two-stage residual inclusion estimation: A practitioner's guide to Stata implementation. The Stata Journal 17: 916–938.
Wooldridge, J. M. (2015). Control function methods in applied econometrics. The Journal of Human Resources 50: 420–445.
Contributor:
Elena Yurkevich
FAU Erlangen-Nürnberg
Additional information:
Harald Tauchmann
FAU Erlangen-Nürnberg
|
11:15–11:45 | Multidimensional well-being, deprivation, and inequality
Abstract:
This presentation offers a brief summary of a set of Stata programs for extended multidimensional analyses of well-being, deprivation, and inequality. The first section illustrates the underlying motivation with empirical examples of decomposed multidimensional results. The second section, on multidimensional well-being and deprivation measurement, outlines the conceptual background, which is based on the Alkire/Foster MPI framework (and the CPI of N. Rippin), applied here to well-being measurement as well and extended by a parameter-driven fixed-fuzzy approach; it includes several illustrations and further details on the options offered in the Stata deprivation and well-being programs. The third section, on multidimensional inequalities, refers to a multidimensional Gini-based row-first measurement framework with special emphasis on multiple within- and between-group inequalities, including conceptual extensions to horizontal between-group applications and further details on the options offered in the Stata inequality program. The fourth section summarizes and opens the floor for advice and discussion.
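As a flavor of the counting approach that underlies the Alkire/Foster framework, a minimal sketch with three hypothetical, equally weighted binary deprivation indicators follows (the programs presented offer far richer weighting, cutoff, and fuzzy options):

    . * d1-d3 are 0/1 deprivation indicators with equal weights of 1/3
    . generate score = (d1 + d2 + d3)/3
    . generate poor = (score >= 1/3)           // identification cutoff k = 1/3
    . generate cscore = cond(poor, score, 0)   // censor scores of the nonpoor
    . summarize poor cscore                    // means: headcount H and M0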
Additional information:
Peter Krause
DIW Berlin, SOEP
|
11:45–12:00 | How to assess the fit of choice models with Stata
Abstract:
McFadden developed the conditional multinomial logit model in 1974 for rational choice modeling. Stata introduced it in version 3 in 1993. In 2007, Stata extended this model with asclogit and asmprobit, which can estimate the effects of alternative-specific and case-specific exogenous variables on the choice probabilities of the discrete alternatives. More recently, Stata added the class of choice models, extending it to random-effects (mixed) and panel models. As it stands, Stata provides only a postestimation Wald chi-squared test to assess the overall model. Although McFadden developed a pseudo-R-squared to assess the fit of the conditional logit model in 1974, Stata still does not provide it, even in version 18. Thus, I developed fit_cmlogit to calculate the McFadden pseudo-R-squared using a null model with alternative-specific constants to correct for the uneven distribution of alternatives. Furthermore, it calculates the corresponding likelihood-ratio chi-squared test, which is more reliable and conservative than the Wald test. The program uses the formulas of Hensher and Johnson (1981) and Ben-Akiva and Lerman (1985) for the McFadden pseudo-R-squared to correct for the number of exogenous variables and the number of alternatives faced; Train (2003) discusses these characteristics of the McFadden pseudo-R-squared in detail. Additionally, fit_cmlogit calculates the log-likelihood-based pseudo-R-squareds developed by Maddala (1983, 1988), Cragg and Uhler (1970), and Aldrich and Nelson (1984), the last using the correction formula proposed by Veall and Zimmermann (1994). An empirical example predicting voting behavior in the 1990 German federal election study demonstrates the usefulness of the program for assessing the fit of logit choice models with alternative-specific and case-specific exogenous variables.
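The core computation can be sketched with built-in commands; the variables below are hypothetical, and fit_cmlogit automates these steps and adds the corrections mentioned above:

    . cmset caseid alternative
    . cmclogit choice cost, casevars(income)   // full model
    . estimates store full
    . scalar ll1 = e(ll)
    . cmclogit choice                          // null model: ASCs only
    . scalar ll0 = e(ll)
    . lrtest full .                            // likelihood-ratio chi2 test
    . display "McFadden pseudo-R2 = " 1 - ll1/ll0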
References:
Aldrich, J. H., and F. D. Nelson. (1984). Linear probability, logit, and probit models. Beverly Hills, CA: Sage.
Ben-Akiva, M., and S. R. Lerman. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.
Cragg, J. G., and R. S. Uhler. (1970). The demand for automobiles. Canadian Journal of Economics 3: 386–406.
Domencich, T. A., and D. McFadden. (1975). Urban travel demand: A behavioral analysis. Amsterdam and Oxford: North-Holland.
Hensher, D. A., and L. W. Johnson. (1981). Applied discrete choice modelling. London: Croom Helm/Wiley.
Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge, U.K.: Cambridge University Press.
Maddala, G. S. (1992 [1988]). Introduction to econometrics. New York: Macmillan.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Frontiers in econometrics, ed. P. Zarembka, 105–142. New York: Academic Press.
McFadden, D. (1979). Quantitative methods for analysing travel behaviour of individuals: Some recent developments. In Behavioural travel modelling, ed. D. A. Hensher and P. R. Stopher, 279–318. London: Croom Helm.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge, U.K.: Cambridge University Press.
Veall, M. R., and K. F. Zimmermann. (1994). Evaluating pseudo-R2's for binary probit models. Quality & Quantity 28: 151–164.
Additional information:
Wolfgang Langer
Martin-Luther-University Halle-Wittenberg
|
13:00–14:15 | Customizable tables
Abstract:
Presenting results effectively is a crucial step in statistical analyses, and creating tables is an important part of this step. Whether you need to create a cross-tabulation, a Table 1 reporting summary statistics, a table of regression results, or a highly customized table of results returned by multiple Stata commands, the tables features introduced in Stata 17 and Stata 18 provide ease and flexibility for you to create, customize, and export your tables. In this presentation, I will demonstrate how to use the table, dtable, and etable commands to easily create a variety of tables. I will also show how to use the collect suite to build and customize tables and to create table styles with your favorite customizations that you can apply to any tables you create in the future. Finally, I will demonstrate how to export individual tables to Word, Excel, LaTeX, PDF, Markdown, and HTML and how to incorporate your tables into complete reports containing formatted text, graphs, and other Stata results.
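A small taste of these commands, using Stata's auto data (the options shown are a minimal illustration of what is possible):

    . sysuse auto, clear
    . table rep78 foreign, statistic(mean price)   // customized cross-tabulation
    . dtable price mpg i.foreign, by(rep78)        // "Table 1" summary statistics
    . regress price mpg weight
    . etable, export(mytable.docx, replace)        // regression table to Word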
Additional information:
Kristin MacDonald
StataCorp
|
14:45–15:15 | geoplot: A new command to draw maps
Abstract:
geoplot is a new command for drawing maps from shape files and other datasets. Multiple layers of elements such as regions, borders, lakes, roads, labels, and symbols can be freely combined and the look of elements (for example, color) can be varied depending on the values of variables. Compared with previous solutions in Stata, geoplot provides more user convenience, more functionality, and more flexibility. In this talk, I will introduce the basic components of the command and illustrate its use with examples.
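A minimal example, following the pattern used in the geoplot documentation; the file names here are placeholders for your own attribute and shape files, and unemp is a hypothetical variable:

    . ssc install geoplot, replace
    . ssc install moremata, replace    // geoplot requires moremata
    . * Load attribute data and its shape file into linked frames
    . geoframe create regions regions.dta, shp(regions_shp.dta)
    . * Choropleth of unemp with region outlines drawn on top
    . geoplot (area regions unemp) (line regions), tight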
Additional information:
Ben Jann
University of Bern
|
15:15–15:45 | repreport: Facilitating reproducible research in Stata
Abstract:
In theory, Stata provides a stable computational environment and includes commands (for example, version) that are specifically designed to ensure reproducibility. In practice, however, users often lack the time or the knowledge to exploit this potential. Insights from an ongoing research project on reproducibility in the social sciences show that computational reproducibility is regularly impeded by researchers being unaware of which files (for example, datasets and do-files), software components (for example, ado-files), infrastructure (for example, directories), and information (for example, ReadMe files) are needed to enable reproduction. This presentation introduces the new Stata command repreport as a potential remedy. The command works like a log, with one key difference: instead of logging the entire analysis, repreport extracts specific pieces of information pertinent to reproduction (for example, the names and locations of datasets and ado-files) and compiles them into a concise reproduction report. Furthermore, the command includes an option for generating a full-fledged reproduction package containing all components needed for push-button reproducibility. While repreport adds little value for researchers whose workflow is already perfectly reproducible, it constitutes a powerful tool for those who strive to make their research in Stata more reproducible at (almost) no additional cost.
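Because repreport is brand new, its exact syntax is not reproduced here; a purely hypothetical session, mirroring the log-like behavior the abstract describes, might look as follows:

    . * Hypothetical usage; the actual syntax may differ
    . repreport using reproduction_report, replace
    . do master.do                 // run the analysis as usual
    . repreport close              // compile the reproduction report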
Additional information:
Daniel Krähmer
Ludwig-Maximilians-Universität München
|
15:45–16:15 | mkproject and boilerplate: Automate the beginning
Abstract:
There is usually a set of commands that are included in every do-file a person creates, like clear all or log using. Which commands those are differs from person to person, but most people have such a standard set. Similarly, a project usually has a standard set of directories and files. Starting a new do-file or a new project thus involves a number of steps that could easily be automated. Automating them has the advantage of reducing the amount of work you need to do. The more important advantage, however, is that it makes it easier to maintain your own workflow: it is so easy to start "quick and dirty" and promise yourself that you will fix things "later". If the start is automated, then there is nothing to fix.
The mkproject command automates the beginning of a project. It comes with a set of templates I find useful. A template contains all the actions (like creating subdirectories, creating files, and running other Stata commands) that mkproject will take when it creates a new project. Since everybody's workflow is different, mkproject allows users to create their own templates. Similarly, the boilerplate command creates a new do-file with boilerplate code in it. It too comes with a set of templates, and the user can create their own. This talk will illustrate the use of both mkproject and boilerplate and show how to create your own templates.
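For illustration, a do-file header of the kind such a template might emit (the exact content is, of course, up to the template's author):

    version 18            // lock the interpreter version for reproducibility
    clear all             // start from a clean slate
    capture log close     // close any log left open by a crashed run
    log using analysis.log, replace text
    * ... analysis goes here ...
    log close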
Additional information:
Maarten L. Buis
University of Konstanz
|
16:45–17:15 | Data structures in Stata
Abstract:
This presentation starts out by enumerating and describing the main data structures in Stata (for example, datasets, frames, and matrices) and Mata (for example, string and numeric matrices, and objects such as associative arrays). It analyzes ways in which data can be represented and coerced from one data container into another. After assessing the strengths and limitations of existing data containers, it muses on potential additions of new data structures and on enriching the functionality of existing data structures and their interplay. Moreover, data structures from other languages, such as Python lists, are described and examined for their potential introduction into Stata and Mata. The goal of the presentation is to stimulate a discussion among Stata users and developers about ways in which the capabilities of Stata's data structures could be enhanced to simplify existing workflows and open up new possibilities for data management and analysis.
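Two of the containers discussed, in brief (a minimal sketch):

    . frame create results str20 model double(estimate se)  // second dataset in memory
    . frame post results ("ols") (1.23) (0.45)
    . mata:
    : A = asarray_create()                 // associative array with string keys
    : asarray(A, "beta", (1.23, 0.45))     // store a row vector under a key
    : asarray(A, "beta")                   // retrieve it
    : end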
Additional information:
Daniel C. Schneider
Max Planck Institute for Demographic Research, Rostock
|
17:15–18:00 | Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
|