9:00–10:00 | Session I: Exploiting the potential of Stata 17, I
Custom estimation tables
Abstract:
This presentation illustrates how to construct custom tables from one or more estimation commands.
I demonstrate how to add custom labels for significant coefficients and make targeted style
edits to cells in the table using the following commands:
Additional information: Jeff Pitblado
StataCorp
10:00–11:15 | Session II: Community-contributed, I
Machine learning using Stata/Python
Abstract:
Two related Stata modules, r_ml_stata and c_ml_stata, are presented for fitting popular machine learning (ML) methods both in regression and classification settings.
Using the Stata/Python integration platform introduced in Stata 16, these
commands provide hyper-parameters’ optimal tuning via K-fold cross-validation using
grid search. More specifically, they make use of the Python scikit-learn API to carry
out both cross-validation and outcome/label prediction.
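The tuning strategy described above, K-fold cross-validation over a grid of candidate hyperparameter values, can be sketched in plain Python. This is only an illustration of the idea and not the code behind r_ml_stata or c_ml_stata: the one-regressor ridge learner, the simulated data, the penalty grid, and all function names are assumptions made for the example.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge slope for a no-intercept, one-regressor model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_mse(xs, ys, lam, k):
    """Mean squared prediction error, averaged over k held-out folds."""
    errors = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(sum((ys[i] - b * xs[i]) ** 2 for i in fold) / len(fold))
    return sum(errors) / k

def grid_search(xs, ys, grid, k=5):
    """Return the penalty in `grid` with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))

# Simulated data (hypothetical): y = 2x + noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]  # candidate penalties (assumed)
best = grid_search(xs, ys, grid)
print("selected penalty:", best)
```

In scikit-learn the same loop is handled by its cross-validation and grid-search utilities; the sketch only makes the mechanics explicit.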
Additional information: Giovanni Cerulli
IRCrES, Rome
A Stata routine for estimating the blocking with regression adjustment
Abstract:
The psreg command implements the blocking with regression adjustment estimator, proposed by Imbens (Journal of Human Resources 2015).
It relies on the estimate of the propensity score and uses
regressions in subclasses (blocks) of the propensity score. The
ATT is the average of the within-block estimates, weighted by the
number of treated units in each block. For the ATE, the within-block
estimates are instead weighted by the total number of units (treated
and untreated) in each block.
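The aggregation step can be illustrated with a small Python sketch. It assumes the within-block estimates are already available and only shows the two weighting schemes; the block values and function names are hypothetical, and this is not the psreg implementation.

```python
def weighted_average(estimates, weights):
    """Average of `estimates` using `weights` (assumed to sum to > 0)."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total

def aggregate_blocks(blocks):
    """Combine within-block estimates into ATT and ATE.

    blocks: list of (estimate, n_treated, n_control) tuples, one per
    propensity-score block.
    """
    estimates = [b[0] for b in blocks]
    # ATT: weight each block by its number of treated units.
    att = weighted_average(estimates, [b[1] for b in blocks])
    # ATE: weight each block by its total number of units.
    ate = weighted_average(estimates, [b[1] + b[2] for b in blocks])
    return att, ate

# Hypothetical within-block estimates and block sizes.
blocks = [(1.0, 10, 40), (2.0, 30, 30), (3.0, 60, 10)]
att, ate = aggregate_blocks(blocks)
print("ATT:", att)
print("ATE:", ate)
```

Because treated units are concentrated in the high-propensity blocks, the ATT in this example puts more weight on the last block than the ATE does.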
Additional information: Martina Bazzoli
FBK-IRVAPP
11:30–1:00 | Session III: Community-contributed, II
A Stata package for cluster-weighted modeling
Abstract:
The cluster-weighted model (CWM) is a member of the family of the mixtures of regression models, and is also referred to in the literature as the mixture of regression with random covariates.
These models extend finite mixture models by allowing the
researcher to model the marginal distribution of regression
covariates along with the conditional distribution. The
attention on CWMs is increasing; indeed, software for estimating
these kinds of models is available to R users but not for Stata
users. Thus, the aim of this presentation is to introduce the Stata
package cwmglm. This package extends the capabilities of
fmm by introducing more advanced mixture models based on
maximum likelihood estimation and the expectation maximization (EM)
algorithm.
cwmglm allows users to fit CWMs based on the most common generalized linear models (GLM) with random covariates. The supported GLM families are Gaussian, Poisson and binomial, while the allowed marginal distributions for the covariates are multivariate normal, multinomial, binomial, and Poisson. cwmglm extends the current capabilities in the estimation of CWMs by allowing users to evaluate model fit by introducing the generalized determination coefficients and by incorporating bootstrap-based inference. These features are not available in the current version of the R-package software for CWMs. Furthermore, cwmglm allows one to estimate parsimonious models of Gaussian distributions. This approach is based on assuming the correlation structure between concomitants within multivariate Gaussian mixture components and on the equality/inequality of variance–covariance matrices between components. Fourteen parsimonious models are possible by exploiting the eigenvalue decomposition of the variance–covariance matrix. Parsimonious mixtures of multivariate Gaussian distributions can be used to model random covariates within CWM-GLM or as stand-alone models (mixture of multivariate Gaussians with defined covariance matrix). This feature is completely new for Stata users because it is not allowed by gsem and fmm. Last, the flexibility of cwmglm allows one to estimate the “canonical” finite mixture of regressions.
Additional information: Daniele Spinelli
University of Milan–Bicocca
Stacking generalization and machine learning in Stata
Abstract:
pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn.
Stacking combines multiple supervised machine
learners—the “base” or “level-0” learners—into a
single learner. The currently supported base learners include
regularized regression, random forest, gradient boosting,
support vector machines and feed-forward neural nets (multilayer
perceptron). pystacked can also be used as a “regular”
machine-learning program to fit a single base learner and
thus provides an easy-to-use API for scikit-learn’s
machine-learning algorithms.
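As a rough illustration of stacking, the sketch below combines two deliberately simple base learners using a weight estimated from their out-of-fold predictions. It is not how pystacked is implemented: the two learners, the single convex weight, the fold assignment, and the simulated data are all assumptions chosen to keep the example short.

```python
import random

def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict each observation from a model fit without its fold."""
    n = len(xs)
    preds = [0.0] * n
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, n, k):
            preds[i] = model(xs[i])
    return preds

def stack_weight(ys, p1, p2):
    """Weight w in [0, 1] minimizing sum (y - w*p1 - (1-w)*p2)^2."""
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = num / den if den > 0 else 0.5
    return min(1.0, max(0.0, w))

# Simulated data (hypothetical): y = 2x + noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

p_mean = out_of_fold(fit_mean, xs, ys, k=5)
p_line = out_of_fold(fit_line, xs, ys, k=5)
w = stack_weight(ys, p_mean, p_line)  # weight on the mean learner

# Refit both learners on all data and combine them with the stacking weight.
mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)
stacked = lambda x: w * mean_model(x) + (1 - w) * line_model(x)
print("weight on mean learner:", w)
```

Using out-of-fold rather than in-sample predictions to estimate the weight keeps the combination from simply rewarding the base learner that overfits most.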
Additional information: Achim Ahrens
ETH Zürich
Double/debiased machine learning in Stata
Abstract:
ddml implements algorithms for causal inference aided by supervised machine learning as proposed in "Double/debiased machine learning for treatment and structural parameters" (Econometrics Journal 2018).
Five different models are supported, allowing for binary or
continuous treatment variables and endogeneity. ddml
supports a variety of different ML programs, including
lassopack and pystacked.
Additional information: Achim Ahrens
ETH Zürich
2:00–3:00 | Session IV: Exploiting the potential of Stata 17, II
Treatment-effects estimation using lasso
Abstract:
One can use treatment-effects estimators to draw causal inferences from observational data.
You can use lasso when you want to control for many potential
covariates. With standard treatment-effects models, there is an
intrinsic conflict between two required assumptions. The
conditional independence assumption is likely to be satisfied
with many variables in the model, while the overlap assumption
is likely to be satisfied with fewer variables in the model.
This presentation shows how to overcome this conflict by using
Stata 17’s telasso command.
telasso estimates average treatment effects with high-dimensional controls while using lasso for model selection. This estimator is robust to model-selection mistakes. Moreover, it is doubly robust, so only one of the outcome and treatment models needs to be correctly specified.
Additional information: Di Liu
StataCorp
3:00–4:00 | Session V: Community-contributed, III
rbiprobit: Recursive bivariate probit estimation and decomposition of marginal effects
Abstract:
This presentation describes a new Stata command, rbiprobit, for fitting recursive bivariate probit models, which differ from bivariate probit models in allowing the first dependent variable to appear on the right-hand side of the equation for the second dependent variable.
Although the estimation of model parameters does not differ from
the bivariate case, the existing commands biprobit and cmp
do not consider the structural model’s recursive nature for
postestimation commands. rbiprobit estimates the model
parameters, computes treatment effects of the first dependent
variable, and gives the marginal effects of independent
variables. In addition, marginal effects can be decomposed into
direct and indirect effects if covariates appear in both
equations. Moreover, the postestimation commands incorporate
the two community-contributed goodness-of-fit tests scoregof
and bphltest. Dependent variables of the recursive probit model
may be binary, ordinal, or a mixture of both. I present and
explain the rbiprobit command and the available postestimation
commands using data from the European Social Survey.
Additional information: Mustafa Coban
Institute for Employment Research
A Stata package to handle metadata
Abstract:
In this presentation, I offer a brief tour of mdata, a Stata community-contributed package that provides a set of tools to help users handle metadata in large and complex datasets.
The package uses an Excel file to store all metadata related to
a dataset. This is particularly useful to edit and modify
metadata outside of Stata, and also to deal with datasets
stored in non-Stata format. The presentation will focus on the
most important features of the package, namely on how to extract
metadata from data in memory, perform consistency checks on the
metadata, apply metadata to data in memory, and compare and
combine metadata from two datasets.
Additional information: Gustavo Iglésias
Microdata Research Laboratory, Banco de Portugal
4:15–4:45 | nwxtregress: Network regressions in Stata
Abstract:
In this presentation, I introduce nwxtregress, a new
community-contributed routine to estimate network regressions.
It uses MCMC estimation methods (LeSage and Pace 2009) to
produce estimates of endogenous peer effects, as well as
own-node (direct) and cross-node (indirect) partial effects,
where nodes correspond to cross-sectional units of observation.
nwxtregress is designed to handle unbalanced panels of economic
and social networks as in Grieser et al. (2021). Networks can
be directed or undirected with weighted or unweighted edges, and
they can be imported in a list format that does not require a
shapefile or a Stata spatial weight matrix set by spmatrix.
Finally, the command allows for the inclusion or exclusion of
contextual effects. To improve speed, the command transforms the
spatial weighting matrix into a sparse matrix. Future work will
be targeted toward improving sparse matrix routines, as well as
introducing a framework that allows for multiple networks.
Additional information: Jan Ditzen
Free University of Bozen-Bolzano
4:45–6:00 | Session VI: Application study using Stata
Modeling the risk of multimorbidity: An application of multistate models to the Swedish National March Cohort
Abstract:
Chronic diseases, defined as health problems requiring ongoing management over a period of years or decades, currently represent the predominant burden of healthcare. I use the term "multimorbidity" for the coexistence of two or more diseases or conditions.
When combined, chronic diseases create additional challenges to
patient care because clinical trials usually exclude patients
with coexisting conditions; therefore, most guidelines do not
provide recommendations for patients presenting with multiple
diseases. With worldwide life expectancy increasing from
45.7 years in 1950 to 72.6 years in 2019, and with 20% of people
in Europe aged ≥ 65 years in that same year, understanding
the patterns and risk factors of multimorbidity has become of
great relevance for public health. Multistate models are a
well-suited statistical framework to address this problem.
Additional information: Giulia Peveri
University of Milan
Net Promoter Score–Beyond the measure: A statistical approach based on generalized ordered logit models implemented by Stata to conduct an NPS key drivers’ analysis
Abstract:
The Net Promoter Score (NPS) index is a popular satisfaction measure that allows one to gauge customer loyalty (CL) at most large and medium-sized firms in different fields.
Because of its impact on a company’s growth, line managers are
strongly interested in knowing which factors can increase NPS by
increasing promoters and decreasing detractors. NPS key
drivers’ analysis (NPS KDA) can be a suitable tool for this
task. A KDA may be conducted by implementing different
statistical approaches for identifying those factors or drivers
with a significant impact on a specific outcome variable. In
the context of NPS KDA, the regression models for ordinal
outcomes represent a statistical approach for identifying those
significant customer experience (CX) attributes that can drive
customer status (CS) from detractors to promoters, leading
companies to design appropriate improvement strategies,
involving those facets of product or service with the highest
improvement priority.
In this presentation, the NPS KDA has been conducted by implementing in Stata two special cases of the generalized ordered logit models, the proportional odds model (POM) and the partial proportional odds model (PPOM), where the dependent variable CS was modeled as a function of different CX attributes.
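For readers unfamiliar with the index, the sketch below computes an NPS from raw survey scores as the percentage of promoters minus the percentage of detractors. The 0–10 scale and the cutoffs used here (promoters 9–10, detractors 0–6) follow the common NPS convention and are assumptions, since the abstract does not state them; the scores are made up for illustration.

```python
def nps(scores):
    """Net Promoter Score: % promoters minus % detractors.

    Assumes the conventional 0-10 scale, with promoters scoring 9-10
    and detractors scoring 0-6 (not stated in the abstract).
    """
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# Hypothetical survey responses.
scores = [10, 9, 9, 8, 7, 6, 5, 10, 3, 9]
print("NPS:", nps(scores))  # 5 promoters, 3 detractors, 10 respondents -> 20.0
```

Grouping respondents into detractors, passives, and promoters in this way yields the kind of three-level ordinal customer status that ordinal regression models can take as a dependent variable.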
Additional information: Debora Giovannelli
Florence
Absences from work and climate change: An empirical analysis
Abstract:
The research aims to identify the Italian regions with the most absences from work and to verify whether there is a relationship between absences and climate change.
I used the INPS database relating to employees; the time interval
considered was 2009–2018, and the variable credit
difference was examined, which is a measure of the salary that
workers have not received because of absence from work.
Then, the existence of geographical influence between Italian regions was examined by creating maps in Stata. From the other variables available, a new variable was created that measures the number of worker absences in each region. The maps show the Italian regions where workers are absent most often. Looking only at the sectors most affected by climate change, the results vary. Finally, only absences due to sickness and injury were considered, because they could be caused by climate change and extreme weather events. By examining the outlier values of the variable that measures absences from work, we found that extreme weather events did occur in the month and region in which the values far from the average were recorded.
Additional information: Grazia Errichiello
Università degli Studi di Napoli Parthenope
6:00–6:15 | Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
20 May from 9:00 a.m. to 4:30 p.m.
Python integration is one of the most interesting features recently incorporated into Stata because it allows users to draw on the wide range of open-source Python packages to process, visualize, and explore data within the Stata environment or to embed Python code directly in Stata do-files.
This workshop offers participants an excellent opportunity to acquire the programming skills necessary for integrating Python's capabilities into Stata 17 through a series of examples that highlight when, and why, you should take advantage of the connectivity between Python and Stata for your own research.
The goal is to offer an overview of the applicability of the Python programming language within Stata.
Operational knowledge of Stata. Knowledge of Python is not required, although it will be an advantage.
Una-Louise Bell, TStat – TStat Training
Rino Bellocco, University of Milano-Bicocca
Giovanni Capelli, University of Cassino and Southern Lazio
Maurizio Pisati, University of Milano-Bicocca
The logistics organizer for the 2024 Italian Stata Conference is TStat S.r.l., the distributor of Stata for Italy, Albania, Bosnia and Herzegovina, Greece, Kosovo, North Macedonia, Malta, Montenegro, Serbia, Slovakia, and Slovenia.