Join us for the 2025 Stata Biostatistics and Epidemiology Virtual Symposium, a meeting of researchers in biostatistics and epidemiology from around the world discussing current theory and applied methods using Stata. The program consists of invited talks by top Stata users, and the virtual platform allows you to experience this one-day event from wherever you are.
Enjoy insightful and informative presentations by these experienced Stata users in the field.
Sigrid Leithe
Cancer Registry of Norway
Robert Thiesmeier
Karolinska Institutet
Bianca De Stavola
UCL Great Ormond Street Institute of Child Health
Joie Ensor
University of Birmingham
Giselle Kolenic
University of Michigan
Alyssa Bilinski
Brown University
All times Central Standard Time
9:00 a.m.
Balancing the privacy-utility tradeoff for synthetic time-to-event data
Sigrid Leithe, Cancer Registry of Norway
Generation of synthetic patient records can preserve the structure and statistical properties of the original data without violating privacy, providing access to high-quality data for research and innovation. Few synthetization methods account for the censoring mechanism in time-to-event data, and formal privacy risk evaluations are often lacking. Improvements in synthetic data utility come with increased risks of privacy disclosure, necessitating a careful evaluation to obtain the proper balance. In this talk, I will demonstrate a method for generating synthetic time-to-event data based on regression models and a flexible parametric survival model in Stata. I will also show how to evaluate the utility of the synthetic data and present a method for estimating the privacy loss from publishing a synthetic dataset.
9:45 a.m.
Multiple imputation for recovering missing values when data cannot be shared
Robert Thiesmeier, Karolinska Institutet
Multisite studies are increasingly used to study human health across different populations and countries. However, a common challenge in using data from multiple studies is the presence of systematically missing values, which arise when some studies have not recorded information on certain variables. Although it is possible to use data from sites with recorded observations to impute the missing values, this process becomes challenging when data pooling is not feasible because of logistical or legal constraints. We address this by introducing a framework for multiple imputation across study sites without the need to share individual data. In this talk, we present some motivating examples alongside a new command, mi impute from, that can handle the imputation of binary, discrete, and continuous variables. Given the increasing importance of multisite studies in medical and epidemiological research, mi impute from can offer a practical approach for imputing variables that have not been recorded in some study sites.
10:15 a.m.
Break
10:30 a.m.
Stata: A short history viewed through epidemiology
Bianca De Stavola, UCL Great Ormond Street Institute of Child Health
In this talk, I will use personal recollections to revisit the challenges many public health researchers have faced since the birth of Stata in 1985. I will discuss how, from the 1990s onward, the increasing demands for data management and analysis were met by Stata developers and the broader Stata community, particularly by Michael Hills. Additionally, I will review how Stata's expansion in scope and capacity with each new version has enhanced our ability to train new generations of medical statisticians and epidemiologists. Finally, I will reflect on current and future challenges.
11:00 a.m.
Harnessing uncertainty in clinical prediction models using Stata
Joie Ensor, University of Birmingham
Development of new clinical prediction models is in vogue, with many showing off their ill-fitting wares on journal runways. The vast majority of these models ultimately aim to inform care for the individual, based on the probability of their outcome as calculated by the prediction model. We should all therefore be concerned about the reliability of such models.
Unfortunately, most models are ill-fitting and developed using small samples, which exacerbates overfitting and leads to large uncertainty in model predictions for an individual. This issue makes internal validation non-negotiable in the development of any new model, and its reporting is mandated by the recent TRIPOD+AI guidelines. At the development stage, we know that our model and any estimates of its performance are optimistic: the model is fitted to our data and so should perform well on those data. Therefore, we commonly assess the internal validity of our model using bootstrapping, which allows us to quantify the optimism in our development process and the uncertainty in the model predictions, giving a better feel for how accurate and reliable our model is.
In this talk I will discuss the concept of model uncertainty and demonstrate how our new Stata packages allow developers to estimate uncertainty in their model and harness this information to inform the next steps in the pipeline of their model.
11:45 a.m.
Lunch
1:00 p.m.
Increase efficiency and reproducibility in clinical trial reporting with Stata tools
Giselle Kolenic, University of Michigan
The Statistical Analysis of Biomedical and Education Research Group (SABER) unit of the Department of Biostatistics is an academic data coordinating center (DCC) that provides expertise in the design, conduct, and analysis of multicenter clinical trials. These trials often require reporting to Data and Safety Monitoring Boards (DSMBs), usually every six months over the course of multiple years. DSMB members are provided reports that contain tables, listings, and figures (TLFs) that summarize cumulative data for evidence of study-related adverse events, adherence to the protocol, site performance, compliance with recruitment and retention goals, and data quality, timeliness, and completeness. Stata tools can be used for consistent generation of TLFs and DSMB reports over the life of trials, increasing efficiency and reproducibility. This presentation provides an overview and illustration of some of these tools, including the putdocx Stata command.
2:00 p.m.
Title forthcoming
Alyssa Bilinski, Brown University
Abstract forthcoming
2:45 p.m.
Adjourn
The symposium is conducted in real time and will not be recorded, so all registered users are encouraged to attend. Login information will be sent to registered users on 19 February. Seats are limited. Register now.