8:15–9:15 | Running machine learning in Stata: Performance and usability evaluation
Abstract:
This presentation surveys the machine learning (ML) commands
available in Stata. It systematically categorizes and summarizes
them and evaluates their performance and usability for tasks
such as classification, regression, clustering, and dimension
reduction. The presentation also shows how to use these commands
with real-world datasets and compares their
performance. This review aims to help researchers and
practitioners choose appropriate ML methods and related Stata
tools for their specific research questions and datasets and to
improve the efficiency and reproducibility of ML analyses using
Stata. It concludes by discussing some limitations and future
directions for ML research in Stata.
Additional information:
Giovanni Cerulli
IRCrES-CNR
|
9:15–10:00 | pystacked and ddml: Machine learning for prediction and causal inference in Stata
Additional information:
Mark Schaffer
Heriot-Watt University
|
10:05–11:05 | Bayesian model averaging
Abstract:
Are you unsure which predictors to include in your model? Rather
than choosing one model, aggregate results across all candidate
models to account for model uncertainty with Bayesian model
averaging (BMA). Which predictors are important given the
observed data? Which models are more plausible? How do
predictors relate to each other across different models? BMA can
answer these questions and many more.
Stata 18 introduced the bma suite of commands to perform BMA in linear regression models. In this talk, you will learn how to explore influential models, make inferences, and obtain better predictions with BMA. I will demonstrate the utility of BMA for any researcher—Bayesian, frequentist, and everyone in between! No prior knowledge of the Bayesian framework is required.
Additional information:
Meghan Cain
StataCorp
|
11:05–11:35 | Sectoral reallocation and income growth in the labor market during the COVID-19 pandemic
Abstract:
This presentation investigates the effects of the COVID-19
pandemic on the labor market in New Zealand. Using a
comprehensive administrative dataset, we examine labor
reallocation during the pandemic and link these reallocations
to two distinct measures of income growth. Our findings reveal
that COVID-19
presented as an atypical and relatively persistent reallocation
shock to the New Zealand labor market. Notably, the surge in
job-to-job transitions primarily stemmed from transitions
between industries, rather than those within industries.
Moreover, it is these between-industry transitions that
exhibited a positive correlation with overall income growth in
the labor market.
Contributor:
Guanyu Zheng
Ministry of Business, Innovation and Employment
Additional information:
Marea Sing
Reserve Bank of New Zealand
|
11:35–12:05 | Machine learning techniques to predict timeliness of care among lung cancer patients
Abstract:
Delays in the assessment, management, and treatment of lung
cancer patients may adversely impact prognosis and survival.
This study is the first to use machine learning techniques to
predict the quality and timeliness of care among lung cancer
patients, utilizing data from the Victorian Lung Cancer Registry
(VLCR) between 2011 and 2022, in Victoria, Australia.
Additional information:
Arul Earnest
Monash University
|
12:05–12:50 | Stata developer feedback session
Meghan Cain
StataCorp
|
1:20–1:50 | ChatGPT and other large language models: How useful are they to statisticians using Stata?
Abstract:
Some statisticians, including Stata users, are already using
ChatGPT and other large language models (LLMs) to answer
questions about statistics, to generate code, or to process data
(for example, sentiment analysis). Some researchers may also be
using the technology to perform their analyses automatically.
This presentation explores these four uses through examples and
brief case studies.
Additional information:
Andrew Gray
University of Otago
|
1:50–2:20 | Beauty of Stata: Relevant and plausible
Abstract:
Stata is easy to use in medical and health sciences research
because of its straightforward data transfer from other
databases; its capable intermediate and advanced statistical
methods, available through both commands and menus; and its
relevant and meaningful output for inference, interpretation,
and conclusions in both interventional studies (clinical and
community trials) and observational studies (for example,
cohort, case–control, and cross-sectional studies). It is also
well suited to determining the minimum required sample size with
appropriate power for those studies. Various regression methods,
general linear models, and cross-sectional time series are
frequently used by these researchers. Step-by-step procedures
for statistical analysis using Stata are taught at basic,
intermediate, and advanced levels to academic staff in
universities, researchers at research institutes, clinicians and
health personnel at ministries of health, biostatisticians,
epidemiologists, and staff of pharmaceutical companies.
According to user feedback, favorite features of Stata include
the log file, do-file, and ado-file. The output for
epidemiological studies is much superior to that of other
software in terms of relevance and biological plausibility. The
features regularly added in new versions make users more loyal
to the software because they keep it up to date with
applications in their particular field of research.
Additional information:
Nyi Nyi Naing
Universiti Sultan Zainal Abidin
|
2:20–3:05 | Panel discussion: Tips for teaching Stata
Abstract:
Stata, a globally recognized software, is pivotal in teaching
statistics and data analysis across diverse university
disciplines, including biostatistics, economics, econometrics,
epidemiology, health sciences, and social sciences. This panel
session offers an opportunity to hear the experiences of three
lecturers who have used Stata extensively in their teaching for
many years.
Additional information:
Tai Bee Choo (Saw Swee Hock School of Public Health), Siew-Pang Chan, and Chris Erwin (Auckland University of Technology)
|
3:10–3:40 | Nice log (and log-like) scaled axes
Abstract:
In this presentation, I will show how to i) create graph
commands that nicely label a log-scaled axis and ii) produce
a nice log-like-scaled axis showing 0 and ∞.
With the exception of meta forestplot, Stata does not automatically label a log-scaled axis with multiplicative labels, for example, 1/4, 1/2, 1, 2, 4. With a twoway graph, specifying yscale(log) will create a log-scaled y axis but with additive labels, for example, 1, 2, 3, 4. The niceloglabels command (Cox 2018) can suggest a variety of nice multiplicative labels, which can benefit community-contributed graph commands that use log-scaled axes. However, decisions still need to be made, such as which set of labels to choose in which situation. There is no log-scale equivalent of natscale to do this for you. I will show how I overcame this for my blandaltman and box_logscale commands (Chatfield 2023). The latter is an example of working with log-transformed data but labeling the axis with multiplicative, original-scale labels. The mylabels command (Cox 2022) is helpful here. I will also show how to use other transformations such as asinh(y/#) or logistic(#*log(y/#)) to produce a nice log-like-scaled axis showing 0 and ∞.
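The ideas in this abstract can be illustrated outside Stata with a minimal Python sketch. It is not the implementation of niceloglabels, mylabels, blandaltman, or box_logscale. The function names and the parameters c and a are hypothetical, standing in for the # placeholders in asinh(y/#) and logistic(#*log(y/#)). The sketch shows multiplicative labels placed at their log positions, and two log-like transformations that give 0 a finite axis position.

```python
import math
from fractions import Fraction


def multiplicative_ticks(k_min, k_max, base=2):
    """Return (label, position) pairs for base**k, k = k_min..k_max.

    Labels are on the original scale (for example "1/4"); positions are
    natural logs, so the ticks are equally spaced on a log-scaled axis.
    """
    ticks = []
    for k in range(k_min, k_max + 1):
        value = Fraction(base) ** k
        ticks.append((str(value), math.log(float(value))))
    return ticks


def asinh_position(y, c=1.0):
    """asinh(y/c): 0 at y = 0, close to log(2*y/c) when y is large.

    c is a hypothetical scaling constant (the '#' in the abstract).
    """
    return math.asinh(y / c)


def logistic_position(y, c=1.0, a=1.0):
    """logistic(a*log(y/c)) written as 1 / (1 + (c/y)**a), for y >= 0.

    Maps 0 to 0, c to 0.5, and infinity to 1, so both 0 and infinity
    have finite axis positions. c and a are hypothetical constants.
    """
    if y == 0:
        return 0.0
    return 1.0 / (1.0 + (c / y) ** a)


if __name__ == "__main__":
    for label, position in multiplicative_ticks(-2, 2):
        print(label, round(position, 3))
    for y in (0.0, 0.5, 1.0, 10.0, math.inf):
        print(y, logistic_position(y))
```

The logistic form is the only one of the two that also maps ∞ to a finite position. The asinh form keeps 0 on the axis but still grows without bound, like a log scale.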
Additional information:
Mark Chatfield
University of Queensland
|
3:40–4:10 | Answering Stata assignments using generative artificial intelligence: An example
Abstract:
ChatGPT and Bard are now part of the research landscape. They
are tools being used daily by students, professionals, academics,
and researchers. We can choose to ignore them or acknowledge
that they have a part in our practice. In this presentation, we
demonstrate how these tools can be used (ineffectively and
effectively) to develop answers to real assignment questions
using Stata.
Contributor:
Amy Grant
Survey Design and Analysis Services
Additional information:
David White
Survey Design and Analysis Services
|
4:10–4:40 | EpiTable
Abstract:
Exporting results of multivariable models to a Word document can
be time-consuming. This presentation covers the epitable2
and epitable3 packages, developed to create the table 2 and
table 3 used in epidemiological studies.
Additional information:
Zumin Shi
Qatar University
|
The logistics organizer for the 2024 Oceania Stata Conference is Survey Design and Analysis Services (SDAS), the distributor of Stata in Australia, Indonesia, and New Zealand.