The 2016 Oceania Stata Users Group meeting was September 29–30, but you can still interact with the user community even after the meeting and learn more about the presentations shared.
Proceedings
9:10–10:40 |
Abstract:
I present a comprehensive overview of my tabout module, a Stata ado-program for
the batch production of publication-quality tables. I explain the philosophy behind the
program, touching on issues of aesthetics, functionality, and reproducible research. I
demonstrate the use of tabout to show how easy it is to produce publication-quality
multidimensional tables in a number of different formats and styles. tabout does not cover
estimation tables. Extending tabout by incorporating more advanced Stata features—such as
macros and loops—is also explained, and Stata users are encouraged to extend their skills
in this area.
In the final part of the presentation, I will provide an overview of some forthcoming changes in tabout. These incorporate a number of new advanced features, as well as some long overdue enhancements—such as removing unwanted columns. Many of these new features are designed to make tabout more efficient and flexible. These include the use of configuration files, where users can save customized sets of tabout options in files that can be loaded when tabout runs. Better integration with word processors, such as Microsoft Word, will also be incorporated into the new version of tabout. This will allow users to streamline their exporting of tabout output to their word processor and prescribe the formatting of that output. While word processors will never be as versatile as LaTeX, some of the efficiencies of the latter can be realized within a word-processor environment, and my presentation of the new version of tabout will illustrate this. I will conclude by inviting existing users of tabout to provide feedback on their use of the program and to suggest enhancements they would like to see in future versions.
Additional information
watson-oceania16.pdf Ian Watson
Macquarie University and SPRC, UNSW
|
11:00–12:30 |
Abstract:
Stata has multiple estimators that account for endogeneity. I will briefly discuss these estimators
and their assumptions. However, my main focus will be to talk about estimators that account for
endogeneity that are not built into Stata but can be implemented using gsem and gmm.
Additional information
pinzon-oceania16.pdf Enrique Pinzón
StataCorp LP |
1:30–2:00 |
Abstract:
The statistic most commonly used to evaluate the adequacy of the logistic regression model is the
Hosmer–Lemeshow statistic. Hosmer and Lemeshow proposed a goodness-of-fit test based on
partitioning the fitted probabilities into a number of groups and comparing observed events with
expected events within each group. They showed via simulations that the resulting statistic
follows a chi-squared distribution with degrees of freedom approximately equal to the number
of groups minus two. The normalized unweighted sum of squares (USOS) test also assesses
model adequacy and is based on a statistic originally proposed by Copas. In this talk, I compare the
Hosmer–Lemeshow and USOS statistics in binary regression models with the
complementary log-log regression, and I describe the usos command that calculates the
statistic.
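The grouping idea described above can be sketched in a few lines of Python. This is only an illustration of the general Hosmer–Lemeshow construction, not the usos command or any Stata routine; the function name `hosmer_lemeshow` and the equal-size grouping rule are assumptions made for the example.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for binary outcomes y
    and fitted probabilities p, using equal-size groups (an assumption)."""
    pairs = sorted(zip(p, y))              # order observations by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue                       # skip empty groups when n < groups
        size = len(chunk)
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    # Reference distribution: chi-squared with (groups - 2) degrees of freedom
    return stat, groups - 2
```

The returned statistic would then be compared with a chi-squared distribution on the returned degrees of freedom; how ties in the fitted probabilities are split across groups is a design choice that varies between implementations.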
Additional information
quinn-oceania16.pdf Steve Quinn
Flinders University
|
2:00–2:30 |
Abstract:
The getpatent command crawls websites that hold patent-related information, stores their
source code, and then uses regular expressions to scrape key patent data into Stata,
gradually building a database. The database holds observations on official patent application
numbers and dates, the granting date, the inventors' and patent's names, classification codes and
patent claims, plus cross-referencing data on the number of backward and forward patent
citations.
Additional information
ma-oceania16.pdf Le Ma
University of Technology Sydney
|
?:??–?:?? |
Abstract:
Tagging a segmentation solution for large data sets is problematic when the segmentation
is built on soft variables such as attitudes, interests and opinions, because the tagging
variables are usually demographics. We examine an alternative approach to increasing the
tagging success of soft variable segmentations for large data sets.
Con Menictas
University of Newcastle
|
2:30–3:00 |
Abstract:
Self-control designs are necessitated in situations where there is a desire to assess the
effectiveness of an intervention in a small study population. One such situation is the
assessment of a treatment modality called abdominal functional electrical stimulation and
whether or not its application results in improved respiratory function in patients suffering from
paralysis. Further, meta-analysis of studies applying a self-control study design with repeated
measures requires adaptation of established methods in order to perform scientifically sound
analysis. In this study, we applied a methodology using a specific adaptation of METAN to carry
out this complex statistical analysis.
Studies that met inclusion criteria were classified into two broad categories: acute and chronic. Acute studies compared respiratory function prior to and during abdominal functional electrical stimulation (FES). Chronic studies measured the chronic effect of abdominal FES training. For both acute and chronic studies, analyses were carried out using either fixed-effects models, using the inverse-variance (IV) approach, or random-effects models, using the DerSimonian and Laird (D-L) approach. Model choice was determined by the between-study heterogeneity of pooled results, using the I² statistic. Because of differences in baseline function between studies, estimates of effect were made using the standardized mean difference (SMD), applying Glass's Δ. This method is preferred where the intervention may potentially alter observed variability and is less susceptible to small-sample bias than other SMD techniques. Multiple models were applied to compare time points in the self-control chronic studies, with similar analysis applied to RCTs at equal time points. A descriptive approach was used to analyze trends observed in the chronic studies, with data normalized based on minimum within-study values for each measure of respiratory function. Publication bias was assessed using the Begg and Mazumdar test and the Egger approach. All statistical analyses were carried out using Stata 14. This methodology was successfully applied and is in press (McCaughey et al. Abdominal functional electrical stimulation to improve respiratory function after spinal cord injury: A systematic review and meta-analysis. Spinal Cord [accepted 2015]). The methodology, applying computational methods enabled by Stata, represents an important approach to the meta-analysis of self-control study designs.
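For readers unfamiliar with the pooling steps named above, the following Python sketch shows the textbook formulas for inverse-variance fixed-effects pooling, the DerSimonian and Laird random-effects estimate, the I² statistic, and Glass's Δ. It is not the authors' adapted METAN code; the function names `glass_delta` and `pool_effects` and the input values in the usage note are hypothetical.

```python
import math

def glass_delta(mean_treated, mean_control, sd_control):
    """Standardized mean difference scaled by the control (or baseline) SD only."""
    return (mean_treated - mean_control) / sd_control

def pool_effects(effects, variances):
    """Pool per-study effects with fixed-effects (inverse-variance) and
    DerSimonian-Laird random-effects weights; also report Q, I^2 and tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0          # I^2 in percent
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    random = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum_w),
        "random": random,
        "se_random": math.sqrt(1.0 / sum(w_star)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

With two made-up studies, `pool_effects([0.0, 2.0], [0.5, 0.5])` gives a pooled effect of 1.0 under both models, with I² of 75 percent and a between-study variance of 1.5. A high I² of this kind is what would favor the random-effects model under the model-choice rule described in the abstract.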
Additional information
borotkanics-oceania16.pdf Robert Borotkanics
Macquarie University
|
3:20–4:50 |
Abstract:
Stata boasts an impressive graphics engine with an extensive suite of visualization capabilities.
The challenging aspect of this richness is its overwhelming syntax. The workflow for data
visualization brings structure to the vast syntax by organizing graph code consistent with
graphics theory.
Demetris Christodoulou
MEAFA, The University of Sydney
|
9:10–10:40 |
Abstract:
Do you suffer from the tedium of moving statistical results by hand from
Stata into your research documents or reports? Have you ever had the
nightmare of updating a document because of changes to your analysis only to find that you
missed some results? Have you ever dreamed of automating production of otherwise brain-numbing
standardized reports? If so, you need dynamic documents. Dynamic documents get
their name from their ability to update their statistical results when they are created, ensuring
complete reproducibility and minimal maintenance. In the world of Stata, there are quite a few
user-written packages for creating dynamic documents, both from within Stata and from within
other applications that call back to Stata. In this talk, I'll briefly demonstrate a few different
packages, each with their own strengths. You can then choose your package, get more done,
and sleep more easily at night.
Additional information
rising-oceania16.pdf Bill Rising
StataCorp LP
|
11:00–11:30 |
Abstract:
Cassava is the second most important food crop in Africa after maize. It is a major staple crop for
more than 200 million people in East and Central Africa, most of them living in poverty in rural
areas. However, its production is undermined by several factors, particularly the problem of
emerging and endemic pests and diseases. We conducted a comprehensive socio-economic
study covering Uganda, Tanzania, and Malawi to determine the status of cassava production with
the following specific objectives and research questions:
Paul Mwebaze
CSIRO
|
11:30–12:00 |
Abstract:
Line plots encode a series of slopes from adjoining coordinates and aim to reveal suggestive
patterns in the sequential rates of change. The judged prevalence of patterns in the bivariate
series and the degree of steepness in the rates of change are largely determined by the choice of
aspect ratio that is imposed on the line plot. Choosing an appropriate aspect ratio is key in
designing informative line plots. The command optaspect calculates the optimal aspect ratio in a
two-variable line graph using a number of heuristic criteria.
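As an illustration of what an aspect-ratio heuristic can look like, the Python sketch below implements one widely cited rule, banking the median absolute segment slope to 45 degrees. Whether optaspect uses this particular criterion is not stated in the abstract, so treat the function `median_slope_aspect` as a hypothetical example rather than a description of the command.

```python
from statistics import median

def median_slope_aspect(x, y):
    """Return the height/width ratio at which the median absolute
    segment slope of the line (x, y) is drawn at 45 degrees."""
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    # Absolute slope of each adjoining pair of points, in data units
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
              if x1 != x0]
    if not slopes or x_range == 0 or y_range == 0:
        raise ValueError("need varying x and y to bank slopes")
    mid = median(slopes)
    if mid == 0:
        raise ValueError("median slope is zero; no finite aspect ratio")
    # On the page a data slope s is drawn as s * (x_range / y_range) * aspect.
    # Setting the median drawn slope to 1 and solving for aspect gives:
    return y_range / (x_range * mid)
```

For a straight line such as `x = [0, 1, 2, 3]`, `y = [0, 2, 4, 6]`, the function returns 1.0, that is, a square plot in which the line runs along the diagonal.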
Demetris Christodoulou
MEAFA, The University of Sydney
|
1:30–2:30 |
Abstract:
An increasing number of social sciences are now paying much closer attention to the effect of
context on behavior: how the characteristics of the neighborhood moderate the behavior of
residents, for example, or the degree to which characteristics of the workplace condition job
satisfaction. The classic application in the social sciences is how the performance of pupils is
moderated by characteristics of their class and their school. In each case, level 1 units of
analysis (usually individuals) are nested within level 2 or level 3 categories.
In each of the above examples, individuals are clustered either spatially or organizationally (or both). Multilevel modeling is now a standard way of addressing not only the lack of statistical independence that joint membership of given contexts usually brings but also the role that context plays theoretically. My presentation will introduce the capabilities of two commands, mixed-effects linear regression (mixed) and mixed-effects binary regression (melogit). Special attention will be paid to postestimation and the graphical representation of intercept and slope effects, including the use of margins. I will reflect on how much additional information about specific behaviors I have learned by applying these commands in Stata 14 in my home discipline of human geography.
Additional information
morrison-oceania16.pdf Philip Morrison
Victoria University of Wellington
|
2:30–3:00 |
Abstract:
In this presentation we talk about the challenges of using Stata to teach students from nonscience
(and science) backgrounds who are taking first steps in their methodological training at the
university level. We are newcomers to Stata as a teaching tool, although we have used it for
years for our research.
In social sciences, such as sociology or criminology, a typical introductory course covers the rudiments of statistical theory and analytical methods ranging from cross-tabulations through Pearson product-moment correlations to ordinary least-squares regressions. Stata offers a simple command language to execute the analyses needed to generate the relevant tables, but the output for these procedures is not easy to control in Stata. More advanced users of Stata employ user-written procedures such as tabout or estout to produce publication-quality tables. However, for our students, these procedures are too complex to use. Or so we believe at the moment, having perused the standard documentation and examples for these procedures. We would like to start a conversation about the best ways of creating publication-quality tables easily using Stata output. In our experience, even the standard "right-click" and copy table solution often does not work in practice as it should in theory. We start the conversation by showing three examples of tables we need to easily generate in Stata.
Additional information
sikora-oceania16.pdf Panelist: Joanna Sikora
Australian National University
Panelist: Philip Morrison
Victoria University of Wellington
Panelist: Bill Rising
StataCorp
|
3:20–3:50 |
Abstract:
table1 is a Stata ado-program that produces one- and two-way tables of summary statistics for
a list of numeric variables. The rows of the table are formed from the list of specified variables.
If no by-variable is specified, the table has only one column of results. If a by-variable is
specified, the table has a column of results for each level of the by-variable, with an optional
additional totals column. Unlike other Stata tabulation commands (such as tabulate, table,
and tabstat),
the row variables can be a mixture of continuous variables (summarized by mean, standard deviation,
etc.) and categorical variables (summarized by percentages and frequencies).
Additional features include (i) several different options for displaying missing and non-missing counts; (ii) considerable flexibility in the way the results are displayed, in particular, the summary statistics and their possible different presentation for each row variable; (iii) results being restricted to subgroups of the data for individual row variables; and (iv) the contents of the table being saved as a Stata data file or text file or exported to Excel. The motivation for table1 is the descriptive table commonly seen in health research publications in which the baseline characteristics of two or more groups are compared. This descriptive table usually has only one column for each group, generally with at least two summary statistics in each column (for example, mean and standard deviation for continuous variables or percentage and frequency for categorical variables). The output of table1 therefore differs from that of tabout in that there is only a single column for each group. The aim of table1 is to assist with reproducible research by enabling creation of a table whose contents can be used unchanged in publications.
Additional information
donath-oceania16.pdf Susan Donath
Murdoch Children’s Research Institute / The University of Melbourne
|
3:50–4:20 |
Abstract:
Depression is a common mental illness worldwide. The World Health Organization (WHO)
estimates that 350 million people of all ages suffer from depression globally. This illness affects a
person's well-being, ability to work, and social interactions. However, many suffer from undiagnosed
depression. The aim of this analysis was to develop a risk index for depression using a well-known US
population-based sample.
Depression was measured using a self-report diagnostic and dichotomized into those with and without depression. A number of generalized structural equation models (GSEM) using Stata 14 were developed with depression as the outcome to form a final path model for the index. The models utilized a set of statistical techniques to measure and analyze relationships between a set of observed biomarkers, lifestyle and medical symptom indicators (path analysis), and a latent diet variable (confirmatory factor analysis) with depression. Linear causal relationships among variables were examined while simultaneously accounting for measurement error. Using Stata's gsem command with the complex multistage survey sample meant the point estimates, standard errors, and tests were adjusted accordingly. The final model consisted of more than one dependent variable with multiple direct and indirect effects. The model was tested across certain key demographic groups to ensure configural invariance.
Joanna Dipnall
Deakin University
|
Stata Users Group Workshops
There will be Stata Users Group workshops the preceding day, September 28. Choose one of the two topics below (lecture plus hands-on workshops).
Workshop Topic #1: Introduction to Causal Inference in Stata
Introductory training via interactive lecturing and practical exercises, covering the basics of causal inference, including propensity scores, marginal structural models (MSMs), causal mediation analysis, and G-estimation. This course assumes a basic knowledge of how to operate Stata. Participants should have completed a first course in statistics for nonspecialists and at least be familiar with multivariable regression models.
Trainer: Lyle Gurrin
Associate Professor Lyle C. Gurrin is a teaching and research academic in biostatistics at the Melbourne School of Population and Global Health, which he joined in 2003. Prior to that, he held senior biostatistician positions in Perth at large public hospitals and associated medical research institutes devoted to women’s and children’s health. He is a Chief or Principal Investigator on several large, international, multidisciplinary studies of health and disease in both early life (infant food allergy, childhood adversity and well-being) and later years (hereditary haemochromatosis and men’s health). He promotes the sound practice of statistical reasoning by teaching short courses and classes of postgraduate students and has methodological interests in the analysis of longitudinal and correlated data, and causal inference in observational studies.
Trainer: Jessica Kasza
Dr. Jessica Kasza is a biostatistician in the Department of Epidemiology and Preventive Medicine at Monash University. After completing a Ph.D. in 2010 at the University of Adelaide, she spent time at the University of Copenhagen before returning to the University of Adelaide. She has been at Monash University since April 2013. Her research interests include causal inference methodology for the comparison of treatments, and methodology for the comparison of the performance of health care providers. She has a strong interest in the translation and dissemination of complex statistical methodology.
Workshop Topic #2: Bayesian Analysis Using Stata
Bayesian analysis provides a theoretically more intuitive approach to statistical inference and model selection and provides practical computational advantages in implementing complex statistical models. This course presents a basic overview of Bayesian statistics and its implementation in Stata. Lectures will cover an introduction to basic Bayesian models (one parameter and normal models), Bayesian implementation of linear and generalized linear models, and a few examples of complex extensions (including change point models, variable selection, multivariate and multilevel regression, measurement models and structural equations, latent class and mixture models, etc.). Labs will focus on the implementation of these methods with the new Bayesian commands introduced in Stata 14 and include coverage of available user-written commands, examples of direct implementation in Mata, and analysis of Bayesian simulation output produced from other programs.
Trainer: Shawn Treier
Shawn Treier is a lecturer at the School of Politics and International Relations at the Australian National University and received his Ph.D. from Stanford University. His research involves the application of Bayesian measurement models to the study of political institutions, political behavior and public opinion, and the measurement of democracy.
His work has appeared in the American Journal of Political Science, Political Analysis, Journal of Politics, Public Opinion Quarterly, American Politics Research, and Legislative Studies Quarterly.
Organizers
Scientific committee
Demetris Christodoulou
University of Sydney
Rob Herbert
Neuroscience Research Australia
Logistics organizer
The logistics organizer for the 2016 Oceania Stata Users Group meeting is Survey Design and Analysis Services Pty Ltd, the distributor of Stata in Australia and New Zealand.