Last updated: 13 November 2009
2009 Canada Stata Users Group meeting
Thursday, 22 October 2009
Pantages Hotel
200 Victoria Street
Toronto, ON M5B 1V8
Proceedings
Reflections on how Stata enhances creativity and problem solving in the real world
Lee Sieswerda
Thunder Bay District Health Unit
Many people think of science and statistics as dry and even lifeless
endeavors. In fact, the wellspring of good science and statistics is
creativity, and creativity is enhanced by memory, imagination, beauty, and
collaboration. I will discuss some of the Stata features that I believe work
together to stimulate the creative impulse, including a unified interface
design and syntax structure, the “type a little, get a little” paradigm, a
very large suite of statistical procedures, expandability, excellent
documentation, automation, beautiful and flexible graphics, and
interoperability with other statistical packages.
Additional information
ca09_sieswerda.ppt
Automating the production of descriptive tables at Statistics Canada: mog.ado, a user-written program
Matt Hurst
Statistics Canada
Research at Canadian Social Trends within Statistics Canada, Canada’s
premier statistical agency, often involves the creation and analysis of
numerous descriptive tables. These tables provide convenient and
easy-to-understand information for the general public, one of our many
clients. Analysis generally requires an understanding of which estimates are
statistically different from each other. Statistics Canada’s quality control
measures require that any released estimates pass reliability and
confidentiality standards. Both of these needs are often operationalized by
numerous lines of Stata code after the use of a command, such as mean.
This presentation is about a user-designed program, mog, that is
essentially a front-end for the mean and test commands. It
produces a fixed-width table of means over the groups
specified. This table can then be easily copied into other productivity
tools (Word, Excel, Open Office Apps, etc.) for any additional formatting
and publication. The key benefits are that the results are tabular and copy
properly as a table, that significance tests of estimates against a reference
group are already performed and indicated, and that quality control symbols
indicating minimum sample size and individual significance are shown. I
plan to show how much code the old approach required, and thus how much
time the command saves, as well as the many options it offers.
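As a rough illustration of the kind of table described above, the following Python sketch computes group means, flags groups that differ from a reference group, and marks groups below a minimum sample size. It is not mog and does not reproduce its output or its tests: the function name `group_means_table`, the input format, the flag symbols `*` and `E`, the `min_n` threshold, and the simple two-sample z approximation are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def group_means_table(data, reference, min_n=3, z_crit=1.96):
    """Return fixed-width text rows of group means compared with a reference group.

    data: dict mapping a group label to a list of numeric values
          (hypothetical input format; each group needs at least 2 values).
    Flags (illustrative only): '*' = differs from the reference group under a
    normal approximation, 'E' = fewer than min_n observations.
    """
    ref = data[reference]
    ref_mean = mean(ref)
    ref_se2 = variance(ref) / len(ref)  # squared standard error of the reference mean

    rows = [f"{'group':<10}{'n':>5}{'mean':>10}  flags"]
    for label, values in data.items():
        n, m = len(values), mean(values)
        flags = ""
        if label != reference:
            denom = sqrt(variance(values) / n + ref_se2)
            # Unequal-variance z statistic for the difference in means.
            if denom > 0 and abs((m - ref_mean) / denom) > z_crit:
                flags += "*"
        if n < min_n:
            flags += "E"
        rows.append(f"{label:<10}{n:>5}{m:>10.2f}  {flags}")
    return rows


if __name__ == "__main__":
    example = {"ref": [10, 12, 11, 13], "a": [20, 22, 21, 23], "b": [11, 12]}
    print("\n".join(group_means_table(example, reference="ref")))
```

Because every row has the same column widths, the output can be pasted into a word processor or spreadsheet and split into columns, which is the convenience the abstract emphasizes.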
Additional information
ca09_hurst.ppt
Using Stata graphs to visually monitor the progress of multicenter randomized clinical trials
Glenn Jones
McMaster University
Alexandra Whate
University of Guelph
Randomized medical trials testing treatments are a complex technology, and
they require regular attention to assure data quality prior to definitive
analysis. Typically, only simple nongraphical methods (tables,
proportions) help monitor trial progress, apart from a single graph of cumulative
accrual over calendar time; surprisingly, graphical methods are largely
ignored for this purpose. For six multicenter trials of the
International Atomic Energy Agency, we have developed a graphical
approach to data management and trial monitoring, using histograms,
scatterplots, dot plots, and cumulative distributions as indicators of
overall study and investigator-specific quality. Monthly reports are
automated (do-files) and are sent as slideshows by email to
investigators and the International Atomic Energy Agency staff. Visual
patterns and shapes of curves facilitate early and rapid identification of
issues. Clear pictures help investigators to better adhere to a protocol and
improve accuracy and completeness of trial data. Visual methods assist in
tracking patients, submitting forms, and clarifying data. Clinical
investigators find graphs to be far more intuitive, engaging, efficient,
meaningful, and compelling than conventional tables and text
(especially in developing countries where statistical training and language
barriers may interfere). This presentation will demonstrate our visual
strategy for trial management and explore how this may be optimized.
Additional information
ca09_whate.ppt
Using and teaching Stata in emergency medicine research rotation
Muhammad Waseem
Lincoln Medical & Mental Health Center
Participation in scholarly activities is a requirement of the Emergency Medicine
(EM) residency curriculum. A research project is a necessity for graduation
for EM residents. To fulfill this requirement, EM residents have a
mandatory research rotation. During this rotation, residents learn basic
research designs, write protocols for IRB, and collect data. In addition,
they are required to understand basic statistical concepts before the data
are analyzed. I believe that their understanding will be enhanced if they
are provided with the basic knowledge of a statistical program. During the
EM research rotation, residents are introduced to Stata and research methods.
I developed a manual explaining the basic operation of Stata, which includes,
but is not restricted to, the following: pull-down menus (rather than
commands), 4 windows, 9 tabs, basic commands with pull-down menus,
description and summarization of data, tables of frequencies, tables of means,
data input, data output, data import, saving files, graph commands with
dialog boxes, box plots, histograms, and scatterplots. In my experience,
an introduction to Stata facilitated accurate data recording. It also gave
residents the experience necessary to navigate Stata following the
completion of the research rotation.
Additional information
ca09_waseem.ppt
Teaching Stata—Some reflections after 8 years of training experiences
Karen Robson
York University
This presentation focuses on the author’s 8 years of experience teaching
Stata to international audiences—primarily at the Essex Summer School
in Social Science Data Analysis and Collection in the United Kingdom, but
also in World Bank-funded statistical capacity-building initiatives in
Bosnia-Herzegovina and Albania. The author has recently co-authored (with
David Pevalin) The Stata Survival Manual, published by Open
University/McGraw Hill. The author will focus on common student
questions and some approaches she has used to assist students in learning
the software.
Additional information
ca09_robson.ppt
Teaching Stata and statistics in contexts of evidence-based medicine and clinical trials
Glenn Jones
McMaster University
Alexandra Whate
University of Guelph
International experiences with students (high school, medical) and clinical
investigators (courses, trial meetings) demonstrate that Stata is highly
visual, intuitive, and relatively straightforward. Stata helps the teacher
communicate efficiently and effectively about methods and concepts relating
to data management, statistics, reporting, the nature of evidence and
causality, and the technology of trials. For example, core aspects of
medical research (randomized trials, survival plots) do not require
sophisticated modeling methods and are essential (i.e. repeatedly used to
answer different questions). A subset of Stata components aligns with
non-Stata course content to constitute a "basic curriculum" for individuals
without much statistical training or research experience. Hands-on use of
Stata (e.g., on individual laptops) with a small set of concocted databases
and highly relevant questions may be matched in real time to a
presentation of course content. Stata quickly becomes an easy "add-on" to
an organized presentation of course content. Consistent with educational
psychology, the combination of didactic presentation and dynamic (i.e.
Stata) interactions more effectively engages learners and improves learning
and retention. Learners simultaneously pick up Stata as a skill.
Theoretical and practical features of this teaching approach, relevant to
learners from elementary school students to medical professionals and clinical
investigators, will be described and demonstrated.
Additional information
ca09_jones.pptx
Survey data analysis in Stata
Jeff Pitblado
StataCorp
In this presentation, I cover how to use Stata for survey data analysis
assuming a fixed population. We will begin by reviewing the sampling methods
used to collect survey data, and how they affect the estimation of totals,
ratios, and regression coefficients. We will then cover the three variance
estimators implemented in Stata’s survey estimation commands. Strata with
a single sampling unit, certainty sampling units, subpopulation estimation,
and poststratification will also be covered in some detail.
Additional information
ca09_pitblado_presentation.pdf
ca09_pitblado_handout.pdf
ca09_pitblado_stata.zip
Data cleaning in Stata using Internet search engines
Sergiy Radyakin
The World Bank
Open-ended questions can be a nightmare for statistical processing. Any
mistake in spelling can result in a mismatch during merging, or multiple
counting of the same object. For example, the answers to the
"place-of-birth" question might be "Chicago" and "San Francisco", but in
practice they are often "Chicaga" and "SanFrancisko". Manual correction of
hundreds of answers is tedious, and becomes infeasible with a larger
dataset. For a long time, algorithms like SOUNDEX remained the only
alternative for researchers. A new Stata command takes advantage of
Internet search engines, like Google or Yahoo, to find proper substitutes
for an unclear word or multiple words. The distinctive feature of search
engines is that they rely not only on spelling similarity but are also
context driven: other words may affect the suggestion. For example, including
"city" in the query hints to the search engine to give more
priority to the names of cities. This presentation will demonstrate this
new command and explain the main steps necessary to programmatically acquire
information available on the Internet and convert it into a Stata-usable
format. Keywords: data cleaning, search engine, spelling correction.
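For context, the SOUNDEX baseline mentioned in the abstract can be sketched in a few lines of Python. This is a minimal implementation of the classical American Soundex rules, not the new command presented in the talk; the function name `soundex` and the test words other than those quoted in the abstract are chosen for this example.

```python
def soundex(word):
    """Return the 4-character American Soundex code of a word ('' if no letters)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = codes.get(ch, "")  # vowels and y have no code but do separate runs
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


if __name__ == "__main__":
    for a, b in (("Chicago", "Chicaga"),
                 ("San Francisco", "SanFrancisko"),
                 ("Chicago", "Shicago")):
        print(a, soundex(a), b, soundex(b))
```

Under these rules the two misspellings quoted in the abstract receive the same codes as the correct names, but a misspelling that changes the first letter (the hypothetical "Shicago") does not. The code also uses no surrounding context, which is the gap the search-engine approach aims to fill.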
Additional information
ca09_radyakin.pdf
ca09_radyakin.wmv
Using Stata with Statistics Canada data: Incorporating complex survey design into analysis
Leslie-Anne Keown
Statistics Canada
Most Statistics Canada data are based on surveys with complex survey
designs. To allow users to account for the survey design in their analyses,
Statistics Canada generally provides both a probability weight and a set of
survey bootstrap weights in the survey data files. This presentation will
give an overview of how to use the survey commands in Stata to account for
the complex survey design using the weights and bootstrap weights provided.
It will also give some practical advice on using these elements with
various surveys and some of the pitfalls to avoid.
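The general idea behind bootstrap weights can be sketched as follows: the estimate is computed once with the main probability weight and once with each set of bootstrap weights, and the spread of the replicate estimates around the main estimate gives the variance. This Python sketch is an illustration only. The function names and the toy data are hypothetical, and the variance formula (mean squared deviation from the full-sample estimate) is an assumption; a particular survey may require an adjustment factor described in its own documentation.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of values using the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_se(values, main_weights, replicate_weights):
    """Return (estimate, standard error) using bootstrap replicate weights.

    replicate_weights: list of B weight lists, one per bootstrap replicate.
    Assumed variance formula: average squared deviation of the replicate
    estimates from the estimate based on the main weight.
    """
    theta = weighted_mean(values, main_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    var = sum((r - theta) ** 2 for r in reps) / len(reps)
    return theta, sqrt(var)


if __name__ == "__main__":
    y = [1, 2, 3, 4]                         # toy responses
    w = [1, 1, 1, 1]                         # main probability weights
    bsw = [[2, 0, 1, 1], [0, 2, 1, 1]]       # two toy sets of bootstrap weights
    print(bootstrap_se(y, w, bsw))
```

Real survey files typically supply hundreds of bootstrap weight variables rather than two; the toy data here only keep the arithmetic easy to follow.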
Additional information
ca09_keown.pptx
Report to users
Jeff Pitblado
StataCorp