Last updated: 9 June 2003
2003 Irish Stata Users Group Meeting
Thursday, 22 May 2003
Trinity College
Maxwell Theatre, Hamilton Building
Dublin, Ireland
Proceedings
Arnaud Chevalier,
University College Dublin
-
Abstract
Propensity score matching has recently become a popular estimator. The basic
idea is to calculate a propensity of being treated and then match a treated
individual with a nontreated individual with a similar propensity score. The
estimate will be unbiased as long as selection into treatment is based on
observable characteristics and a common support is found. Common support
means, in essence, that every treated observation can be matched with a
control. This programme provides some of the results needed to document the
common support assumption and can be used after psmatch.
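As a rough illustration of what documenting common support can involve, the following Python sketch takes propensity scores that are assumed to have been estimated already and reports the overlap region and the treated observations that fall outside it. It is not the programme described in this abstract, nor psmatch itself: the function name, the min/max definition of the overlap region, and the example scores are all assumptions made for illustration.

```python
def common_support(treated_scores, control_scores):
    """Summarise overlap between treated and control propensity scores.

    Assumption: common support is taken to be the interval between the
    larger of the two minima and the smaller of the two maxima.
    """
    lower = max(min(treated_scores), min(control_scores))
    upper = min(max(treated_scores), max(control_scores))
    # Treated observations with no control in the overlap region.
    off_support = [p for p in treated_scores if p < lower or p > upper]
    return {
        "lower": lower,
        "upper": upper,
        "n_treated": len(treated_scores),
        "n_off_support": len(off_support),
        "off_support": off_support,
    }


# Hypothetical propensity scores, for illustration only.
treated = [0.35, 0.50, 0.62, 0.80, 0.95]
controls = [0.05, 0.20, 0.33, 0.48, 0.70, 0.85]

summary = common_support(treated, controls)
print(summary)
```

With these made-up scores, one treated observation (0.95) lies above the highest control score and so has no comparable control. Other definitions of common support, such as a caliper around each treated score, would give different counts.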
Nick Cox,
Durham University, UK
-
Abstract
Most statistical data analysis, and thus most graphical data
analysis, is directed towards modelling of relationships,
but many statistical problems have a different flavour:
their focus is comparison, and the key question is assessing
agreement or disagreement between two or more datasets or subsets
with variables measured in the same units. I survey the range of
official and user-written graphical programs available
in Stata 8 for such problems, with emphasis on making
use of all the information in the data. Recurrent themes
include (1) the use of reference lines, especially horizontal
reference lines, indicating benchmark cases; (2) the relative merits
of superimposition and juxtaposition; (3) how far methods work well
at a range of sample sizes; (4) standing on giants' shoulders by
writing wrappers around existing Stata commands; (5) use (and abuse)
of summary statistics appropriate for such problems.
Cliona Molony, Tony Fitzgerald, Denis Shields (presenting author),
Royal College of Surgeons in Ireland
-
Abstract
There are about 30,000 genes in the human genome and a number of variants per
gene. Case–control studies are sensitive to non-independence of genetic
factors whose frequencies cluster according to population history. Corrections
for confounding have been presented in the literature. A simple implementation
of one of these tests in Stata is shown by simulation to be reasonably robust.
Allowing for overdispersion in allele frequency differences only marginally
alters results. Since individual genes often only contribute minor effects to
complex diseases such as cardiovascular disease, comparing the likelihood
of a null model with that of a model fitting effects of multiple genes
provides a likelihood-ratio test with degrees of freedom equal to the number of
genes. The utility and relevance of this approach are discussed, and
contrasted with models testing for each gene effect in turn.
Kenneth Benoit,
Political Science, TCD
-
Abstract
The "word-scoring" approach to content analysis developed by Laver, Benoit, and
Garry (American Political Science Review, June 2003) has been used to
summarize content from political texts based on a statistical analysis of word
frequencies. Unlike nearly all other methods of computerized content
analysis, "wordscores" does not rely on predefined coding schemes or
dictionaries, but instead compares texts based on relative word frequencies,
mapping patterns from texts whose content is known or assumed onto texts
whose content the researcher wishes to estimate. Furthermore, because
Wordscores makes no attempt to assess the meaning or linguistic structure of
words, it works in any language. To implement this method, we have written
the Wordscores suite of software, implemented as .ado extensions in Stata 7.0.
This software draws heavily on Stata's built-in word-parsing capabilities and data-merging
capabilities based on matching words. Not only is Stata capable of quickly
generating and analyzing huge matrices of word frequencies, but also Stata's
basic orientation as a statistical program makes it perfectly suited to
statistical analysis of the word frequency information. Stata's capability
for providing user-written help files, and for installing and updating .ado
packages over the Internet, also makes it an ideal platform for distributing
our software for noncommercial, scientific use. To our knowledge, Wordscores
is the first Stata application to perform content analysis of texts.
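The scoring idea described above can be sketched in a few lines of Python. This is an illustrative reading of the published word-scoring steps, not the Wordscores .ado code: the function names, the whitespace tokenisation, and the tiny reference texts and positions are all hypothetical.

```python
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of the total word count in a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word from reference texts with known positions.

    For a word w, the weight of reference text r is w's relative
    frequency in r divided by the sum of its relative frequencies over
    all reference texts; the word's score is the weighted mean position.
    """
    freqs = [relative_frequencies(t) for t in reference_texts]
    vocabulary = set().union(*freqs)
    scores = {}
    for word in vocabulary:
        weights = [f.get(word, 0.0) for f in freqs]
        total = sum(weights)
        scores[word] = sum(
            (w / total) * pos for w, pos in zip(weights, reference_positions)
        )
    return scores


def text_score(text, scores):
    """Score a new text as the frequency-weighted mean of its word scores.

    Words that never appear in the reference texts are ignored.
    """
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text shares no words with the reference texts")
    return sum(n / total * scores[w] for w, n in counts.items())


# Hypothetical reference texts with assumed positions +1 and -1.
references = ["tax tax cut market", "welfare welfare tax spend"]
positions = [1.0, -1.0]

scores = word_scores(references, positions)
estimate = text_score("tax welfare spend unknown", scores)
print(round(estimate, 3))
```

Because only word counts are used, nothing in the sketch depends on the language of the texts. The published method also rescales the raw text scores and attaches uncertainty estimates, which this sketch omits.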
Brendan Halpin,
Department of Sociology, University of Limerick
-
Abstract
More complex datasets such as panel surveys require a
good deal of repetitive processing. Stata's programmability makes
the repetition more manageable, reducing the risk of error and
increasing the analyst's efficiency. Other Stata features such as
the reshape command and iterative constructs such as the for command make
handling complex data substantially easier than in other
well-established stats packages.
Patrick Kelly,
Beaumont Hospital
-
Abstract
Stata has many features of particular interest to biostatisticians and
epidemiologists, amongst which is the efficient and effective analysis of
survival data. Kaplan–Meier estimates, Cox regression, parametric models, and the
presentation of life tables are all extensively covered in the software.
Improved graphics and the analysis of time-dependent covariates are among
the recent advances in the latest version of Stata (Stata 8). This study
looks at several years' work involving survival-based analysis of solid organ
transplantation outcomes in the Republic of Ireland.
Data must initially be set for survival analysis. Following this, summary
commands give an overview of the data for analysis as well as providing
summary survival statistics. The commands for graphing and listing the data
give the Kaplan–Meier survivor functions. Modelling generally involves Cox
regression, and the covariates are tested for the proportional hazards
assumption using Schoenfeld residuals. Where appropriate, parametric
methods can also be deployed for various distributions through a single
command.
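For readers unfamiliar with the Kaplan–Meier survivor function mentioned above, the hand computation behind it can be sketched as follows. This is a minimal Python illustration rather than Stata code, and the function name and the five follow-up records are hypothetical, not transplant data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct observed failure time.

    times  -- follow-up time for each subject
    events -- 1 if the failure was observed, 0 if the subject was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        failures = sum(
            1 for ti, ei in zip(times, events) if ti == t and ei == 1
        )
        leaving = sum(1 for ti in times if ti == t)
        if failures > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failures and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: time in years, 1 = failure, 0 = censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Subjects censored at a given time are counted as still at risk at that time, which is the usual convention. For these made-up records the estimated survivor function steps down to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3.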
Approximately 130 renal transplants, 15 heart transplants, and 7 pancreas
transplants are performed annually in the Republic of Ireland. In total, 16
years of data were available for kidney transplants, 15 years for heart
transplants, and 10 years for pancreas transplants. Survival analysis for
organ transplantation is generally measured for two outcomes: graft and
patient survival. Graft survival is measured either with or without censoring
for death with a functioning graft. Patient survival is measured from the time of
first transplant until end of study, death, or loss to follow-up. Because of the
serious nature of the procedures involved in transplantation and the need for
constant follow-up of patients' health status, it is uncommon for
patients to be lost to follow-up. Usually this occurs when a patient moves
outside the state.
Alan Kelly,
Trinity College Dublin
-
Abstract
The use of log-linear models in capture–recapture studies, both
animal and human, is a long-established methodology dating back to the
beginning of the 1970s. In spite of this, there are still outstanding issues
regarding the choice of a best-fitting model, with various alternative
goodness-of-fit measures proposed on either theoretical or pragmatic
grounds. In this presentation a number of these measures will be considered
and their performance contrasted, with particular attention to
the implications for the estimate of the population size N (and its standard
error). These will be illustrated using a recent study on opiate abuse in
Ireland.
Nicholas J. Cox,
University of Durham
-
Abstract
Despite a history now over 30 years long, the adoption of generalised linear
models (GLMs) remains patchy: they are well-known in several fields, but used
little, if at all, in many others. One major advantage of GLMs is that they
return predictions on the scale of the response. The use of link functions
avoids the need for prior transformation of the response, for
back-transformation of predictions, and above all for bias corrections to
back-transformations, whether systematic or ad hoc. Case studies from
environmental applications (suspended sediment concentrations of rivers,
heights of forest trees) are introduced in which predictions on the response
scale are of paramount scientific and practical interest. Heavy use is made of
a suite of Stata programs written by the author producing graphic and numeric
diagnostics after regression-type models, which extend and complement commands
in official Stata. Most of these programs have uses beyond GLMs and they will
also be discussed directly.
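The point about back-transformation bias can be seen in the simplest possible case, a model with no covariates. The Python sketch below uses three made-up response values (not data from the case studies) and compares the arithmetic mean with the result of averaging on the log scale and exponentiating.

```python
import math

# Hypothetical positive, right-skewed response values.
y = [1.0, 10.0, 100.0]

# Mean on the scale of the response.
arithmetic_mean = sum(y) / len(y)

# Transform, average, back-transform: this gives the geometric mean.
mean_log = sum(math.log(v) for v in y) / len(y)
back_transformed = math.exp(mean_log)

print(f"mean of y:          {arithmetic_mean:.2f}")
print(f"exp(mean of log y): {back_transformed:.2f}")
```

The back-transformed value is the geometric mean, which cannot exceed the arithmetic mean, so naive back-transformation understates the mean response unless a bias correction is applied. A model with a log link instead targets the log of the mean itself, so its predictions are already on the response scale.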
Joseph H. Newton,
Texas A&M
Nicholas J. Cox,
University of Durham
-
Abstract
We will report briefly on the introduction of the Stata Journal.
Roberto G. Gutierrez and Chinh Nguyen,
StataCorp
-
Abstract
Dynamically linked libraries, or DLLs as they are commonly known, can
serve as useful and integral parts of Stata user-written commands. Since they
consist of compiled code, DLLs can speed up the execution of
computationally intensive portions of commands which are otherwise written
using Stata's ado language. In this talk, we outline a simple and
easily callable interface between Stata ado code and DLLs written in the C
programming language. An example of this process, as applied to a command
which performs local polynomial smoothing, will also be presented.
Scientific organizers
Ronan Conroy
Alan Kelly
Logistics organizers
Timberlake Consultants,
the official distributor of Stata in the UK, Ireland, Spain and Portugal.