2014 Spanish Stata Users Group meeting
23 October 2014
Facultad de Medicina
Universitat de Barcelona
Barcelona, Spain
Proceedings
Development of the nomolog program and its evolution: Toward the implementation of a nomogram generator for the Cox regression
Alexander Zlotnik
Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica
Víctor Abraira
Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, CIBERESP
We have developed the nomolog program for the generation of
logistic regression nomograms in Stata. It has been recently
accepted for publication in the Stata Journal and will soon be
published on the SSC repository. Some of the challenges we
encountered during its development were i) inclusion of main
effects and interaction factors, ii) continuous # continuous
interactions, and iii) development of an automated testing environment.
We present the solutions to these, the most relevant implementation
details, and the development practices, which may benefit persons
interested in building their own programs based on Stata.
During the development of this program, we were independently
contacted by several researchers interested in the generation
of Cox regression nomograms with Stata. We discuss the differences
and similarities between logistic and Cox regression nomograms as
well as the limitations and expected capabilities of a modification of
nomolog that will introduce this feature.
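As a rough illustration of what a logistic regression nomogram encodes, the Python sketch below maps each predictor's contribution to the linear predictor onto a common 0 to 100 points scale and converts a points total back to a probability. It is not the nomolog implementation; the intercept, coefficients, predictor ranges, and function names are hypothetical.

```python
import math

# Hypothetical fitted logit model (illustrative values only).
INTERCEPT = -3.0
COEFS = {"age": 0.04, "smoker": 0.9}
RANGES = {"age": (20, 80), "smoker": (0, 1)}

def min_contribution(name):
    """Smallest value of beta * x over the predictor's range."""
    lo, hi = RANGES[name]
    return min(COEFS[name] * lo, COEFS[name] * hi)

def widest_span():
    """Largest range of beta * x across predictors; it defines 100 points."""
    return max(abs(COEFS[n]) * (hi - lo) for n, (lo, hi) in RANGES.items())

def points(name, value, max_points=100.0):
    """Points assigned to one predictor value on the common scale."""
    contribution = COEFS[name] * value - min_contribution(name)
    return contribution / widest_span() * max_points

def probability_from_points(total_points, max_points=100.0):
    """Convert a points total back to the predicted probability."""
    baseline = INTERCEPT + sum(min_contribution(n) for n in COEFS)
    lp = baseline + total_points * widest_span() / max_points
    return 1.0 / (1.0 + math.exp(-lp))
```

For a 50-year-old smoker, points("age", 50) + points("smoker", 1) gives the total, and probability_from_points applied to that total returns the same probability as evaluating the logit model directly.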
Additional information
es14_zlotnik_abraira.pdf
Further explanations, examples and download links for nomogram
generators for logistic and Cox regressions are available at:
www.zlotnik.net/stata/nomograms
Margins reloaded
Enrique Pinzón
StataCorp
The margins command in Stata allows us to get a wide array of
results using coefficient estimates. I will illustrate the use
of margins in some commonly used models. I will then illustrate
a new result. I will show how we can use margins to obtain,
after fixed-effects panel-data estimation, average marginal effects
and average treatment effects that incorporate the effect of the unobserved
time-invariant component.
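For readers unfamiliar with the term, the average marginal effect of a continuous covariate in a simple logit model is the sample mean of the derivative of the predicted probability. The Python sketch below shows that calculation only; it does not cover the fixed-effects result described in the talk, and the coefficients and data are hypothetical.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effect(beta0, beta1, xs):
    """AME of x on Pr(y = 1) in a logit model.

    For each observation the marginal effect is beta1 * p * (1 - p);
    the AME is the mean of these values over the sample.
    """
    effects = []
    for x in xs:
        p = logistic(beta0 + beta1 * x)
        effects.append(beta1 * p * (1.0 - p))
    return sum(effects) / len(effects)
```

Because p * (1 - p) varies across observations, the average marginal effect generally differs from the marginal effect evaluated at the mean of x.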
Additional information
es14_pinzon.pdf
Studying coincidences with network analysis and other statistical tools
Modesto Escobar
Universidad de Salamanca, Dpto. de Sociología y Comunicación
The aim of this talk is to introduce a new framework to study
data structures that is based on a combination of statistical
and social network analysis and is called coincidence
analysis. The purpose of this procedure is to ascertain the most
frequent events in a given set of scenarios and to study the
relationships between them. In accordance with this procedure,
the concurrence of persons, objects, attributes, characteristics,
or events within the same temporally or spatially limited set can
be classified in the following manner:
a) simple, if both occur at least once in the same set;
b) likely, where the level of concurrence must be more than a
single coincidence and more probable than a concurrence produced
by mere chance; and
c) statistically probable, that is, in cases where samples of events
are the subject of analysis, a confidence interval should be
established to determine the statistical meaning of the
combination of events.
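The first two categories can be sketched in a few lines of Python. This is an illustration under assumptions, not the coin program: "more probable than chance" is taken to mean that the observed joint count exceeds the count expected under independence, and the inferential step needed for category c) is omitted. The function name and labels are hypothetical.

```python
from itertools import combinations

def classify_coincidences(scenarios):
    """Classify each pair of events across a list of scenarios (sets of events).

    Returns {(a, b): label} with label in {"none", "simple", "likely"}.
    """
    n = len(scenarios)
    events = sorted(set().union(*scenarios))
    freq = {e: sum(e in s for s in scenarios) for e in events}
    result = {}
    for a, b in combinations(events, 2):
        both = sum(a in s and b in s for s in scenarios)
        # Joint count expected if a and b occurred independently (assumption).
        expected = freq[a] * freq[b] / n
        if both > 1 and both > expected:
            label = "likely"
        elif both >= 1:
            label = "simple"
        else:
            label = "none"
        result[(a, b)] = label
    return result
```

With the scenarios [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"D"}], the pair (A, B) is classified as likely, (A, C) as simple, and (A, D) as none.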
This mode of analysis can be applied to the exploratory analysis
of questionnaires, the study of textual networks, the review of
the content of databases, and the comparison of different
statistical analyses of interdependence because the following
techniques can be used with the same data: multidimensional
scaling, principal component analysis, correspondence analysis,
biplot representations, agglomeration techniques, and network
analysis algorithms.
The statistical bases of this analysis are described, as is
the program written in Stata (coin) that allows the analysis
to be executed. As an example of its use, the photograph albums
of the following people who were famous in the early twentieth
century are described: Miguel de Unamuno (1864–1936), Rafael
Masó (1880–1935), Joaquín Turina (1882–1949), and
Antonia Mercé (1890–1936), stage name la Argentina.
Additional information
es14_escobar.pdf
Demand for drugs for childhood malaria in rural Mozambique
Elisa Sicuri, Sergio Alonso
CRESIB
Malaria is one of the leading causes of death in Sub-Saharan
Africa. Artemisinin-combination therapies (ACTs) are used as
first-line drugs for treatment, but their market is far from
competitive. Important supply issues include limited availability
and low quality, while on the demand side, market failures are
more related to the lack of information and accessibility to
the treatment.
To estimate the actual willingness-to-pay (WTP) for
ACTs among children with malaria in rural Mozambique, researchers
conducted a survey among patients at a district hospital. Data
collected through the survey were merged with demographic
surveillance data and the hospital passive case detection
systems in place in the area. A negative binomial (NB) regression
was used to identify the determinants of the demand for ACTs.
Results showed that WTP is negatively associated with the
number of malaria episodes the child has previously suffered
during the same malaria season and with the socio-economic
position. Age and occupation of the family head were
positively correlated with the WTP. This study also discussed
the appropriateness of using contingent valuation methods for
estimating WTP. Respondents stated a higher willingness-to-pay than
expected, but they revealed a much more realistic demand price when
asked for ability-to-pay. These results provide evidence that ACT
subsidies to the private sector are needed to improve access to
malaria treatment in rural Mozambique.
Additional information
es14_alonso.pdf
Analysis of variations in medical practice using Stata
Cristian Tebé
Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)
Variations in medical practice are defined as systematic variations
(not due to chance) of adjusted rates of clinical procedures for a
given level of aggregation of the population. The aim of this talk
is to explore and describe variations in different clinical conditions
and surgical procedures from a population perspective to offer a richer
perspective for the assessment of health services in a complex public
health care environment. The basic strategy of analysis is to make
comparisons among rates of activity (numerator: hospital admissions)
of inhabitants of a territory (denominator: basic health area).
Results are presented in tables of standardized rates (using dstdize)
and ratios of activity using small-area analysis (calling R from
Stata). Most results are presented in maps (using spmap) for better
visualization. Variation analysis can be a good monitoring tool for
any health system. Published atlases have received attention from both
clinical and healthcare audiences.
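The idea behind a directly standardized rate can be shown with a short Python sketch: stratum-specific rates from one area are averaged using the weights of a common standard population, so that areas with different age structures become comparable. The strata, counts, and function name are hypothetical, and the sketch does not reproduce the output of dstdize.

```python
def directly_standardized_rate(cases, population, standard_population):
    """Weighted average of stratum-specific rates.

    cases, population: per-stratum counts for the area being analyzed.
    standard_population: per-stratum counts of the reference population,
    used only to derive the weights.
    """
    total_std = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(cases, population, standard_population):
        rate += (d / n) * (w / total_std)
    return rate
```

For an area with 10 and 30 admissions in two strata of 1,000 inhabitants each, the crude rate is 0.02; with a standard population of 3,000 and 1,000 in those strata, the directly standardized rate is 0.015.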
Automated harmonization of variable names and values from several datasets prior to conducting batch statistical analyses
Xavier Bosch-Capblanc
Centro Suizo para la Salud Internacional, Salud Pública
Data requirements by governments, donors, and the international
community to measure health and development achievements have
increased in the last decade. Datasets produced in surveys
conducted in several countries and years are often combined to
analyze time trends and geographical patterns of demographic
and health-related indicators. However, because not all datasets
have the same structure, variable definitions, and codes, they
have to be harmonized prior to submitting them to statistical
analyses. Manually searching, renaming, and recoding variables is
extremely tedious and prone to errors when the
number of datasets and variables is large. This article presents
an automated approach to harmonizing variable names across several
datasets, which optimizes the search of variables, minimizes manual
inputs, and reduces the risk of error.
Results:
Three consecutive algorithms are applied iteratively to search
for each variable of interest for the analyses in all datasets.
The first search (A) captures particular cases that could not be
solved in an automated way in the search iterations; the second
search (B) is run if search A produced no hits and identifies
variables whose labels contain certain key terms defined
by the user. If this search produces no hits, a third one (C)
is run to retrieve variables that have been identified in other
surveys. For each variable of interest, the
outputs of these engines can be the following: 1, a single best matching
variable is found; 2, more than one matching variable is found;
or 3, no matching variables are found. Output 2 is solved by
user judgement. Examples using 4 variables are presented and show
that the searches have a 100% sensitivity and specificity after a
second iteration.
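The cascade of searches A, B, and C and the three possible outputs can be sketched in Python as follows. This is an illustration, not the authors' code: the data structures, the rule that all key terms must appear in a label, and the function name are assumptions.

```python
def find_variable(target, dataset_labels, manual_map, key_terms, known_names):
    """Search one dataset for one variable of interest.

    dataset_labels: {variable_name: label} for the dataset being searched.
    manual_map:     {target: variable_name} for hand-resolved cases (search A).
    key_terms:      {target: [terms]} to look for in labels (search B).
    known_names:    {target: set of names found in other surveys} (search C).
    Returns (output, matches), where output is 1 (single match),
    2 (several matches, to be resolved by the user), or 3 (no match).
    """
    # Search A: particular cases resolved by hand.
    name = manual_map.get(target)
    if name in dataset_labels:
        matches = [name]
    else:
        # Search B: labels containing every user-defined key term.
        terms = [t.lower() for t in key_terms.get(target, [])]
        matches = [v for v, label in dataset_labels.items()
                   if terms and all(t in label.lower() for t in terms)]
        if not matches:
            # Search C: names already identified in other surveys.
            names = known_names.get(target, set())
            matches = [v for v in dataset_labels if v in names]
    output = 1 if len(matches) == 1 else 2 if len(matches) > 1 else 3
    return output, matches
```

After each pass, names confirmed by the user could be added to known_names, which is one way a second iteration might resolve cases left open by the first.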
Additional information
es14_bosch.pdf
Using Stata features to interpret and visualize regression results with examples for binary models
Isabel Cañette
StataCorp
A lot has been said about presenting and interpreting results
from binary models. Policy makers are usually interested in
population effects, while health providers are mostly interested
in individual predicted effects.
This presentation has two aims. First, I will discuss
different measures of interest for these kinds of models, such
as probabilities, odds ratios, risk ratios, and marginal effects,
and how they relate to each other. Second, I will
show different ways to use Stata resources to interpret and
present results from regression models in general. These
approaches can be useful also in the teaching environment.
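How these measures relate to each other can be illustrated with a small Python sketch that derives the risk difference, risk ratio, and odds ratio from two predicted probabilities. The probabilities and function names are hypothetical.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def compare_probabilities(p_exposed, p_unexposed):
    """Effect measures computed from two predicted probabilities."""
    return {
        "risk_difference": p_exposed - p_unexposed,
        "risk_ratio": p_exposed / p_unexposed,
        "odds_ratio": odds(p_exposed) / odds(p_unexposed),
    }
```

With probabilities 0.2 and 0.1, the risk ratio is 2 while the odds ratio is 2.25; with 0.002 and 0.001, the odds ratio is about 2.002, which shows why the two measures nearly coincide only for rare outcomes.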
Additional information
es14_canette.pdf
Paquete de comandos de usuarios para Estadística y Epidemiología
(Package of user-written commands for statistics and epidemiology)
Josep M. Domenech-Massons, Roberto Sesma-Morales
Universidad Autónoma de Barcelona, Laboratorio de Estadística
For several decades, the postgraduate courses in “Design and Statistics in Health Sciences” were taught with SPSS Statistics, which led us to write a series of macros and scripts implementing the analyses needed for teaching and research that were not available in that package.
We have recently finished converting all the courses to Stata and have turned the SPSS macros and scripts that perform procedures not available in Stata into user-written commands with their corresponding dialog boxes and immediate versions.
Additional information
es14_domenech.pdf
A formal methodology for the comparison of results from different software packages: A case study of estimation of Hosmer–Lemeshow “deciles of risk” for a logistic regression with Stata and with a custom Java program
Alexander Zlotnik, Juan Manuel Montero
Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica
Ascensión Gallardo-Antolín
Universidad Carlos III de Madrid. Dpto. Teoría de la Señal y Comunicaciones
Statistical software packages are frequently developed in
general-purpose programming languages (such as Java, C, and C++)
that do not include statistical operations in their core
libraries. Software developers are therefore forced to create
their own statistical subroutines, use third-party libraries, or
follow a hybrid approach. This produces a fairly rich variety
of implementations even for the simplest operations, such as the
estimation of percentiles. In most cases, there is no gold
standard and different approaches, which may yield different
results with identical inputs, are acceptable. These differences
are often not obvious and usually not documented, and references
to alternative approaches are most often omitted. Although this
is widely known by people with some experience in statistical
software development, most users of statistical software ignore
these subtle differences and may spend considerable
time comparing results from seemingly identical operations in
different software packages. This may become especially daunting
when this comparison is made between custom-developed software
and an industry-standard statistical software package, such as Stata.
In this presentation, we explain a formal methodology for the
comparison of final and partial results of statistical operations
between Stata and other software packages. As a case study, we
discuss the differences between the calculation of logistic
regression coefficients, Hosmer–Lemeshow “deciles of risk”, and
null hypothesis testing for the comparison between observed and
expected deciles performed with Stata and a custom-developed Java program.
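The point that even percentiles have several acceptable definitions can be reproduced with the Python standard library alone (Python 3.8 or later), which offers two interpolation conventions in statistics.quantiles. The sample data and the wrapper function are hypothetical, and no claim is made here about which convention Stata or the Java program uses.

```python
import statistics

def quartiles_by_method(data):
    """Quartile cut points under the two conventions of statistics.quantiles."""
    return {
        method: statistics.quantiles(data, n=4, method=method)
        for method in ("exclusive", "inclusive")
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
result = quartiles_by_method(sample)
for method, cuts in result.items():
    print(method, cuts)
```

For this sample the "exclusive" convention gives 2.75, 5.5, and 8.25, whereas the "inclusive" convention gives 3.25, 5.5, and 7.75: identical input, different but equally defensible results.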
Additional information
es14_zlotnik_montero.pdf
Integration between Stata and LaTeX to create hospital reports for the Catalan arthroplasty register (RACat): Summary results for the period 2005–2013
Marcela Marinelli, Cristian Tebé
Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)
The Catalan arthroplasty register (RACat) produces annual
clinical reports per center (52 hospitals). These reports
are typically generated manually by some analysts.
Stata and LaTeX integration permits the automation of such reports.
LaTeX can directly execute a Stata do-file that uses different
commands. The aim of the present study was to produce automatic
reports in an integrated Stata–LaTeX system using the foreach
command to generate the structure of the 52 hospital reports;
the listtex and tabout commands to produce tables of
the hospital characteristics of the operated patients, types of
surgical procedures, and prostheses survival at 1, 3, and 5 years
after primary surgery; and the graph2tex command to generate LaTeX
graph code to be included in a LaTeX file.
Hospital risks of revision of the implant following knee and
hip arthroplasty were measured considering Fine and Gray’s
model and using stcrreg (death as competing event) and were
compared using funnel plot graphs. Stata–LaTeX integration
permits a dynamic do-file and saves a lot of time
when changes in the analysis are necessary.
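The looping pattern behind such reports, one LaTeX fragment generated per hospital from the same template, can be sketched in Python as follows. This is a generic illustration rather than the Stata do-file described in the talk; the hospital identifiers, table rows, and function names are hypothetical.

```python
def hospital_report(hospital_id, rows):
    """Build a LaTeX fragment containing one table for a single hospital.

    rows: list of (label, value) pairs to print as table rows.
    """
    lines = [
        r"\section*{Hospital %s}" % hospital_id,
        r"\begin{tabular}{lr}",
        r"\hline",
    ]
    for label, value in rows:
        lines.append(r"%s & %s \\" % (label, value))
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)

def all_reports(data_by_hospital):
    """Apply the same template to every hospital: {id: LaTeX fragment}."""
    return {h: hospital_report(h, rows) for h, rows in data_by_hospital.items()}
```

Each fragment can then be written to its own file and pulled into a master document, so that a change in the analysis only requires rerunning the loop.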
Additional information
es14_marinelli.pptx
Cutpoint determination in continuous predictive variables in survival analysis
Santiago Pérez-Hoyos
Instituto de Investigación Vall d’Hebrón, Unidad de Bioestadística y Bioinformática
In survival analysis involving data from clinical or epidemiological
studies, increasing interest is given to transforming a continuous
variable into a categorical one, usually binary. The main objective
of this transformation is to build a predictive score of a follow-up
event. We present a combination of Stata and ad hoc
functions based on profile likelihood comparisons. Results are
presented in HTML format, including a top-ten cutpoint, an optimal
cutpoint, a Kaplan–Meier estimation in graphical and list output, a
likelihood and hazard ratio profile, and a Cox regression
model. Results are compared with those obtained by R library maxstat
in real data examples. Changing some initial parameters, users can extend
the process to other regression models.
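The general idea of a likelihood profile over candidate cutpoints can be sketched in Python. To stay self-contained, the sketch swaps the Cox model for a much simpler constant-hazard (exponential) model fitted separately in each group, so it illustrates the scanning logic only and is not the authors' procedure. The function names, the minimum group size, and the data layout are assumptions.

```python
import math

def exp_loglik(times, events):
    """Maximized log-likelihood of a constant-hazard (exponential) model.

    With d events over total follow-up T, the hazard estimate is d / T
    and the maximized log-likelihood is d * log(d / T) - d.
    """
    d = sum(events)
    total_time = sum(times)
    if d == 0:
        return 0.0
    return d * math.log(d / total_time) - d

def profile_cutpoints(x, times, events, min_group=2):
    """Return [(cutpoint, loglik)] for splits x <= c versus x > c, best first."""
    profile = []
    for c in sorted(set(x))[:-1]:
        low = [i for i, v in enumerate(x) if v <= c]
        high = [i for i, v in enumerate(x) if v > c]
        if len(low) < min_group or len(high) < min_group:
            continue  # skip splits that leave a very small group
        ll = sum(
            exp_loglik([times[i] for i in g], [events[i] for i in g])
            for g in (low, high)
        )
        profile.append((c, ll))
    return sorted(profile, key=lambda item: item[1], reverse=True)
```

The first element of the returned list plays the role of the optimal cutpoint, and the first ten elements correspond to a top-ten list. Because the cutpoint is chosen by scanning the data, p-values from the final model are optimistic unless the selection is accounted for.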
Additional information
es14_perez.pdf
Scientific organizers
Llorenç Quinto, Barcelona Centre for International Health Research (CRESIB)
Sergi Sanz, Barcelona Centre for International Health Research (CRESIB)
Sergio Alonso, Barcelona Centre for International Health Research (CRESIB)
Elisa Sicuri, Barcelona Centre for International Health Research (CRESIB)
Logistics organizers
Timberlake Consulting S.L.,
the official distributor of Stata in Spain.