2015 Spanish Stata Users Group meeting
22 October 2015
Instituto de Empresa
Calle de María de Molina, 13
28006 Madrid
Spain
Proceedings
Revisiting generalized method of moments
Enrique Pinzon
StataCorp
The generalized method of moments (GMM) estimator, an economist's favorite, was
introduced in Stata 11. GMM is useful in many other disciplines, however, and
we have used it extensively in the treatment-effects commands released in
Stata 13 and Stata 14. I will briefly discuss some relevant properties of GMM
and then show how it is used in treatment-effects estimation. I will conclude
with a simple application of GMM that is new in the literature.
Additional information
spain15_pinzon.pdf
A low CD4/CD8 ratio during effective ART predicts immunosenescence and morbidity/mortality
Sergio Serrano-Villar
University Hospital Ramón y Cajal
Santiago Moreno
University Hospital Ramón y Cajal
Talia Sainz
University Hospital La Paz
April L. Ferre
University of California, Davis
Sulggi A. Lee
University of California, San Francisco
Peter W. Hunt
University of California, San Francisco
Elizabeth Sinclair
University of California, San Francisco
Vivek Jain
University of California, San Francisco
Frederick M. Hecht
University of California, San Francisco
Steven G. Deeks
University of California, San Francisco
A low CD4/CD8 ratio in elderly HIV-uninfected adults is associated with
increased mortality. A subset of HIV-infected adults receiving effective
antiretroviral therapy (ART) fails to normalize this ratio, even after
they achieve normal CD4+ T-cell counts. The immunologic and clinical
characteristics of this phenotype remain undefined. Using data from four
distinct clinical cohorts, we show that a low CD4/CD8 ratio in HIV-infected
adults during otherwise effective ART (CD4+ T-cell counts >500 cells/mm³)
is associated with a number of immunological abnormalities. Longitudinal
changes in CD4+ and CD8+ T-cell counts and in the CD4/CD8 ratio were assessed
using linear mixed models with random intercepts. Age, gender, and pre-ART CD4+
T-cell count were included in multivariate analyses as fixed effects. Interaction
terms were created to assess whether these changes over time differed significantly
between the early and later ART initiators. Changes in slopes before and after ART
time points were assessed using linear splines. Individuals who initiated ART within
6 months of infection had a greater increase in the CD4/CD8 ratio than later
initiators (more than 2 years after infection). Conditional logistic regression
analysis showed that a low CD4/CD8 ratio predicted a higher risk of morbidity and
mortality. Hence, this
clinically accessible measurement may prove useful in monitoring response to ART
and could identify a unique subset of individuals in need of novel therapeutic interventions.
Additional information
spain15_serrano.pdf
Assessing convergent and discriminant validity in the ADHD-R IV rating scale: User-written commands for average variance extracted (AVE), composite reliability (CR), and heterotrait-monotrait ratio of correlations (HTMT)
David Alarcón Rubio
Universidad Pablo de Olavide
José Antonio Sánchez Medina
Universidad Pablo de Olavide
Convergent and discriminant validity examine, respectively, the extent to which the
indicators of a latent variable share variance and the extent to which a latent
variable is distinct from the others in a variance-based SEM. The Fornell-Larcker
(1981) criterion has been commonly used to assess the degree of shared
variance between the latent variables of the model. According to this criterion,
convergent validity can be assessed by composite reliability (CR) and average
variance extracted (AVE). CR is a less biased estimate of reliability than Cronbach's
alpha; the acceptable value of CR is 0.7 and above. AVE measures the level of variance
captured by a construct versus the level due to measurement error; values above 0.7
are considered very good, whereas a level of 0.5 is acceptable. Discriminant validity
is assessed by comparing AVE and the squared correlation between two constructs: the
square root of each construct's AVE should be greater than its correlations with the
other constructs. Recently, the heterotrait-monotrait ratio of the correlations (HTMT)
approach has been proposed to assess discriminant validity. HTMT is the average of
the heterotrait-heteromethod correlations relative to the average of the
monotrait-heteromethod correlations. This work presents a series of user-written
commands to obtain these indicators of convergent and discriminant validity for
confirmatory factor-analysis models and to calculate their confidence
intervals using the bootstrap method. To demonstrate the use of these commands, we use
data from a sample of high school students who have been administered the ADHD-R IV rating scale.
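The indicators described above can be computed directly from standardized loadings and item correlations. The following Python sketch is not the authors' Stata commands. It makes three assumptions: the loadings are standardized, each item's error variance is 1 - λ², and HTMT divides the mean heterotrait correlation by the geometric mean of the two constructs' average monotrait correlations. The function names and the numbers are hypothetical, and the bootstrap confidence intervals are left out.

```python
import math
from itertools import combinations
from statistics import mean


def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # assumed error variance of a standardized item
    return total * total / (total * total + error)


def fornell_larcker_ok(ave_a, ave_b, r_ab):
    """True if the square root of each construct's AVE exceeds their correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(r_ab)


def htmt(corr, items_a, items_b):
    """Heterotrait-monotrait ratio for two constructs from an item correlation matrix."""
    hetero = [corr[i][j] for i in items_a for j in items_b]      # across constructs
    mono_a = [corr[i][j] for i, j in combinations(items_a, 2)]   # within construct A
    mono_b = [corr[i][j] for i, j in combinations(items_b, 2)]   # within construct B
    return mean(hetero) / math.sqrt(mean(mono_a) * mean(mono_b))


loadings = [0.8, 0.7, 0.9]
print(round(ave(loadings), 3), round(composite_reliability(loadings), 3))  # 0.647 0.845

# Items 0-1 measure construct A; items 2-3 measure construct B (made-up values).
corr = [
    [1.00, 0.60, 0.30, 0.20],
    [0.60, 1.00, 0.25, 0.25],
    [0.30, 0.25, 1.00, 0.50],
    [0.20, 0.25, 0.50, 1.00],
]
print(round(htmt(corr, [0, 1], [2, 3]), 3))  # 0.456
```

With these loadings, AVE is above the 0.5 threshold and CR is above the 0.7 threshold mentioned above.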
Additional information
spain15_alarcon.pdf
Differences in perinatal health among immigrant and native-origin children: Evidence from differentials in weight at birth in Spain
Hector Cebolla-Boado
Universidad Nacional de Educación a Distancia
Leire Salazar
Universidad Nacional de Educación a Distancia
This presentation explores perinatal inequality between migrants and natives
in Spain and, more specifically, differences in birth weight.
In line with the logic of the "healthy immigrant paradox", the children of immigrant
mothers are known for having a lower risk of low birth weight (LBW; <2,500 grams).
Using the universe of births in Spain in 2013 (excluding preterm and multiple births),
we go beyond the standard approach of using a dichotomous variable for estimating the
risk of LBW.
Using Stata, we estimate quantile regression to explore migrant-native differentials
in weight at birth across the range of observed values and also concentrate on the
impact of migrant status among babies weighing above 4,000 grams, a threshold that,
similarly to LBW, is associated with certain pathological characteristics and a
problematic future development.
Our research not only confirms that the well-known epidemiological regularity of
healthier babies among migrants in advanced democracies also applies to Spain, namely,
an advantage of immigrant-origin babies in terms of avoiding LBW, but also shows
that at the other extreme, when the baby's weight is above 4,000 grams,
migrant-origin babies weigh over 110 grams more than native-origin ones. In sum,
we contribute to the literature by showing that the higher average weight of
newborns of immigrant mothers is not always a source of perinatal advantage.
aries: An implementation of CART in Stata
Ricardo Mora
Universidad Carlos III de Madrid
Tree-structured models use binary trees as predictive models.
Tree models where the target variable can take a finite set of values are called
classification trees. Decision trees where the target variable can take continuous
values (typically real numbers) are called regression trees. Estimating the tree
is trivial for both classification and regression trees if the structure of the tree
is known. Otherwise, several algorithms have been proposed, notably the classification
and regression trees (CART) algorithm of Breiman et al. (1984), and several software
packages implement them (for example, Salford Systems CART, MATLAB, and R). In Stata,
the module cart, developed by Wim van Putten, performs a CART analysis but only for
failure-time data. In this presentation, I discuss a new module, aries, that performs the basic
CART algorithm for both binary and continuous dependent variables.
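A core step of CART is the greedy search for the best binary split of a node. The following Python sketch is a hypothetical illustration and is not taken from aries or cart. It scans a single predictor. For a continuous outcome, it scores a split by the within-node sum of squares. For a binary outcome, it uses Gini impurity weighted by node size. Full CART repeats this search over every predictor, grows the tree recursively, and then prunes it (Breiman et al. use cost-complexity pruning). The sketch omits all of those steps.

```python
def sse(values):
    """Sum of squared deviations from the node mean: impurity for a continuous outcome."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def gini(values):
    """Node size times the Gini impurity 2p(1 - p), for a 0/1 outcome."""
    n = len(values)
    if n == 0:
        return 0.0
    p = sum(values) / n
    return n * 2 * p * (1 - p)


def best_split(x, y, impurity=sse):
    """Find the threshold t minimizing impurity(y | x <= t) + impurity(y | x > t).

    Returns (threshold, cost); threshold is None if no split improves on the root.
    """
    pairs = sorted(zip(x, y))
    best_t, best_cost = None, impurity(list(y))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between tied x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        cost = impurity(left) + impurity(right)
        if cost < best_cost:
            best_t, best_cost = (pairs[k - 1][0] + pairs[k][0]) / 2, cost
    return best_t, best_cost


x = [1, 2, 3, 4, 5, 6]
print(best_split(x, [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]))     # regression tree split
print(best_split(x, [0, 0, 0, 1, 1, 1], impurity=gini))  # (3.5, 0.0)
```

Weighting Gini impurity by node size makes it additive across the two children, in the same way the sum of squares is. One comparison rule therefore serves both tree types.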
Additional information
spain15_mora.pdf
Stata web services: Toward Stata-based healthcare informatics applications integrated in a service-oriented architecture (SOA)
Alexander Zlotnik
Technical University of Madrid
University Hospital Ramón y Cajal
Modesto Escobar
Universidad de Salamanca
Ascensión Gallardo-Antolín
Universidad Carlos III de Madrid
Juan Manuel Montero Martínez
Technical University of Madrid
Stata has many functions that can be used in decision support systems, forecasting
systems, and, generally, applications that use analytical or modeling
functionalities. A web interface with an HTML/JS graphical user interface or an
XML-based web service are convenient approaches for exposing Stata-based programs
on public and private computer networks. However, using Stata through a web interface
or integrating it into a corporate software environment such as a service-oriented
architecture can be challenging. Usually, Stata-based programs need to be translated
(reimplemented) in a different programming language to be used through the
aforementioned interfaces. These reimplementations can be problematic, time consuming,
and error prone.
We describe an approach for using Stata-based applications directly through a web
interface, the requirements for such applications, and the limitations of this approach.
We then discuss modern software engineering solutions for software integration scenarios
in healthcare informatics and their potential use for Stata-based decision support systems
in this field.
Additional information
spain15_zlotnik.pdf
Introduction to Markov-switching regression models using the mswitch command
Gustavo Sánchez
StataCorp
A considerable number of time series can be characterized by data-generating
processes (DGPs) that may be affected by particular events that lead to changes
in the parameters. The new conditions for the DGP may remain in place for a
period of time until the change is reversed to the previous state or until a
new event leads to a new state, with the corresponding change in the parameters.
In Stata 14, we introduced the mswitch command to model these kinds of time series
by characterizing the transitions between unobserved states with a Markov chain.
I will briefly introduce the basic concepts of Markov-switching models, and I
will use a couple of examples to illustrate the implementation provided by
mswitch.
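The following Python sketch makes the idea of unobserved states concrete. It runs the Hamilton filter for a two-state model in which only the mean switches: y_t = μ(s_t) + ε_t, with ε_t ~ N(0, σ²). For given parameter values, it returns the filtered probability of state 1 at each date and the log likelihood. An estimator such as mswitch fits these models by maximum likelihood. This sketch does not do that optimization: the parameters are fixed, and the data and values are made up. It is not mswitch's code.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def hamilton_filter(y, mu, sigma, trans):
    """Filtered Pr(s_t = 1 | y_1..y_t) for each t, and the log likelihood.

    mu: state-specific means [mu_0, mu_1]; sigma: common error standard deviation.
    trans[i][j] = Pr(s_t = j | s_{t-1} = i); assumes trans[0][0] + trans[1][1] < 2.
    """
    p00, p11 = trans[0][0], trans[1][1]
    denom = 2 - p00 - p11
    prob = [(1 - p11) / denom, (1 - p00) / denom]  # start from the stationary distribution
    loglik, filtered = 0.0, []
    for yt in y:
        # Prediction: push last period's probabilities through the Markov chain.
        pred = [prob[0] * trans[0][j] + prob[1] * trans[1][j] for j in range(2)]
        # Update: weight each state by the density of y_t under that state.
        joint = [pred[j] * normal_pdf(yt, mu[j], sigma) for j in range(2)]
        total = sum(joint)
        loglik += math.log(total)
        prob = [joint[j] / total for j in range(2)]
        filtered.append(prob[1])
    return filtered, loglik


y = [0.1, -0.2, 0.0, 5.1, 4.8, 5.3]
trans = [[0.9, 0.1], [0.1, 0.9]]
probs, loglik = hamilton_filter(y, mu=[0.0, 5.0], sigma=1.0, trans=trans)
print([round(p, 3) for p in probs])  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```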
Additional information
spain15_sanchez.pdf
Modeling multilevel data: The estimated dependent variable approach
Antonio M. Jaime-Castillo
Universidad de Málaga
Multilevel data have become very popular in the social sciences. Several
international research projects (such as the European Social Survey, the
International Social Survey Programme, and the World Value Survey) have produced
a large amount of comparative data in recent decades. The dominant approach to
analyzing multilevel data structures uses multilevel models (a mixture of fixed
and random effects), and major statistical packages have incorporated routines
for estimating these kinds of models. This analytical strategy has several
advantages over most naïve pooling strategies. However, it also has some drawbacks
on both theoretical and practical grounds. The statistical theory behind multilevel
models is still under development, and the computational burden to estimate nonlinear
models, as well as convergence issues, can be challenging in some cases. An
alternative is the estimated dependent variable (EDV) approach, in which the researcher
first estimates a separate individual-level model within each level-2 unit. In the second
step, the coefficients estimated in the first step become the dependent
variables to be explained by a set of aggregate predictors. In this presentation, I focus
on the potential applications of this approach using Stata.
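The two steps of the EDV approach can be sketched in Python with simple regressions. The sketch assumes one individual-level predictor per level-2 unit (for example, a country) and one aggregate predictor. Its second step uses unweighted least squares. Applied work often weights that step to reflect the sampling variance of the first-step estimates, which this sketch does not do. The function names and data are hypothetical. In Stata, the same two steps are commonly run with statsby followed by regress.

```python
def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def edv(units, z):
    """Two-step estimated dependent variable (EDV) estimate.

    units: dict mapping each level-2 unit to (x, y) lists of individual-level data.
    z: dict mapping each level-2 unit to its aggregate predictor.
    Returns the intercept and slope of the second-step regression of the
    first-step slopes on z (unweighted, by assumption).
    """
    names = sorted(units)
    # Step 1: a separate individual-level regression within each unit.
    slopes = [ols(*units[name])[1] for name in names]
    # Step 2: the estimated slopes become the dependent variable.
    return ols([z[name] for name in names], slopes)


units = {
    "A": ([0, 1, 2, 3], [2.0, 3.1, 3.9, 5.0]),
    "B": ([0, 1, 2, 3], [1.0, 4.2, 6.9, 10.1]),
    "C": ([0, 1, 2, 3], [0.5, 5.4, 10.6, 15.4]),
}
z = {"A": 0.0, "B": 1.0, "C": 2.0}
print(edv(units, z))  # second-step intercept and slope
```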
Additional information
spain15_jaime.pdf
A simple procedure to correct for measurement errors in survey research
Anna DeCastellarnau
Universitat Pompeu Fabra
Although there is much literature on the existence of measurement errors, few
researchers correct for them in their analyses. In this presentation, I will
show that correction for measurement errors in survey research is not only necessary
but also possible and actually rather simple. Using the quality estimates obtained
from the free online software Survey Quality Predictor (SQP), one can easily correct
correlation and covariance matrices and use them as input for the analysis. This procedure
was described for Stata, LISREL, and R in the ESS EduNet module "A simple procedure to
correct for measurement errors in survey research". This presentation will focus on the
correction of measurement errors in regression analysis and causal models using Stata.
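The core of such a correction can be illustrated with the classic disattenuation formula. The following Python sketch rests on two assumptions. First, each variable has a quality estimate q², read as the share of observed variance explained by the concept of interest. Second, the variables share no common method variance. Under those assumptions, the corrected correlation between variables i and j is r_ij / sqrt(q²_i q²_j). The procedure in the ESS EduNet module may include further adjustments, such as for shared method effects, that this sketch ignores. The function name and numbers are hypothetical.

```python
import math


def correct_correlations(corr, quality):
    """Disattenuate an observed correlation matrix.

    corr: observed correlation matrix (list of lists).
    quality: quality estimate q^2 for each variable, in (0, 1].
    Assumes no correlated method effects between variables.
    """
    k = len(corr)
    corrected = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                corrected[i][j] = corr[i][j] / math.sqrt(quality[i] * quality[j])
    return corrected


observed = [
    [1.00, 0.30, 0.20],
    [0.30, 1.00, 0.40],
    [0.20, 0.40, 1.00],
]
quality = [0.64, 0.81, 0.70]
for row in correct_correlations(observed, quality):
    print([round(v, 3) for v in row])
```

The corrected matrix can then serve as input to a regression or structural equation model. In Stata, for instance, sem accepts summary statistics data set up with ssd. With low quality estimates, the corrected matrix is not guaranteed to be positive definite, so check it before fitting a model.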
Additional information
spain15_decastellarnau.pdf
Content analysis with Stata
Modesto Escobar
Universidad de Salamanca
José L. Alonso Berrocal
Universidad de Salamanca
Content analysis is a technique used in the social sciences for the systematic study
of the content of communication. In this presentation, we discuss two useful
programs for statistical analysis of texts. The first (precoin) splits the
text into words or groups of words to form an incidence matrix. The second (coin)
works with this matrix and produces frequencies, co-occurrences, multivariate statistical
measures of centrality and distance, and various types of graphs. As examples of their
use, we present an analysis of a sample of tweets and an analysis of open-ended
answers from a questionnaire.
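The two steps described above can be sketched in Python: build a text-by-word incidence matrix, then count frequencies and co-occurrences. This is not the implementation of precoin or coin. The tokenizer is a simple assumption: it takes lowercased runs of word characters and does no stop-word removal or stemming. The example texts are made up.

```python
import re
from itertools import combinations


def incidence_matrix(texts):
    """Return the sorted vocabulary and a 0/1 matrix with one row per text."""
    docs = [set(re.findall(r"\w+", text.lower())) for text in texts]
    vocab = sorted(set().union(*docs))
    matrix = [[1 if word in doc else 0 for word in vocab] for doc in docs]
    return vocab, matrix


def frequencies(vocab, matrix):
    """Number of texts containing each word (column sums of the incidence matrix)."""
    return {word: sum(row[j] for row in matrix) for j, word in enumerate(vocab)}


def cooccurrences(vocab, matrix):
    """Number of texts in which each pair of words appears together."""
    counts = {}
    for i, j in combinations(range(len(vocab)), 2):
        n = sum(row[i] * row[j] for row in matrix)
        if n:
            counts[(vocab[i], vocab[j])] = n
    return counts


texts = ["Stata users meet in Madrid", "Stata users tweet", "users answer questions"]
vocab, matrix = incidence_matrix(texts)
print(frequencies(vocab, matrix)["users"])               # 3
print(cooccurrences(vocab, matrix)[("stata", "users")])  # 2
```

The co-occurrence counts are the off-diagonal entries of the product of the transposed incidence matrix with itself. Measures of centrality and distance, as well as network graphs, are typically built from that matrix.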
Additional information
spain15_escobar.pdf
Wishes and grumbles
StataCorp
StataCorp staff will be happy to receive wishes for developments in Stata and almost
as happy to receive grumbles about the software.
Scientific organizers
Modesto Escobar, Universidad de Salamanca
Alexander Zlotnik, Polytechnic University of Madrid and Hospital Universitario Ramón y Cajal
Logistics organizers
Timberlake Consulting S.L.,
the official distributor of Stata in Spain.