The 2016 Swiss Stata Users Group meeting was held on November 17, but you can still interact with the user community after the meeting and learn more from the presentations shared below.
Proceedings
8:45–9:10
Abstract:
matchit is a user-written command allowing one to
combine two datasets based on similar but not
necessarily equal text strings and to compare the
text similarity between two string variables from the
same dataset. These features make matchit a handy and
powerful tool in the preparation of data for statistical
and econometric analysis as well as in the creation of
metrics based on text similarity.
A nonexhaustive list of typical uses for matchit includes consolidating duplicate records within a nonstandardized dataset (for example, cleaning a list of patient names with multiple spellings), combining two datasets with nonstandardized keys (for example, merging hospital and insurance data based on treatment names), and creating quantitative measures of string similarity (for example, comparing the scientific proximity of two medical schools based on their scientific publications and patents). matchit supports a wide range of string-similarity algorithms, such as ngram, token, soundex, nysiis, and hybrids of these, which, combined with different weighting and scoring functions, let users refine the resulting dataset. Moreover, it allows coding custom algorithms and functions that benefit from indexation and other built-in functionality.
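A minimal sketch of a fuzzy merge with matchit (the dataset and variable names here are hypothetical; matchit is available from SSC):

```stata
* install once from SSC
ssc install matchit

* fuzzy-join a master file of patients to an insurance file,
* comparing the two name fields with the default bigram algorithm
use patients_master, clear
matchit patient_id patient_name using insurance.dta, ///
    idusing(claim_id) txtusing(insured_name) threshold(0.7)

* matchit returns candidate pairs with a similarity score
* (similscore); keep the best candidate per master record
gsort patient_id -similscore
by patient_id: keep if _n == 1
```

After the match, the retained pairs can be merged back to either source file on their id variables.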
Additional information
raffo-switzerland16.pdf Julio Raffo
WIPO Economics and Statistics Division
9:10–9:35
Abstract:
Researchers working with observational data are often
faced with the problem of finding the nearest measurement
around a particular date. For instance, a medical
researcher may be interested in finding the CD4+
T-cell count measurement closest to, but prior to, the
initiation of antiretroviral therapy against HIV, a
value known to be predictive of treatment success.
While this is not an overly difficult programming task,
it takes several lines of potentially error‐prone code
for implementation.
The fmatch command offers a versatile tool that achieves such tasks with a single line of code. fmatch is a "wrapper" program for the well-known mmerge command by J. Weesie. It offers multiple options for controlling the merge, including specifying date ranges to define eligible measurements and finding the smallest or largest value among all eligible measurements. The functionality of fmatch will be illustrated with examples from HIV research.
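The "several lines of potentially error-prone code" the abstract refers to might look like the following manual approach in plain Stata (file and variable names are made up):

```stata
* goal: for each patient, the CD4 count closest to, and no later
* than, the start of antiretroviral therapy
use cd4_measurements, clear               // patient_id, cd4_date, cd4
merge m:1 patient_id using art_start      // adds start_date per patient
keep if _merge == 3
drop _merge
keep if cd4_date <= start_date            // only pre-treatment values
generate long gap = start_date - cd4_date
bysort patient_id (gap): keep if _n == 1  // smallest gap = nearest
```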
Additional information
vonwyl-switzerland16.pdf Viktor von Wyl
Epidemiology, Biostatistics, and Prevention Institute, University of Zurich
9:50–10:35
Abstract:
Missing values are common in many fields. If analyses do
not properly account for missing values, the resulting
estimates may be biased. Stata gives the user access to
multiple principled methods of handling missing values
in a dataset. This talk will focus on two methods,
multiple imputation (MI) and full information maximum
likelihood (FIML). After an introduction to important
concepts in the analysis of missing data, this
presentation will provide an overview of how to perform
analyses using MI and FIML in Stata. A comparison of the
techniques and their advantages and disadvantages will
be included.
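As a hedged sketch of the two methods with Stata's built-in commands (the variable names are invented):

```stata
* multiple imputation: impute x1 by chained equations, then fit the
* analysis model on 20 imputed datasets and combine the results
mi set mlong
mi register imputed x1
mi impute chained (regress) x1 = x2 y, add(20) rseed(12345)
mi estimate: regress y x1 x2

* full information maximum likelihood: sem's mlmv method uses all
* available information, including incomplete observations
sem (y <- x1 x2), method(mlmv)
```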
Additional information
medeiros-switzerland16.pdf Rose Medeiros
StataCorp LP
10:35–11:20
Abstract:
MarkDoc is a general-purpose literate programming
package for Stata that can serve a variety of purposes
such as creating dynamic documents, dynamic presentation
slides, Stata package help files, and Stata package
documentation in various formats. The presentation
introduces the package and its overall workflow as well
as the recent improvements in the package. Moreover, the
applications of the package for data analysis, teaching
statistics, and documenting new Stata packages are
discussed.
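A schematic MarkDoc session (the workflow as documented for the package, which also requires Pandoc; the file name is arbitrary):

```stata
* install once from SSC
ssc install markdoc

* capture the analysis in a SMCL log
quietly log using example, replace smcl
/***
# Example analysis
Markdown text inside these comment blocks becomes
the narrative of the generated document.
***/
sysuse auto, clear
summarize price mpg
quietly log close

* convert the log into a dynamic document
markdoc example, export(html) replace
```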
Additional information
haghish-switzerland16.pdf E.F. Haghish
Université de Fribourg
11:20–11:45
Abstract:
At the 2009 meeting in Bonn, I presented a new Stata
command called texdoc. The command allowed weaving
Stata code into a LaTeX document, but its functionality
and its usefulness for larger projects was limited. In
the meantime, I heavily revised the texdoc command to
simplify the workflow and improve support for complex
documents. The command is now well suited, for example,
to generate automatic documentation of data analyses or
even to write an entire book. In this talk, I will
present the new features of texdoc and provide
examples of their application. Furthermore, I will
present a newly released companion command called
webdoc that can be used to produce HTML or Markdown
documents.
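A schematic texdoc do-file might look as follows (the subcommand names should be checked against the package's help file; this is a sketch of the workflow, not a verified listing):

```stata
* processed with:  texdoc do report.do
texdoc init report.tex, replace

/*tex
\documentclass{article}
\begin{document}
Results generated automatically from Stata:
tex*/

texdoc stata
sysuse auto, clear
summarize price mpg
texdoc stata close

/*tex
\end{document}
tex*/

texdoc close
```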
Additional information
jann-switzerland16.pdf jann_example1-switzerland16.pdf jann_example2-switzerland16.pdf jann_example3-switzerland16.pdf Ben Jann
Institute of Sociology, University of Bern
11:45–12:10
Abstract:
Since the early nineties, logistic regression for
binary, ordinal, and nominal dependent variables has
become widespread in the social sciences.
Nevertheless, there is no consensus on how to assess the fit
of these models in terms of practical significance.
Many pseudo-coefficients of
determination have been proposed but are seldom used in
applied research. Most of these pseudo-R2 measures follow the
principle of proportional reduction of error,
comparing the likelihood, the log likelihood, or the precision
of prediction with that of a baseline model including
only the constant.
Alternatively, McKelvey and Zavoina (1975) proposed a different measure that estimates the proportion of explained variance of the underlying latent dependent variable. Monte Carlo studies by Hagle and Mitchell (1992), Veall and Zimmermann (1992, 1994), and Windmeijer (1995) show that the McKelvey and Zavoina pseudo-R2 is the best measure for evaluating the fit of binary and ordinal logit or probit models. Under the assumption of independently and identically distributed errors, I propose a generalization of the McKelvey and Zavoina pseudo-R2 to multinomial logistic regression that assesses the fit of each binary comparison simultaneously. The usefulness of this concept is demonstrated by an applied analysis of election-study data in Stata using the self-developed mzr2 command.
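For reference, for a latent-variable model $y_i^* = x_i'\beta + \varepsilon_i$, the McKelvey and Zavoina pseudo-$R^2$ is the share of latent variance explained by the linear predictor:

```latex
R^2_{\mathrm{M\&Z}}
  = \frac{\widehat{\operatorname{Var}}(x_i'\hat\beta)}
         {\widehat{\operatorname{Var}}(x_i'\hat\beta) + \operatorname{Var}(\varepsilon_i)},
\qquad
\operatorname{Var}(\varepsilon_i) =
\begin{cases}
  \pi^2/3 & \text{(logit)} \\
  1       & \text{(probit)}
\end{cases}
```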
Additional information
langer-switzerland16.pdf Wolfgang Langer
Institute of Sociology, University of Halle-Wittenberg
1:10–1:35
Abstract:
Counterfactual distributions are important ingredients
for policy and decomposition analysis. For example, we
might be interested in what the outcome distribution for
the treated units would be had they not received the
treatment or in what the distribution of wages for
female workers would be in the absence of gender
discrimination in the labor market (that is, if female
workers were paid the same as male workers with the same
characteristics) or in what the distribution of housing
prices would be if we cleaned up a local hazardous waste
site. More generally, we can think of a policy
intervention either as a change in the distribution of a
set of explanatory variables X that determine the
outcome variable of interest Y or as a change in the
conditional distribution of Y given X. The Stata
commands counterfactual and cdeco implement estimation
and inference procedures for these two types of
applications. The estimation of the conditional
distribution can be based on the main regression methods,
including classical, quantile, duration, and
distribution regressions. The commands provide not only
pointwise but also functional confidence bands, which
cover the entire functions with prespecified
probability and can be used to test functional
hypotheses such as no effect, positive effect, or
stochastic dominance.
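In the notation common to this literature, the counterfactual distribution combines the conditional outcome distribution of group $j$ with the covariate distribution of group $k$:

```latex
F_{Y\langle j \mid k\rangle}(y)
  = \int F_{Y_j \mid X_j}(y \mid x)\, dF_{X_k}(x)
```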
Additional information
melly-switzerland16.pdf Blaise Melly
Department of Economics, University of Bern
1:35–2:00
Abstract:
Incorporating covariates in (income or wage)
distribution analysis typically involves estimating
conditional distribution models, that is, models for the
cumulative distribution of the outcome of interest
conditionally on the value of a set of covariates. A
simple strategy is to estimate a series of binary
outcome regression models for
F(z|xi) = Pr(yi ≤ z|xi) for a grid of
values for z (Peracchi and Foresi, 1995, Journal of the
American Statistical Association; Chernozhukov et al.,
2013, Econometrica). This approach, now often referred to
as "distribution regression", is attractive and easy to
implement. This talk illustrates how the Stata
commands margins and suest can be useful for inference
here and suggests various tips and tricks to speed up
the process and solve potential computational issues. It
also shows how to use conditional distribution model
estimates to analyze various aspects of unconditional
distributions.
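A bare-bones distribution regression might look like this (using a shipped example dataset; the grid of cutoffs is arbitrary):

```stata
* estimate F(z|x) = Pr(y <= z | x) by a logit of the indicator
* 1(y <= z) at each point z of a grid of wage cutoffs
sysuse nlsw88, clear
foreach z of numlist 3 5 7 10 15 {
    generate byte below`z' = (wage <= `z')
    quietly logit below`z' ttl_exp grade
    margins        // average of F(z|x): the unconditional CDF at z
}
```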
Additional information
vankerm-switzerland16.pdf Philippe van Kerm
Luxembourg Institute of Socio-Economic Research
2:00–2:25
Abstract:
Bland and Altman’s limits of agreement (LoA) have
traditionally been used in clinical research to assess
the agreement between different methods of measurement
for quantitative variables. However, when the variances
of the measurement errors of the two methods are
different, Bland and Altman’s plot may be misleading;
there are settings where the regression line shows an
upward or a downward trend even though there is no bias,
or a zero slope even though there is a bias.
Therefore, the goal of this presentation is to clearly illustrate why and when a bias arises, particularly when heteroskedastic measurement errors are expected, and to propose new plots to help the investigator visually and clinically appraise the performance of the new method. These plots do not have the above-mentioned defect and are still easy to interpret, in the spirit of Bland and Altman’s LoA. To achieve this goal, we rely on the modeling framework recently developed by Nawarathna and Choudhary, which allows the measurement errors to be heteroskedastic and depend on the underlying latent trait. Their estimation procedure, however, is complex and rather daunting to implement. Therefore, we have developed a new estimation procedure that is much simpler to implement and yet performs very well, as illustrated by our simulations. The methodology requires several measurements with the reference standard and possibly only one with the new method for each individual.
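Schematically, the framework treats the two methods as measurements of a common latent trait $x_i$, with the reference standard ($k=1$) unbiased and the errors possibly heteroskedastic (a simplified rendering of the Nawarathna and Choudhary setup; the exact parameterization is in their paper):

```latex
y_{kij} = \alpha_k + \beta_k x_i + \varepsilon_{kij},
\qquad k = 1, 2, \quad \alpha_1 = 0,\ \beta_1 = 1,
\qquad \operatorname{Var}(\varepsilon_{kij}) = \sigma_k^2(x_i)
```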
Additional information
taffe-switzerland16.pdf Patrick Taffé
Institute of Social and Preventive Medicine, University of Lausanne
2:40–3:05
Abstract:
While Stata’s computational capabilities have
increased considerably over the last decade, the quality
of its default figure schemes is still a matter of
debate among users. Clearly, some of the arguments
against Stata’s default figures are subject to individual
taste, but others are not, such as
horizontal labeling, unnecessary background tinting,
missing gridlines, and oversized markers. The two schemes
introduced here attempt to solve the major shortcomings
of Stata’s default figure schemes. Furthermore, the
schemes come with 21 new colors, of which 7 colors
are distinguishable for people suffering from color
blindness.
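Assuming these are the plotplain and plottig schemes from the author's blindschemes package on SSC (an inference from the abstract, not stated in it), using them takes one line per graph:

```stata
* install the schemes once, then apply them per graph or globally
ssc install blindschemes
sysuse auto, clear
scatter price mpg, scheme(plotplain)   // plain, print-friendly scheme
scatter price mpg, scheme(plottig)     // colored scheme
set scheme plotplain, permanently      // make it the default
```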
Additional information
bischof-switzerland16.pdf Daniel Bischof
Department of Political Science, University of Zurich
3:05–3:30
Abstract:
grcomb is a user-written wrapper for Stata's graph
combine. It makes quick-and-dirty multipanel plotting
easy.
Additional information
gamma-switzerland16.pdf Alex Gamma
Psychiatric University Hospital, Zurich
Organizers
Scientific committee
Ben Jann
Institute of Sociology, University of Bern
Radoslaw Panczak
Institute of Social and Preventive Medicine, University of Bern
Marcel Zwahlen
Institute of Social and Preventive Medicine, University of Bern
Logistics organizer
The logistics organizer for the 2016 Swiss Stata Users Group meeting is Ritme, scientific solutions, the distributor of Stata in Switzerland, France, and Belgium.
View the proceedings of previous Stata Users Group meetings.