The German Stata Users Group Meeting was held on 22 June 2018 at Universität Konstanz, but you can view the program and presentation slides below.
Proceedings
9:30–10:30 |
Abstract:
Being a Stata User since Stata 3, I have witnessed a number of developments
over the years. Some of them, such as Stage or the gph commands, turned out
to be dead ends, while others, such as syntax, have remained hidden from
many users but shaped Stata strongly. Users still use some dead ends
("for"). Some developments caused a
buzz in public but never gained much attention in (my own)
practice. Some developments were introduced in passing but took off immediately
as a workhorse in my daily work (web awareness). I give a subjective
review of Stata's development by listing the dead ends and the milestones.
I speculate about reasons why dead ends became dead ends, and
why milestones became milestones. My intention is to start
a discussion about what German users like and dislike about Stata.
Additional information: germany18_Kohler.pdf
Ulrich Kohler
Universität Potsdam
|
10:30–11:00 |
Abstract:
The overall look of Stata's graphs is determined by so-called scheme files.
Scheme files are system components, that is, part of the local
Stata installation. In this presentation, I will argue that style settings deviating
from default schemes should be part of the script producing the graphs rather
than being kept in separate scheme files, and I will present software that
supports such a practice. In particular, I will present a command, grstyle,
that allows users to quickly change the overall look of graphs without having
to fiddle around with external scheme files. I will also present a command,
colorpalette, that provides a wide variety of color schemes for use in
Stata graphics.
Additional information: germany18_Jann.pdf
Ben Jann
University of Bern
|
11:00–11:30 |
Abstract:
Structural equation modeling is well established in the statistician's
standard toolkit. To establish how well latent constructs are measured
by their respective observed indicators, many applications entail confirmatory
factor analysis (CFA). The appropriateness of a particular CFA model in
turn is assessed by various statistics such as chi-squared or so-called fit
indices. What these indices have in common is their reliance on a comparison
of the estimated model with a baseline or null model that imposes various
restrictions. While the default baseline model (for example, the "independence model")
is appropriate for common single-group and single-time-point situations,
several authors argue that researchers should specify alternative baseline
models in multiple-group or longitudinal applications (for example, Little, 2013;
Widaman & Thompson, 2003). Focusing on longitudinal data, this presentation
accordingly illustrates how to specify appropriate baseline models and compute
corresponding goodness-of-fit statistics in Stata.
References: Little, T. D. 2013. Longitudinal Structural Equation Modeling. New York, NY: Guilford Press. Widaman, K. F., and J. S. Thompson. 2003. On specifying the null model for incremental fit indices in structural equation modeling. Psychological Methods 8(1): 16–37.
Additional information: germany18_Spieß.pdf germany18_Spieß_example.do
Sven O. Spieß
Dittrich & Partner Consulting
|
11:45–12:15 |
Abstract:
swapgpsxy interchanges GPS coordinates, provided that both xvar and yvar,
the variables representing longitude and latitude, respectively, are
numeric. swapgpsxy is useful whenever summary statistics of
the GPS coordinates suggest coordinates are interchanged. swapgpsxy can
be applied unconditionally, when the geographical area is relatively
uniform and small, for example, the State of Qatar. On the other hand,
swapgpsxy can be applied conditionally using either if or in, but the two
cannot be combined in a single expression. This is useful when the geographical
area is large and the terrain differs per province or zone, for example,
the Republic of South Africa. Given the presence of interchanged GPS
coordinates in our data, we apply swapgpsxy to correct the error. Using
the median absolute deviation (MAD) method, we find that outliers in GPS
coordinates are detected and interchanged correctly. Based on the results,
we suggest swapgpsxy as a useful tool for improving data quality, particularly
when data management is prone to human error.
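
The idea of flagging suspicious coordinates with the median absolute deviation and then interchanging them can be illustrated outside Stata. The following Python sketch is not the swapgpsxy implementation; the abstract does not say how the MAD is applied, so the modified z-score rule, the 3.5 cutoff, and the function names `mad_outliers` and `swap_flagged` are all assumptions made for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds threshold.

    The 0.6745 constant and the 3.5 cutoff are a common convention,
    assumed here; they are not taken from the abstract.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def swap_flagged(lon, lat, threshold=3.5):
    """Interchange longitude and latitude where BOTH are flagged as outliers."""
    flag_lon = mad_outliers(lon, threshold)
    flag_lat = mad_outliers(lat, threshold)
    new_lon, new_lat = [], []
    for x, y, fx, fy in zip(lon, lat, flag_lon, flag_lat):
        if fx and fy:
            x, y = y, x  # the pair looks interchanged, so swap it back
        new_lon.append(x)
        new_lat.append(y)
    return new_lon, new_lat


# Hypothetical points in a small, uniform area; the fifth pair is interchanged.
lon = [51.2, 51.3, 51.4, 51.5, 25.3, 51.6]
lat = [25.2, 25.3, 25.4, 25.1, 51.4, 25.5]
fixed_lon, fixed_lat = swap_flagged(lon, lat)
```

Requiring both coordinates of a pair to be flagged keeps ordinary outliers in a single coordinate from being swapped. For a large, heterogeneous area, the same functions would be applied within each province or zone, which mirrors the conditional use described above.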
Brian W. Mandikiana
Qatar University
|
12:15–12:45 |
Abstract:
Text data, such as answers to open-ended questions, are sometimes ignored
because they are hard to analyze. Our community-contributed Stata command,
ngram, turns text into hundreds of variables using the "bag of words"
approach. Broadly speaking, each variable records how often the
corresponding word or word sequence occurs in a given text. This is more
useful than it sounds. The program supports text in 12 European languages.
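
To make the "bag of words" idea concrete, here is a minimal Python sketch of how texts can be turned into count variables. It is not the ngram command and does not reproduce its options; the function names, the `degree` and `threshold` parameters, and the simple tokenizer are assumptions chosen for illustration.

```python
import re
from collections import Counter


def ngram_counts(text, degree=2):
    """Count all word sequences of length 1..degree in one text."""
    tokens = re.findall(r"\w+", text.lower())  # crude tokenizer, keeps umlauts
    counts = Counter()
    for n in range(1, degree + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def document_term_matrix(texts, degree=2, threshold=1):
    """Return (vocabulary, rows): one row of counts per text.

    Only n-grams that occur in at least `threshold` texts become columns.
    """
    per_text = [ngram_counts(t, degree) for t in texts]
    # Iterating over a Counter yields each n-gram once per text,
    # so this counts the number of texts containing it.
    doc_freq = Counter(term for c in per_text for term in c)
    vocab = sorted(term for term, df in doc_freq.items() if df >= threshold)
    rows = [[c.get(term, 0) for term in vocab] for c in per_text]
    return vocab, rows


texts = ["Stata is great", "Stata is fast, Stata is fun"]
vocab, rows = document_term_matrix(texts, degree=2, threshold=2)
```

Each column of `rows` corresponds to one entry of `vocab`, so each text becomes a row of counts that can be used as predictors in a statistical model.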
Additional information: germany18_Schonlau.pdf
Matthias Schonlau
University of Waterloo
|
1:45–2:15 |
Abstract:
At the 2017 meeting, I talked about efficient programming with regard to
optimal lag selection for autoregressive distributed lag (ARDL) models as
implemented in the community-contributed Stata command ardl (Kripfganz and Schneider
2016). I will expand on last year's presentation by focusing on a
second nontrivial computational aspect of ardl: the simulation of critical
values for the Pesaran, Shin, and Smith (2001)
bounds-testing procedure for a long-run relationship. Up until recently, only
a limited set of critical values was available. I will illustrate the
programming behind Kripfganz and Schneider's (2018) comprehensive and more
precise set of critical values and approximate p-values, which have been made
available in Stata as a postestimation feature of ardl. I explain the
calculation, storage, and processing of 160 billion simulated F- or t-statistics.
Topics covered will include pointer variables, LAPACK functions in Mata, using
variable transformations in conjunction with Stata's various numeric data types
for efficient storage, random number streams, and strategies for using several
instances of Stata simultaneously.
References: Kripfganz, S., and D. C. Schneider. 2016. ardl: Stata module to estimate autoregressive distributed lag models. Paper presented at the Stata Conference, Chicago, IL, July 2016. Kripfganz, S., and D. C. Schneider. 2017. A case study in efficient programming in Stata and Mata: Speeding up the ardl estimation command. Paper presented at the German Stata Users Group Meeting, Berlin, June 2017. Kripfganz, S., and D. C. Schneider. 2018. Response surface regressions for critical value bounds and approximate p-values in equilibrium correction models. Manuscript, University of Exeter and Max Planck Institute for Demographic Research. Available at http://www.kripfganz.de/research/Kripfganz_Schneider_ec.html. Pesaran, M. H., Y. Shin, and R. J. Smith. 2001. Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics 16: 289–326.
Additional information: germany18_Schneider.pdf
Daniel C. Schneider
Max Planck Institute for Demographic Research
|
2:15–2:45 |
Abstract:
Traditional fit measures based on the noncentral chi-square distribution (RMSEA,
TLI, or CFI) tend to overreject acceptable models when the sample size is small
(n < 100). My ado-file, swain_gof.ado, corrects the likelihood ratio chi-square
goodness-of-fit test statistic for structural equation models. This chi-square statistic
is asymptotically correct, but it does not behave as expected in small samples
or when the model is complex (Herzog, Boomsma, and Reinecke 2007). Particularly
in situations where the ratio of sample size to the number of parameters estimated
is relatively small, such as 5:1 (Bentler and Chou 1987), the chi-square test will
tend to overreject correctly specified models. To obtain a closer approximation
to the distribution of the chi-square statistic, Swain (1975) developed a correction.
His scaling factor, which converges to 1 as the sample size increases,
is multiplied by the chi-square statistic. This correction better approximates the
noncentral chi-square distribution, resulting in more appropriate type I error
rates (see Herzog and Boomsma 2009; Herzog, Boomsma, and Reinecke 2007). It works reliably
only down to a ratio of sample size to number of parameters of 2:1.
My swain_gof.ado calculates the root mean squared error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI) using the Swain-corrected chi-square values, assuming a multivariate normal distribution of the observed indicators. If this assumption is violated, it additionally calculates the fit indices using the Satorra-Bentler correction; for this, you have to use the vce(sbentler) option of the sem command. swain_gof.ado can be executed as a postestimation command after sem and estat gof, stats(all) by simply typing swain_gof. It returns the estimated fit indices as scalars in r().
A survey example of Islamophobia will be presented to demonstrate the usefulness
of my swain_gof.ado.
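
The arithmetic behind such a correction can be sketched in a few lines of Python. This is not the swain_gof.ado code. The scaling factor follows the formula as commonly reported in Herzog and Boomsma (2009), reproduced here from memory and therefore to be checked against the paper; the fit-index formulas are the standard textbook versions. Using N − 1 rather than N in the RMSEA and leaving the treatment of the baseline chi-square to the caller are assumptions of this sketch.

```python
from math import sqrt


def swain_factor(p, df, n_obs):
    """Swain scaling factor for p observed variables, df degrees of freedom,
    and n_obs observations (formula as reported in the literature; verify)."""
    n = n_obs - 1
    q = (sqrt(1 + 4 * p * (p + 1) - 8 * df) - 1) / 2
    numerator = p * (2 * p**2 + 3 * p - 1) - q * (2 * q**2 + 3 * q - 1)
    return 1 - numerator / (12 * df * n)


def fit_indices(chi2_m, df_m, chi2_b, df_b, n_obs):
    """RMSEA, CFI, and TLI from (possibly corrected) chi-square values of the
    target model (m) and the baseline model (b)."""
    excess_m = max(chi2_m - df_m, 0.0)
    excess_b = max(chi2_b - df_b, 0.0)
    rmsea = sqrt(excess_m / (df_m * (n_obs - 1)))  # N - 1 is an assumption
    denom = max(excess_m, excess_b)
    cfi = 1.0 if denom == 0 else 1 - excess_m / denom
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)
    return rmsea, cfi, tli


# Hypothetical example: 6 indicators, 9 degrees of freedom, 50 observations.
s = swain_factor(p=6, df=9, n_obs=50)
chi2_corrected = s * 20.0  # 20.0 is a made-up uncorrected chi-square
rmsea, cfi, tli = fit_indices(chi2_corrected, 9, 200.0, 15, 50)
```

Because the factor is below 1 in small samples and approaches 1 as the sample grows, the corrected chi-square is smaller than the uncorrected one, which is what reduces the tendency to overreject.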
References: Bentler, P.M., and C.P. Chou. 1987. Practical issues in structural equation modeling. Sociological Methods & Research 16: 78–117. Bentler, P.M., and K.H. Yuan. 1999. Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research 34: 181–197. Curran, P.J., K.A. Bollen, P. Paxton, J. Kirby, and F.N. Chen. 2002. The noncentral chi-square distribution in misspecified structural equation models: Finite sample results from a Monte Carlo simulation. Multivariate Behavioral Research 37: 1–36. Herzog, W., and A. Boomsma. 2009. Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling 16: 1–27. Herzog, W., A. Boomsma, and S. Reinecke. 2007. The model-size effect on traditional and modified tests of covariance structures. Structural Equation Modeling 14: 361–90. Satorra, A., and P.M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis. In Latent variables analysis: Applications for developmental research, edited by Alexander Von Eye and Clifford Clogg, 399–419. Newbury Park, CA: Sage. Swain, A.J. 1975. Analysis of parametric structures for variance matrices (Doctoral thesis). University of Adelaide, Adelaide.
Additional information: germany18_Langer.pdf
Wolfgang Langer
Martin-Luther-Universität Halle-Wittenberg
|
2:45–3:15 |
Abstract:
In this presentation, I will go through the workflow of creating an interactive
presentation in Stata (a .smcl presentation) with smclpres based on a
small example presentation.
Some talks are primarily about how to do things in Stata, like a lecture on graphs in Stata or a talk at a Stata Users' Group meeting. In those cases, a .smcl presentation can be useful. A .smcl presentation is a series of linked .smcl files that open in the viewer inside Stata (like help files). The strength of a .smcl presentation is that it can contain links that execute examples, open help files, open do-files, etc. A .smcl presentation is all about illustrating how to do something in Stata, so preparing for such a talk typically starts with preparing a set of examples in a do-file. By adding specific comments to that do-file, for example, to indicate when a slide starts and when it ends, what the title of the slide is, etc., the smclpres command can turn that do-file into a .smcl presentation. Moreover, the pres2html command can turn that .smcl presentation into an HTML handout so that participants can easily access the content after the presentation.
Additional information: germany18_Buis.zip
Maarten Buis
University of Konstanz
|
3:30–4:00 |
Abstract:
The autoexam ado package allows one to automatically generate multiple-choice tests from
a database of items. The tests are optimized with regard to the distribution
of difficulties and the representative coverage of course topics. The tests
can be written as LaTeX or HTML files. Accompanying ado-files help to analyze
items using IRT models and to manage or update the item database. The system
can also be used to generate mock exams to allow students to prepare for the
exam. When creating such mock exams, the user can choose what percentage, if
any, of the real test questions is allowed to occur in the mock exams.
Finally, autoexam allows one to include mathematical or statistical questions in
the item database that are randomly generated with respect to the specific
numbers in the questions. The autoexam ado-package aims to help teachers with
creating and correcting exams more efficiently and with better quality. It is
particularly helpful for large basic courses that are repeated at regular intervals.
Alexander Schmidt-Catran
Goethe-University Frankfurt
|
4:00–5:00 |
Abstract:
Stata 15 includes three new commands for producing dynamic documents:
dyndoc, putdocx, and putpdf. These commands have
generated much interest in the user community; this has led to a large
amount of community-contributed software. In this talk, I'll give some
tips about how to use the commands efficiently both with official Stata
software and with some of these community-contributed tools.
Additional information: germany18_Rising.pdf Examples (.zip)
Bill Rising
StataCorp
|
5:15–6:00 |
Abstract:
Stata developers present will carefully and cautiously
consider wishes and grumbles from Stata users in the audience.
Questions, and possibly answers, may concern reports of
present bugs and limitations or requests for new features in
future releases of the software.
StataCorp personnel
StataCorp
|
Workshops: Thursday, 21 June
Graphics with Stata
Maarten Buis, Universität Konstanz, 9:00 a.m. to 1:00 p.m.
Description
This workshop is intended for participants who want to make the most out of graphs in Stata. Stata has a very powerful graphics language, but with power comes an elaborate syntax with a lot of options. This makes it easy to get lost and overlook useful possibilities. In this workshop, we will focus on building your graph step by step and on tips and tricks for creating a wide range of informative graphs.
Prerequisites
Basic knowledge of Stata.
Bayesian analysis using Stata
Yulia Marchenko, Executive Director of Statistics, StataCorp, 2:00 p.m. to 6:00 p.m.
Description
This workshop covers the use of Stata to perform Bayesian analysis. Bayesian analysis is a statistical paradigm that answers research questions about unknown parameters using probability statements. For example, what is the probability that a person accused of a crime is guilty? What is the probability that the odds ratio is between 0.3 and 0.5? And many more. Such probabilistic statements are natural to Bayesian analysis because of the underlying assumption that all parameters are random quantities. In Bayesian analysis, a parameter is summarized by an entire distribution of values instead of one fixed value as in classical frequentist analysis. Estimating this distribution, a posterior distribution of a parameter of interest, is at the heart of Bayesian analysis. This workshop will demonstrate the use of Bayesian analysis in various applications and will introduce Stata's suite of commands for conducting Bayesian analysis.
Prerequisites
Basic knowledge of Stata.