Last updated: 8 September 2014

2014 Nordic and Baltic Stata Users Group meeting
5 September 2014

Department of Political Science, Aarhus University
Bartholins Allé 7
DK-8000 Aarhus C
Denmark

Proceedings
Stata as a numerical tool for scientific thought experiments: A tutorial with worked examples
Henrik Støvring
Department of Public Health, Department of Biostatistics, Aarhus University
Thought experiments based on simulation can be used to explain to fellow researchers the impact of a chosen study design or statistical analysis strategy, or the sensitivity of results. In this talk, I will present two examples showing how quantitative thought experiments may be implemented in Stata. The first example uses a large-sample approach to study the impact on the estimated effect size of dichotomizing an exposure variable at different values. The second example uses simulations of realistic-size datasets to illustrate the necessity of using sampling fractions as inverse probability weights in the statistical analysis to protect against bias in a complex sampling design. I will also briefly outline the general steps needed to implement quantitative thought experiments in Stata. The main purpose is to highlight that Stata provides programming facilities for conveniently implementing such thought experiments; exploiting them may save researchers precious time, futile speculation, and disruptive debates and thus improve communication in interdisciplinary research groups.
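To make the second example concrete, here is a minimal sketch (with hypothetical variable names and parameter values, not those of the talk) of how sampling fractions that depend on exposure and outcome bias an unweighted analysis, while inverse probability weights protect against it:

    clear
    set seed 12345
    set obs 100000
    generate x = rnormal()                          // continuous exposure
    generate y = rbinomial(1, invlogit(-2 + 0.5*x)) // outcome; true slope 0.5
    // Keep all cases; sample controls with a fraction that depends on exposure
    generate double pselect = cond(y == 1, 1, cond(x > 0, 0.5, 0.1))
    generate byte sampled = runiform() < pselect
    logit y x if sampled                            // unweighted: slope is biased
    logit y x [pweight = 1/pselect] if sampled      // weighted: recovers about 0.5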
  
Additional information
dk14_stovring.pptx
Studying coincidences with network analysis and other statistical tools
Modesto Escobar
Department of Sociology and Communication, Universidad de Salamanca
The aim of this paper is to introduce a new framework to study data structures 
that is based on a combination of statistical and social network analysis and 
that is called coincidence analysis. The purpose of this procedure is to 
ascertain the most frequent events in a given set of scenarios and to study the
relationships between them. In accordance with this procedure, the concurrence 
of persons, objects, attributes, characteristics, or events within the same 
temporally or spatially delineated set can be classified in the following 
manner: 
(a) as simple, if both occur at least once in the same set;
(b) as likely, if there is more than a single coincidence and it is more probable than a concurrence produced by mere chance; and
(c) as statistically probable.
In cases where samples of events are the subject of analysis, a confidence interval should be established to determine the statistical significance of the combination of events.
This mode of analysis can be applied to the exploratory analysis of 
questionnaires, the study of textual networks, the review of the content of 
databases, and the comparison of different statistical analyses of 
interdependence. The following techniques can be used for analyzing the same 
data: multidimensional scaling, principal component analysis, correspondence 
analysis, biplot representations, agglomeration techniques, and network 
analysis algorithms. 
The statistical bases of this analysis are described, as is the Stata program that performs the analyses. As an example of its use, the photograph albums of the following people, famous in the early twentieth century, are analyzed: Miguel de Unamuno (1864–1936), Rafael Masó (1880–1935), Joaquín Turina (1882–1949), and Antonia Mercé (1890–1936), whose stage name was La Argentina.
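To illustrate the basic idea on a toy example (hypothetical data; this is not Escobar's own command), a coincidence can be counted whenever two persons appear in the same photograph, and a simple exact test indicates whether their concurrence exceeds mere chance:

    // One row per photograph (the delineated set); one 0/1 variable per person
    clear
    input byte(unamuno maso turina merce)
    1 0 1 0
    1 1 0 0
    0 0 1 1
    1 0 1 1
    0 1 0 0
    end
    // Joint appearances of two persons, with Fisher's exact test of independence
    tabulate unamuno turina, exact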
  
   
Additional information
dk14_escobar2.pdf
Social network analysis using Stata
Thomas Grund and Peter Hedström
Institute of Analytical Sociology, Linköping University
Social network analyses investigate the relationships (arcs/edges) between individuals or organizations, such as friendship, advice, or trust. In contrast to many other statistical approaches, the interdependencies between entities are modeled explicitly. Such a perspective allows the visualization and study of structural features of networks, such as the centrality of network nodes. This talk introduces the nwcommands, a software suite of over 40 Stata commands for social network analysis in Stata. The software includes programs for importing and exporting, loading and saving, handling, manipulating, replacing, generating, and visualizing and animating networks. It also includes commands for measuring the importance of network nodes, detecting network patterns and features, assessing the similarity of multiple networks and node attributes, and the advanced statistical analysis of networks (nwqap, nwergm). This presentation gives several examples using these programs, provides instructions for the installation, use, and support of the software (http://www.nwcommands.org), and introduces a platform for developers of additional programs for social network analysis in Stata.
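To give a feel for the kind of data these commands operate on, here is a minimal sketch using only built-in Stata on a toy edge list (the nwcommands wrap patterns like this in dedicated commands; see the website above for installation and usage):

    // Toy directed network stored as an edge list: one row per arc
    clear
    input byte(from to)
    1 2
    1 3
    2 3
    3 1
    4 1
    end
    // Out-degree of each node, a simple measure of node importance
    bysort from: generate outdeg = _N
    by from: keep if _n == 1
    list from outdeg, noobs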
  
   
Additional information
dk14_grund.pdf
Floating point numbers: A visit through the looking glass
Bill Gould
StataCorp
Researchers do not adequately appreciate that floating-point (FP) numbers are a
simulation of real numbers and that, as with all simulations, some features are
preserved and others are not. Writing code, or even do-files, that treats the computer's floating-point numbers as if they were real numbers can lead to substantive problems and numerical inaccuracy. In this respect, the relationship
between computers and real numbers is not entirely unlike the relationship 
between tea and Douglas Adams's Nutrimatic drink dispenser. The Nutrimatic 
produces a concoction that is "almost, but not quite, entirely unlike tea".
In this presentation, I will show what the universe would be like if it were 
implemented in FP rather than real numbers. The FP universe turns out to be 
nothing like the real universe and probably could not be made to function. The 
point of the talk is to build your intuition about the floating-point world so 
that you as a researcher can predict when calculations might go awry, know how 
to think about the problem, and determine how to fix it.
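For a taste of the looking-glass world in Stata itself (a minimal illustration, not taken from the talk): Stata stores variables as 4-byte floats by default while numeric literals are 8-byte doubles, and neither representation holds 1/10 exactly:

    clear
    set obs 1
    display 0.1 + 0.2 == 0.3   // 0: neither side is exactly 3/10 in binary
    display %20.18f 0.1        // the double nearest to 1/10, not 1/10 itself
    generate x = .1            // stored as a float by default
    count if x == .1           // 0: the float .1 differs from the double .1
    count if x == float(.1)    // 1: compare at float precision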
  
   
Additional information
dk14_gould.pdf
Tweaking -khb- to control for post-treatment confounders in mediation analysis
Kristian Karlson
Department of Sociology, University of Copenhagen
Mediation analyses and their ensuing effect decompositions are widespread in the social sciences. For example, in stratification research, researchers may be interested in gauging the extent to which the black-white gap in earnings can be attributed to the unequal distribution of schooling between these groups. However, methodological research shows that such mediation analyses often fail to control for the potential endogeneity of the mediator. In the example, academic ability may be a confounder of the education-earnings association. Yet controlling for such confounders to eliminate the endogeneity bias of the mediator is not as straightforward as it may appear. Whenever these control variables are a function of the predictor variable of interest (race in the example), standard regression methods for the calculation of direct and indirect effects no longer apply. Put differently, standard methods cannot control for post-treatment confounders.
In this presentation, I show how to tweak the Stata command khb (implementing the decomposition method developed by Karlson, Holm, and Breen [2012, Sociological Methodology 42: 274–301]) to control for these confounders in the estimation of direct and indirect effects in regression models using logit or probit. Under the assumption of linearity, I exploit the residualization or orthogonalization approach that underlies khb to derive the bias from omitted post-treatment confounders, and I show how to control for them by tweaking the use of khb. I also discuss how to obtain standard errors of the effects. To illustrate the approach, I give an example of the role of education in social mobility.
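For reference, a minimal sketch of standard khb usage (hypothetical variable names; khb is community-contributed, and the talk's tweak for post-treatment confounders goes beyond this basic pattern):

    ssc install khb                            // install once, if needed
    khb logit highwage race || educ, summary
    // decomposes the total effect of race into direct and indirect
    // (via educ) components, with a summary of the confounding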
  
   
Additional information
dk14_karlson.pptx
Working sideways in Stata
Jakob Hjort
Department of Cardiology, Aarhus University Hospital
Conceptually, Stata is commendably simple: it deals with only one rectangular data grid at a time (variables column-wise, observations row-wise). Within this simple concept, statistics are (usually) operations performed on the vertical axis, that is, column-wise, as when obtaining the mean age of a number of subjects/observations. Data management (besides loading, appending, and merging data, etc.) is the discipline of preparing the rectangular data grid for the statistics, for example, by creating derived variables, that is, working row-wise (or sideways) in the data grid. Mostly, derived variables are recodings or simple calculations based on existing variables, all nicely supported by easy-to-use built-in Stata commands and functions. Sometimes, however, when a mix of conditions and calculations is required to create derived variables, things get slightly more complicated and may require customized loops to traverse and handle selected variables individually, row-wise. Various aspects of working sideways in the Stata data grid will be presented and discussed, with a strict focus on transparent, safe, and robust data handling.
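Two common row-wise patterns, as a minimal sketch with hypothetical blood-pressure variables:

    clear
    set obs 3
    forvalues i = 1/5 {
        generate bp`i' = 60 + 60*runiform()  // hypothetical readings, 60-120
    }
    egen bp_mean = rowmean(bp1-bp5)          // row-wise mean across variables
    egen bp_miss = rowmiss(bp1-bp5)          // row-wise count of missing values
    // A customized loop: flag rows where any reading exceeds 100
    generate byte high = 0
    foreach v of varlist bp1-bp5 {
        replace high = 1 if `v' > 100 & !missing(`v')
    }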
  
Additional information
dk14_hjort.ppt
A short story about Danish register research and Statalist
Svend Juul
Department of Public Health, Aarhus University
A PhD student is studying health problems among children born to mothers with 
type 1 diabetes. In a clinical database, the student identified 1,300 such 
children (index children), and Statistics Denmark delivered information 
concerning 100 control children per index child, matched by gender and date of 
birth. Health outcomes are mortality, hospital admissions (by diagnosis), and 
medications (by ATC groups).
We used a mixed-effects negative binomial regression (Stata's menbreg command) to analyze hospital admissions. menbreg is computationally intensive, and we wanted some 200 analyses (5 age groups, 20 diagnostic groups, etc.). Some analyses would take several hours. I tried to find out whether there was a way to automatically stop an analysis that took too long and proceed with the next analysis. Some of the SUG participants will know how to do that, but I didn't know at the time.
I sent the question to Statalist, and within five minutes, I had two good answers: use the iterate() option; see help maximize. It works, and the analyses are proceeding.
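A minimal sketch of the pattern (hypothetical variable names), capping each model at a fixed number of iterations and moving on to the next analysis:

    foreach g of numlist 1/20 {
        capture noisily menbreg admissions`g' i.index agegroup || matchset:, ///
            iterate(50)          // stop after 50 iterations and proceed
        if e(converged) == 0 display as txt "group `g' did not converge"
    }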
  
   
Additional information
dk14_juul.pdf
Reproducible research in Stata
Bill Rising
StataCorp
Writing a document that contains statistical results in its narrative, including inline results, can take too much effort. Typically, users have a separate series of do-files whose results must then be pulled into the document. This is a very high-maintenance way to work because updates to the data, changes to the do-files, updates to the statistical software, and, especially, updates to inline results all require work and careful checking of results.
Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document. This means that only one document is used; thus it remains consistent and is easily maintained.
In this presentation, I will show you how to put Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the document containing results. While this is useful for creating self-contained documents, it is especially useful for creating periodic reports, class notes, solution sets, and other documents that get used over a long period of time.
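To give the flavor of the approach, here is a minimal sketch using the community-contributed texdoc command by Ben Jann (ssc install texdoc), one preprocessor of this kind; the tool demonstrated in the talk may differ. LaTeX text passes through in /*** ... ***/ blocks, and Stata output is captured in between:

    texdoc init report, replace
    /***
    \documentclass{article}
    \begin{document}
    Mean price in the auto data:
    ***/
    texdoc stlog
    sysuse auto
    summarize price
    texdoc stlog close
    /***
    \end{document}
    ***/
    texdoc close

Running texdoc do on this do-file produces report.tex with the Stata log embedded.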
  
   
Additional information
dk14_rising.pdf
dk14_rising_examples.zip
Scientific organizers
Peter Hedström, Institute for Futures Studies
Kim Mannemar Sønderskov, Aarhus University
Svend Juul, Aarhus University

Logistics organizers
Metrika Consulting, the official distributor of Stata in the Nordic and Baltic regions, and the Karolinska Institutet.