Last updated: 1 December 2011

2011 Nordic and Baltic Stata Users Group meeting
11 November 2011

Karolinska Institutet
CMB, Berzelius väg 21
Solna Campus
Stockholm, Sweden

Proceedings
Quantile imputation of missing data
Matteo Bottai
Unit of Biostatistics, Institute of Environmental Medicine,
Karolinska Institutet, Sweden
  Multiple imputation is an increasingly popular approach for the analysis of
  data with missing observations. It is implemented in Stata's mi suite of
  commands. I present a new Stata command for
  imputation of missing values based on prediction of conditional quantiles of
  missing observations given the observed data. The command does not require
  making distributional assumptions and can be applied to impute dependent,
  bounded, censored, and count data.
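
  The idea behind the abstract can be sketched generically: a missing value is
  replaced by an estimated conditional quantile of the response given the
  observed data, at a quantile level drawn uniformly at random, so no
  distributional form is imposed. The sketch below (in Python, not the author's
  Stata command; all names are illustrative) uses empirical conditional
  quantiles within levels of a categorical covariate.

```python
import numpy as np

def quantile_impute(y, x, rng):
    """Impute missing entries of y (np.nan) with an empirical conditional
    quantile of y within the same level of the categorical covariate x,
    at a uniformly drawn quantile level."""
    y = y.copy()
    missing = np.isnan(y)
    for i in np.flatnonzero(missing):
        donors = y[(x == x[i]) & ~missing]   # observed y in the same group
        u = rng.uniform()                    # random quantile level
        y[i] = np.quantile(donors, u)        # empirical conditional quantile
    return y

rng = np.random.default_rng(1)
x = np.repeat([0, 1], 50)
y = np.where(x == 0, rng.normal(0, 1, 100), rng.normal(5, 1, 100))
y[[3, 60]] = np.nan                          # introduce missing values
y_imp = quantile_impute(y, x, rng)
```

  Because each imputation is a quantile of the observed donors, imputed values
  automatically respect the support of the data, which is why the approach
  extends naturally to bounded and count data.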
  
  
Additional information
  bottai_nordic11.pdf
 
Comparing observed and theoretical distributions
Maarten L. Buis
 Institut fuer Soziologie, Universitaet Tuebingen, Germany
  In this presentation, I aim to introduce graphical tools for comparing the
  distribution of a variable in your dataset with a theoretical probability
  distribution, like the normal distribution or the Poisson distribution. The
  presentation will consist of two parts. In the first part, I will consider
  univariate distributions, with a particular emphasis on hanging and suspended
  rootograms (hangroot). Looking at univariate distributions
  is not very common in a lot of (sub-(sub-))disciplines, but there are
  situations where this can be very useful: For example, if we have a count of
  accidents and we want to know whether these are occurring randomly, then we
  can compare this variable with a Poisson distribution. Another example would
  be simulations, where it is often the case that parameters or test statistics
  should follow a certain distribution when the model that is being checked is
  working as expected. 
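
  The computation behind a hanging rootogram can be sketched as follows (a
  generic Python illustration of the idea, not hangroot's code): bars of
  sqrt(observed frequency) are hung from the curve of sqrt(expected frequency)
  under the fitted distribution, so the residual sqrt(expected) - sqrt(observed)
  shows where the bars miss the zero line.

```python
import numpy as np
from math import exp, factorial

def rootogram_residuals(counts):
    """Return sqrt(expected) - sqrt(observed) for each count value 0..max,
    with expected frequencies from a Poisson fit (ML estimate = mean)."""
    n = len(counts)
    lam = counts.mean()
    values = np.arange(counts.max() + 1)
    observed = np.bincount(counts, minlength=len(values))
    expected = np.array([n * exp(-lam) * lam**k / factorial(k) for k in values])
    return np.sqrt(expected) - np.sqrt(observed)

rng = np.random.default_rng(7)
resid = rootogram_residuals(rng.poisson(2.0, size=1000))
```

  For genuinely Poisson data, as here, the residuals hover near zero; a count
  variable with excess zeros or overdispersion would show systematic gaps.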
  
  In the second part of the talk, I will focus on the more common situation
  where models assume a certain distribution for the
  explained/dependent/y variable, and I will estimate how one or more
  parameters, often the mean, change when one or more
  explanatory/independent/x variables change. The challenge now is
  that the dependent variable no longer follows the theoretical distribution,
  but rather a mixture of these theoretical distributions. In the case of a
  linear regression, we can circumvent this difficulty by looking at the
  residuals, which should follow a normal distribution. However, this
  circumvention does not generalize to other models. I will show how to
  graphically compare the distribution of the dependent variable with the
  theoretical mixture distribution. The focus will be on a trick to sample new
  dependent variables under the assumption that the model is true. Graphing the
  distribution of the actual dependent variable together with these sampled
  variables will give an idea of whether deviations from the theoretical
  distribution could have occurred by chance. This idea will be applied to
  checking the distributional assumption in beta regression
  (betafit) and to choosing between different parametric survival models
  (streg).
  
  
Additional information
  buis_nordic11.pdf
 
Simulating complex survival data
Michael J. Crowther
 Department of Health Sciences,
University of Leicester, Leicester, United Kingdom
Paul C. Lambert
 Department of Health Sciences, University of
Leicester, Leicester, United Kingdom and Department of Medical Epidemiology
and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  Simulation studies are essential for understanding and evaluating both
  current and new statistical models. When simulating survival times, often an
  exponential or Weibull distribution is assumed for the baseline hazard
  function, but these distributions can be considered too simplistic and lack
  biological plausibility in many situations. We will describe a new
  user-written command, survsim, that allows the user to
  simulate survival times from two-component mixture models, allowing much more
  flexibility in the underlying hazard.  Standard parametric models can also be
  used, including the exponential, Weibull, and Gompertz models. Furthermore,
  survival times can be simulated from the all-cause distribution of
  cause-specific hazards for competing risks.  A multinomial distribution is
  used to create the event indicator, whereby the probability of experiencing
  each event at a simulated time, t, is the cause-specific hazard divided by
  the all-cause hazard evaluated at time t. Baseline
  covariates and non-proportional hazards can be included in all scenarios.
  Finally, we will discuss the complex extension of simulating joint
  longitudinal and survival data.
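
  The competing-risks scheme described above can be sketched for the simplest
  case of two constant (exponential) cause-specific hazards h1 and h2 (a
  Python illustration of the mechanism, not survsim itself, which supports far
  more flexible hazards): draw the event time from the all-cause distribution,
  then draw the cause with probability p_k = h_k(t) / h(t).

```python
import numpy as np

def simulate_competing(n, h1, h2, rng):
    """Simulate event times and cause indicators for two competing risks
    with constant cause-specific hazards h1 and h2."""
    all_cause = h1 + h2                       # constant all-cause hazard
    t = rng.exponential(1.0 / all_cause, n)   # all-cause event times
    p1 = h1 / all_cause                       # cause 1 share of the hazard
    cause = np.where(rng.uniform(size=n) < p1, 1, 2)
    return t, cause

rng = np.random.default_rng(42)
t, cause = simulate_competing(100_000, h1=0.02, h2=0.08, rng=rng)
```

  With time-varying hazards the same logic applies, except that p_k must be
  evaluated at the simulated time t.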
  
  
Additional information
  crowther_nordic11.pdf
 
Quantiles of the survival time from inverse probability
weighted Kaplan–Meier estimates 
Andrea Discacciati
 Unit of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden
  The official Stata command stci indirectly estimates quantiles of the
  survival time for different exposure levels from the Kaplan–Meier
  estimates. However, stci does not take into account possible confounding
  effects. Therefore, we introduce a new Stata command, stqkm, that indirectly
  estimates quantiles of the survival time from inverse probability weighted
  Kaplan–Meier estimates. Confidence intervals for the quantile estimates
  are obtained using the bootstrap method. We present a simulation study to
  assess the performance of the stqkm command in the presence of confounding,
  and we present a case study.
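
  The estimator described can be sketched as follows (an illustrative Python
  stand-in for the idea, not the stqkm implementation): compute a weighted
  Kaplan–Meier product-limit curve, where each subject contributes its
  inverse-probability weight to the risk set, and read the quantile off as the
  first time the weighted survival curve drops below 1 - p.

```python
import numpy as np

def weighted_km_quantile(time, event, weight, p):
    """Smallest time at which the weighted Kaplan-Meier survival estimate
    drops to <= 1 - p. Unit weights give the ordinary KM quantile."""
    order = np.argsort(time)
    time, event, weight = time[order], event[order], weight[order]
    at_risk = np.cumsum(weight[::-1])[::-1]   # weighted number still at risk
    surv = 1.0
    for t, d, w, r in zip(time, event, weight, at_risk):
        if d:
            surv *= 1.0 - w / r               # product-limit step at an event
        if surv <= 1.0 - p:
            return t
    return np.nan                             # quantile not reached

rng = np.random.default_rng(3)
t_obs = rng.exponential(10.0, 20_000)         # median is 10*ln(2) ~ 6.93
q50 = weighted_km_quantile(t_obs, np.ones(20_000, bool), np.ones(20_000), 0.5)
```

  Replacing the unit weights with estimated inverse probabilities of exposure
  is what removes the confounding from the comparison of exposure levels.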
  
  
Additional information
  discacciati_nordic11.pdf
 
An example of competing-risks analysis using Stata
Christel Häggström
 Umeå University, Sweden
  Competing-risks analysis in epidemiology is of special importance in survival
  analysis when studying the elderly and also when the exposure is related to
  early death. In a cohort study, I investigated the association between
  metabolic factors (obesity, hypertension, high glucose levels, etc.) and
  prostate cancer (with a mean age at diagnosis of 70 years). Using these
  data, I will present the analysis in which I plotted cumulative incidence
  curves to visualize the risk of prostate cancer in comparison with the
  competing risk, all-cause mortality, for different levels of metabolic
  factors, using the Stata commands stcompet and stpepemori. I also used Fine
  and Gray regression (the stcrreg command) to calculate subdistribution
  hazard ratios for both prostate cancer incidence and all-cause mortality.
  
  
Additional information
  haggstrom_nordic11.pdf
 
Using Stata for agent-based simulations
Peter Hedström
 Institute for Futures Studies, Stockholm, Sweden
Thomas Grund
 ETH Zürich, Switzerland
  Agent-based modeling (ABM) is an analytical tool that is becoming
  increasingly important in the social sciences. The core idea behind ABM is
  to use computational models to analyze the macro- or aggregate-level outcomes
  that groups of agents, in interaction with one another, bring about. In this
  presentation, we briefly discuss why ABM is important and show how Stata can
  be used for such analyses. We also present a suite of programs. Some of these
  commands are used for generating, visualizing, or measuring various
  properties of the networks within which the agents are embedded, and others
  are used for analyzing the collective outcomes that agents are likely to
  bring about when embedded in such networks.
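
  A toy example of the kind of analysis described (a generic Python sketch,
  not the authors' Stata suite; network, threshold rule, and parameters are
  all illustrative): agents on a random Erdős–Rényi network adopt a behavior
  once any neighbor has adopted, and the macro-level outcome is the adoption
  share over time.

```python
import numpy as np

def simulate_contagion(n, p_edge, seeds, steps, rng):
    """Simple contagion on a random undirected network: an agent adopts
    as soon as at least one neighbor has adopted."""
    adj = rng.uniform(size=(n, n)) < p_edge
    adj = np.triu(adj, 1)
    adj = adj | adj.T                          # symmetric, no self-loops
    adopted = np.zeros(n, bool)
    adopted[seeds] = True
    history = [adopted.mean()]                 # macro-level outcome per step
    for _ in range(steps):
        exposed = (adj.astype(int) @ adopted.astype(int)) > 0
        adopted = adopted | exposed
        history.append(adopted.mean())
    return history

rng = np.random.default_rng(0)
history = simulate_contagion(200, 0.05, seeds=[0], steps=10, rng=rng)
```

  Varying the network-generating step while keeping the behavioral rule fixed
  is the typical way such models separate the effect of network structure from
  the effect of individual behavior.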
 A command for Laplace regression
Nicola Orsini
 Unit of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden
  I present an estimation command for Laplace regression to model conditional
  quantiles of a response variable given a set of covariates. The laplace
  command is similar to the official qreg command except that it can account
  for censored data. I
  illustrate its applicability and use through examples from health-related
  fields.
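
  The connection exploited by Laplace regression can be sketched numerically
  (a Python illustration of the principle, not the laplace command, which fits
  full covariate models and handles censoring): maximizing an asymmetric
  Laplace likelihood in the location parameter is equivalent to minimizing the
  quantile-regression "check" loss, so the fitted location is the tau-th
  conditional quantile.

```python
import numpy as np

def check_loss(y, mu, tau):
    """Pinball / check loss: rho_tau(r) = r * (tau - I(r < 0))."""
    r = y - mu
    return np.sum(r * (tau - (r < 0)))

rng = np.random.default_rng(5)
y = rng.exponential(2.0, 5001)
tau = 0.25
grid = np.linspace(0.0, 5.0, 2001)            # candidate location values
losses = [check_loss(y, m, tau) for m in grid]
mu_hat = grid[int(np.argmin(losses))]         # grid minimizer of check loss
```

  The minimizer coincides (up to grid resolution) with the sample 25th
  percentile, which is the property that makes the likelihood formulation a
  vehicle for quantile estimation with censored data.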
  
  
Additional information
  orsini_nordic11.pdf
 
Using meta-analysis to inform the design of subsequent studies
Sally R. Hinchliffe, Michael J. Crowther, Alison
Donald, and Alex J. Sutton
 Department of Health Sciences, University of
Leicester, Leicester, United Kingdom
  In this presentation, we describe a suite of programs (metasim, metapow, and
  metapowplot) that enable the user to estimate the probability that the
  conclusions of a meta-analysis will change with the inclusion of one or more
  new studies, as described previously by Sutton et al. (2007). Using the
  metasim program, we take a simulation approach to estimating the effects in
  future studies. The method assumes that the effect sizes of future
  studies are consistent with those observed previously, as represented by
  the current meta-analysis. The contexts of both two-arm randomized
  controlled trials and studies of diagnostic test accuracy are considered for
  a variety of outcome measures.  Calculations are possible under both fixed-
  and random-effect assumptions, and several approaches to inference, including
  statistical significance and limits of clinical significance, are possible.
  Calculations for specific sample sizes can be conducted (using metapow), and
  plots, akin to traditional power curves, indicating the probability that one
  or more new studies will change inferences for a range of sample sizes can
  be produced (using metapowplot).
  Finally, plots of the simulation results are overlaid on a previously
  described macro, extfunnel, which can help to intuitively
  explain the results of such calculations of sample size.  We hope the macro
  will be useful to trialists who want to assess the impact potential new
  trials will have on the overall evidence base and meta-analysts who want to
  assess the robustness of the current meta-analysis to the inclusion of
  future data.
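
  The simulation logic can be sketched for the simplest case (a Python
  illustration of the approach, not the metasim code; the pooling rule, the
  new-study variance, and the 5% z-test criterion are simplifying
  assumptions): pool the current studies by fixed-effect inverse-variance
  weighting, draw a new study's effect consistent with the pooled estimate,
  update the meta-analysis, and count how often the updated pooled effect is
  statistically significant.

```python
import numpy as np

def pooled(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    w = 1.0 / np.asarray(variances)
    return np.sum(w * effects) / np.sum(w), 1.0 / np.sum(w)

def prob_significant(effects, variances, new_var, nsim, rng):
    """Probability that adding one new study (with sampling variance new_var)
    yields an updated pooled effect with |z| > 1.96."""
    est, var = pooled(effects, variances)
    hits = 0
    for _ in range(nsim):
        # new study effect drawn consistent with the current pooled estimate
        new_effect = rng.normal(est, np.sqrt(var + new_var))
        e, v = pooled(effects + [new_effect], variances + [new_var])
        hits += abs(e) / np.sqrt(v) > 1.96
    return hits / nsim

rng = np.random.default_rng(11)
p = prob_significant([0.4, 0.6, 0.3], [0.05, 0.08, 0.06], 0.04, 2000, rng)
```

  Repeating the calculation over a range of new-study sample sizes (i.e., of
  new_var) traces out the power-curve-like plots described above.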
  
Reference: 
Sutton, A. J., N. J. Cooper, D. R. Jones, P. C. Lambert, J. R. Thompson, and K.
R. Abrams. 2007. Evidence-based sample size calculations based upon updated
meta-analysis. 
Statistics in Medicine 27: 471–490.  
  
  
Additional information
  hinchcliffe_nordic11.pdf
 
Taking the pain out of looping and storing
Patrick Royston
 MRC Clinical Trials Unit, United Kingdom
  Quite a common task in Stata is to run some sequence of commands under the
  control of a looping parameter and store the corresponding results in one
  or more new variables. Over the years, I have written many such loops, some
  of greater complexity than others. I finally became fed up with it and
  decided to write a simple command to automate the repetitive parts. The
  result is looprun, which I shall describe in this
  presentation.
  
  
Additional information
  royston_nordic11.ppt
 
Projecting cancer incidence using restricted cubic splines
Mark J. Rutherford, Paul C. Lambert, and John R. Thompson
 Department of Health Sciences, University of
Leicester, Leicester, United Kingdom 
  Age–period–cohort models provide a useful method for modeling
  cancer incidence and mortality rates. There is great interest in estimating
  the rates of disease at given future time points so that plans can be made
  for the provision of the required future services. In the setting of using
  age–period–cohort models incorporating restricted cubic splines,
  we propose a new technique for projecting incidence. The method is validated
  via a comparison with existing methods in the setting of Finnish Cancer
  Registry data. The reasons for the improvements seen in the newly proposed
  method are twofold. First, improvements are seen because of the finer
  splitting of the timescale to give a more continuous estimate of the
  incidence rate. Second, the new method uses more recent trends than
  previously proposed methods do to dictate the future projections. The
  output will be produced via the user-written command apcfit. The
  functionality of the command will be illustrated throughout the talk.  
  
  The talk will comprise an introduction of the use of restricted cubic splines
  for model fitting before describing their use for
  age–period–cohort models. A description of the new method for
  projecting cancer incidence will be given prior to showing the results of the
  application of the method to Finnish Cancer Registry data. The talk will
  conclude with a description of the potential problems and issues when making
  projections.
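
  The restricted-cubic-spline basis that underlies this kind of modeling can
  be sketched as follows (the standard Durrleman–Simon parameterization in
  Python; apcfit's internals may differ in detail). The basis is cubic between
  knots and constrained to be linear beyond the boundary knots, which is what
  makes spline-based projections beyond the data range well behaved.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted-cubic-spline terms beyond the linear term x itself,
    for a sorted knot sequence: linear outside the boundary knots."""
    x = np.asarray(x, float)
    k = np.asarray(knots, float)
    kmax, ksec = k[-1], k[-2]
    pos3 = lambda u: np.maximum(u, 0.0) ** 3   # truncated cubic (u)_+^3
    cols = []
    for kj in k[:-2]:
        term = (pos3(x - kj)
                - pos3(x - ksec) * (kmax - kj) / (kmax - ksec)
                + pos3(x - kmax) * (ksec - kj) / (kmax - ksec))
        cols.append(term)
    return np.column_stack(cols)

knots = [1.0, 3.0, 5.0, 7.0]
x = np.linspace(8.0, 12.0, 9)                  # all beyond the last knot
B = rcs_basis(x, knots)                        # each column is linear here
```

  Beyond the last knot the cubic terms cancel exactly, so every basis column
  reduces to a straight line, and the projection is a linear extrapolation of
  the most recent trend.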
  
  
Additional information
  rutherford_nordic11.pdf
 
Time to dementia onset: Competing-risks analysis with Laplace regression
Giola Santoni, Debora Rizzuto, and Laura Fratiglioni
 Aging Research Center, Karolinska Institutet, Sweden
  We want to quantify the protective effect of education on time to dementia
  onset using longitudinal data from a population study. We consider dropout
  due to death of the subject as a competing event for the outcome of
  interest. We show an adaptation of the Laplace regression method to the
  case of competing-risks analysis. The first 20% of highly educated people
  develop dementia 2.5 years later (p < 0.01) than those with a lower
  education level. The effect on all-cause mortality is negligible. We show
  that the results derived through Laplace regression are comparable with
  those derived with the Stata command stcrreg.
  
  
Additional information
  santoni_nordic11.pdf
 
Doubly robust estimation in generalized linear models with Stata
Arvid Sjölander
 Department of Medical Epidemiology and
Biostatistics, Karolinska Institutet, Sweden 
Nicola Orsini
 Units of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden 
  The aim of epidemiological research is typically to estimate the association
  between a particular exposure and a particular outcome, adjusted for a set
  of additional covariates. This is commonly done by fitting a regression
  model for the outcome, given exposure and covariates. If the regression
  model is misspecified, then the resulting estimator may be inconsistent.
  Recently, a new class of estimators has been developed: so-called
  “doubly robust” (DR) estimators. These estimators use two
  regression models: one for the outcome and one for the exposure. A DR
  estimator is consistent if either model is correct, not necessarily both.
  Thus, DR estimators give the
  analyst two chances instead of only one to make valid inference. In this
  presentation, we describe a new package for Stata that implements the most
  common DR estimators.
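
  The double-robustness property can be demonstrated numerically with the
  classic augmented inverse-probability-weighted (AIPW) estimator of the mean
  outcome under exposure (a generic Python sketch of the idea, not the
  authors' package; data-generating values are illustrative): the estimate
  stays consistent even when one of the two working models is deliberately
  misspecified.

```python
import numpy as np

def aipw_mean_exposed(y, a, e_hat, m_hat):
    """AIPW estimate of E[Y(1)]: outcome y, exposure a, propensity
    predictions e_hat, outcome-model predictions m_hat (for a = 1)."""
    return np.mean(a * y / e_hat - (a - e_hat) / e_hat * m_hat)

rng = np.random.default_rng(2)
n = 200_000
x = rng.binomial(1, 0.5, n)                   # binary confounder
e = np.where(x == 1, 0.8, 0.2)                # true propensity of exposure
a = rng.binomial(1, e)                        # exposure
y = 1.0 + 2.0 * x + 1.0 * a + rng.normal(0, 1, n)   # true E[Y(1)] = 3.0
m_true = 1.0 + 2.0 * x + 1.0                  # correct outcome model (a = 1)
m_wrong = np.full(n, 0.0)                     # misspecified outcome model
e_wrong = np.full(n, 0.5)                     # misspecified propensity model
est_correct_e = aipw_mean_exposed(y, a, e, m_wrong)       # propensity right
est_correct_m = aipw_mean_exposed(y, a, e_wrong, m_true)  # outcome right
```

  Both estimates recover the true value 3.0, one chance coming from each
  model; only when both models are wrong does the estimator break down.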
  
  Additional information
  sjolander_nordic11.pdf
 
Chained equations and more in multiple imputation in Stata 12
Yulia Marchenko
StataCorp LP
  I present the new Stata 12 command, mi impute chained, to
  perform multivariate imputation using chained equations (ICE), also known as
  sequential regression imputation.  ICE is a flexible imputation technique
  for imputing various types of data.  The variable-by-variable specification
  of ICE allows you to impute variables of different types by choosing the
  appropriate method for each variable from several univariate imputation
  methods.  Variables can have an arbitrary missing-data pattern.  By
  specifying a separate model for each variable, you can incorporate certain
  important characteristics, such as ranges and restrictions within a subset,
  specific to each variable.  I also describe other new features in multiple
  imputation in Stata 12.
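
  The variable-by-variable cycle of ICE can be sketched for two continuous
  variables (a minimal Python illustration of the iteration, not mi impute
  chained, which supports many variable types and univariate methods): each
  variable is regressed in turn on the others using its complete cases, and
  its missing values are refilled with a prediction plus a random residual
  draw.

```python
import numpy as np

def ice_impute(x1, x2, n_iter, rng):
    """Chained-equations imputation for two continuous variables using
    linear regressions fit by least squares on the complete cases."""
    x1, x2 = x1.copy(), x2.copy()
    m1, m2 = np.isnan(x1), np.isnan(x2)
    x1[m1] = np.nanmean(x1)                   # crude starting values
    x2[m2] = np.nanmean(x2)
    for _ in range(n_iter):
        for y, m, z in ((x1, m1, x2), (x2, m2, x1)):
            X = np.column_stack([np.ones_like(z), z])
            beta, *_ = np.linalg.lstsq(X[~m], y[~m], rcond=None)
            resid_sd = np.std(y[~m] - X[~m] @ beta)
            # refill missing values: prediction plus a random residual draw
            y[m] = X[m] @ beta + rng.normal(0, resid_sd, m.sum())
    return x1, x2

rng = np.random.default_rng(9)
n = 2000
x2_full = rng.normal(0, 1, n)
x1_full = 2.0 + 1.5 * x2_full + rng.normal(0, 0.5, n)
x1 = x1_full.copy()
x1[rng.uniform(size=n) < 0.2] = np.nan        # 20% missing at random
x1_imp, _ = ice_impute(x1, x2_full.copy(), n_iter=5, rng=rng)
```

  Swapping the linear model for a logistic, Poisson, or range-restricted one
  for a given variable is exactly the per-variable flexibility the abstract
  describes.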
  
  
Additional information
  marchenko_nordic11.pdf
 
SEM for those who think they don’t care
Vince Wiggins
StataCorp LP
  We will discuss SEM (structural equation modeling), not from the perspective
  of the models for which it is most often used—measurement models,
  confirmatory factor analysis, and the like—but from the perspective of
  how it can extend other estimators.  From a wide range of choices, we will
  focus on extensions of mixed models (random- and fixed-effects regression).
  Extensions include conditional effects (not completely random), endogenous
  covariates, and others.
  
  Additional information
  wiggins_nordic11.pdf
 
Scientific organizers
Peter Hedström, Metrika Consulting, Nuffield College and Oxford University
Nicola Orsini, Karolinska Institutet
Matteo Bottai, Karolinska Institutet
Logistics organizers
  Metrika Consulting,
  the official distributor of Stata in the Nordic and Baltic regions, and the
  Karolinska Institutet.