
Proceedings

9:15–9:45
The role of Somers's D in propensity modeling
Abstract: The Rubin method of confounder adjustment, in its 21st-century version, is a two-phase method for using observational data to estimate a causal treatment effect on an outcome variable. It involves first finding a propensity model in the joint distribution of a treatment variable and its confounders (the design phase) and then estimating the treatment effect from the conditional distribution of the outcome, given the treatments and confounders (the analysis phase). In the design phase, we want to limit the spurious treatment effect that might be caused by any residual imbalance between treatment and confounders remaining after adjustment for the propensity score by propensity matching, weighting, and/or stratification.

A good measure of this is Somers's D(W|X), where W is a confounder or a propensity score and X is the treatment variable. The SSC package somersd calculates Somers's D for a wide range of sampling schemes, allowing matching, weighting, and restriction to comparisons within strata. Somers's D has the feature that if Y is an outcome, then a higher-magnitude D(Y|X) cannot be secondary to a lower-magnitude D(W|X), implying that D(W|X) can be used to set an upper bound on the size of a spurious treatment effect on an outcome. For a binary treatment variable X, D(W|X) gives an upper bound on the size of the difference in proportions, between the two treatment groups, that can be caused for a binary outcome. If D(W|X) is less than 0.5, then it can be doubled to give an upper bound on the size of the difference in means, between the two treatment groups, that can be caused for an equal-variance normal outcome, expressed in units of the common standard deviation for the two treatment groups.
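
As a hedged sketch of this design-phase check (the variable names treat, age, bmi, pscore, and ps_quintile are hypothetical; somersd is installed from SSC), D(W|X) is estimated for each confounder and for the propensity score, with comparisons restricted to propensity-score strata:

ssc install somersd
somersd treat age bmi pscore, transf(z) wstrata(ps_quintile)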

We illustrate this method using a familiar dataset, with examples using propensity matching, weighting, and stratification. We use the SSC package haif in the design phase to check for variance inflation caused by propensity adjustment and use the SSC package scenttest (an addition to the punaf family) to estimate the treatment effect in the analysis phase.

Additional information:
Newson_uk16.pdf
Newson_examples1.do

Roger B. Newson
Imperial College London
9:45–10:15
Multistate survival analysis in Stata
Abstract: Multistate models are increasingly being used to model complex disease profiles. By modeling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and of how risk factors act across the entire disease pathway. In this talk, we will introduce some new Stata commands for the analysis of multistate survival data. These include msset, a data preparation tool that converts a dataset from wide form (one observation per subject, multiple time and status variables) to long form (one observation for each transition for which a subject is at risk). We develop a new estimation command, stms, that allows the user to fit different parametric distributions for different transitions simultaneously, while allowing covariate effects to be shared across transitions. Finally, predictms calculates transition probabilities, and many other useful measures of absolute risk, following the fit of any model using streg, stms, or stcox, using either a simulation approach or the Aalen–Johansen estimator. We illustrate the software using a dataset of patients with primary breast cancer.
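
A hedged sketch of the intended workflow, with variable names and options assumed from the package's documented examples (details may differ): msset stacks the wide data into one row per transition at risk, after which transition-specific models can be fit and fed to predictms.

* wide data: one row per patient (pid), relapse and death times rf/os,
* event indicators rfi/osi, covariates age and size
msset, id(pid) states(rfi osi) times(rf os)
matrix tmat = r(transmatrix)
stset _stop, enter(_start) failure(_status==1)
streg age size if _trans==1, distribution(weibull)   // healthy -> relapse
streg age size if _trans==2, distribution(weibull)   // healthy -> death
streg age size if _trans==3, distribution(weibull)   // relapse -> death
* predictms then computes transition probabilities and other absolute-risk
* measures from the fitted transition-specific models (or from stms or stcox)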

Additional information:
Crowther_uk16.pdf

Michael J. Crowther and Paul C. Lambert
University of Leicester and Karolinska Institutet
10:15–10:45
Quantile plots: New planks in an old campaign
Abstract: Quantile plots show ordered values (raw data, estimates, residuals, whatever) against rank or cumulative probability or a one-to-one function of the same. Even in a strict sense, they are almost 200 years old. In Stata, quantile, qqplot, and qnorm go back to 1985 and 1986. So why any fuss?

The presentation is built on a long-considered view that quantile plots are the best single plot for univariate distributions. No other kind of plot shows so many features so well across a range of sample sizes with so few arbitrary decisions. Both official and user-written programs appear in a review that includes side-by-side and superimposed comparisons of quantiles for different groups and comparable variables. Emphasis is on newer, previously unpublished work, with focus on the compatibility of quantiles with transformations; fitting and testing of brand-name distributions; quantile-box plots as proposed by Emanuel Parzen (1929–2016); equivalents for ordinal categorical data; and the question of which graphics best support paired and two-sample t and other tests.

Commands mentioned include distplot, multqplot, and qplot (Stata Journal) and mylabels, stripplot, and hdquantile (SSC).
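
A small illustration with the auto data (qplot and distplot are installed from the Stata Journal archives; they do not ship with Stata):

sysuse auto, clear
quantile mpg                   // official: ordered values vs. plotting position
qnorm mpg                      // official: quantiles vs. normal quantiles
qplot mpg, over(foreign)       // Stata Journal: superimposed quantile plots by group
distplot mpg, over(foreign)    // Stata Journal: cumulative distribution plots by group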

References:

Cox, N. J. 1999a. Distribution function plots. Stata Technical Bulletin 51: 12–16. Updates Stata Journal 3-2, 3-4, 5-3, 10-1.

Cox, N. J. 1999b. Quantile plots, generalized. Stata Technical Bulletin 51: 16–18. Updates Stata Technical Bulletin 61; Stata Journal 4-1, 5-3, 6-4, 10-4, 12-1.

Cox, N. J. 2005. The protean quantile plot. Stata Journal 5: 442–460.

Cox, N. J. 2007. Quantile–quantile plots without programming. Stata Journal 7: 275–279.

Cox, N. J. 2012. Axis practice, or what goes where on a graph. Stata Journal 12: 549–561.


Additional information:
Cox_uk16.pdf

Nicholas J. Cox
Durham University
11:15–11:45
texdoc 2.0: An update on creating LaTeX documents from within Stata
Abstract: At the 2009 meeting in Bonn, I presented a new Stata command called texdoc. The command allowed weaving Stata code into a LaTeX document, but its functionality and its usefulness for larger projects were limited. In the meantime, I heavily revised the texdoc command to simplify the workflow and improve support for complex documents. The command is now well suited, for example, to generate automatic documentation of data analyses or even to write an entire book. In this talk, I will present the new features of texdoc and provide examples of their application.
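
A minimal sketch of the weaving workflow (the file name and contents are hypothetical): the do-file below mixes Stata code with LaTeX passages and is processed with texdoc do, which writes report.tex.

* report.do
texdoc init report.tex, replace
/*tex
\documentclass{article}
\begin{document}
\section{Example}
tex*/
texdoc stlog
sysuse auto
regress price mpg weight
texdoc stlog close
/*tex
\end{document}
tex*/
texdoc close
* process with:  texdoc do report.do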

Additional information:
Jann_uk16.pdf
Jann_example1.pdf
Jann_example2.pdf

Ben Jann
University of Bern
11:45–12:15
Creating summary tables using the sumtable command
Abstract: In many fields of statistics, summary tables are used to describe characteristics within a study population. Moreover, such tables are often used to compare characteristics of two or more groups, for example, treatment groups in a clinical trial or different cohorts in an observational study. This talk introduces the sumtable command, a user-written command that can be used to produce such summary tables, allowing for different summary measures within one table. Summary measures available include means and standard deviations, medians and interquartile ranges, and numbers and percentages. The command removes any manual aspect of creating these tables (for example, copying and pasting from the Stata output window) and therefore eliminates transposition errors. It also makes creating a summary table quick and easy and is especially useful if data are updated and tables subsequently need to change. The end result is an Excel spreadsheet that can be easily manipulated for reports or other documents. Although this command was written in the context of medical statistics, it would be equally useful in many other settings.

Additional information:
Scott_uk16.pdf

Lauren J. Scott and Chris A. Rogers
Clinical Trials and Evaluation Unit, Bristol
12:15–12:45
Partial effects in fixed-effects models
Abstract: One of the main reasons for the popularity of panel data is that they make it possible to account for the presence of time-invariant unobserved individual characteristics, the so-called fixed effects. Consistent estimation of the fixed effects is only possible if the number of time periods is allowed to pass to infinity, a condition that is often unreasonable in practice. However, in a small number of cases, it is possible to find methods that allow consistent estimation of the remaining parameters of the model, even when the number of time periods is fixed. These methods are based on transformations of the problem that effectively eliminate the fixed effects from the model.

A drawback of these estimators is that they do not provide consistent estimates of the fixed effects, and this limits the kind of inference that can be performed. For example, in linear models, it is not possible to use the estimates obtained in this way to make predictions of the variate of interest. This problem is particularly acute in nonlinear models, where often the parameters have little meaning, and it is more interesting to evaluate partial effects on quantities of interest.

In this presentation, we show that although it is indeed generally impossible to evaluate the partial effects at points of interest, it is sometimes possible to consistently estimate quantities that are informative and easy to interpret. The problem will be discussed using Stata, centered on a new ado-file for calculating the average logit elasticities.
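
A minimal sketch of the limitation (hypothetical variable names): the conditional fixed-effects logit removes the individual effects, so the usual predicted probabilities are unavailable, and predict offers only quantities that ignore the fixed effect.

xtset id year
xtlogit y x1 x2, fe
predict xbhat, xb   // linear index, excluding the fixed effect
predict p0, pu0     // predicted probability assuming the fixed effect is zero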

Additional information:
Santos_uk16.pdf

João M.C. Santos Silva
University of Surrey
Gordon Kemp
University of Essex
1:45–2:45
What does your model say? It may depend on who is asking
Abstract: Doctors and consultants want to know the effect of a covariate for a given covariate pattern. Policy analysts want to know a population-level effect of a covariate. I discuss how to estimate and interpret these effects using factor variables and margins.
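
A minimal sketch using the margins example data: the first margins call gives the effect of group for a specified covariate pattern, and the second gives a population-averaged (average marginal) effect.

webuse margex, clear
logit outcome i.group c.age
margins, dydx(group) at(age=40)   // effect at a given covariate pattern
margins, dydx(group)              // population-averaged effect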

Additional information:
Drukker_uk16.pdf

David M. Drukker
StataCorp
2:45–3:15
Analyzing volatility shocks to Eurozone CDS spreads with a multicountry GMM model in Stata
Abstract: We model the time series of credit default swap (CDS) spreads on sovereign debt in the Eurozone, allowing for stochastic volatility and examining the effects of country-specific and systemic shocks. A weekly volatility series is produced from daily quotations on 11 Eurozone countries' CDS for 2009–2010. Using Stata's gmm command, we construct a highly nonlinear model of the evolution of realized volatility when subjected to both idiosyncratic and systemic shocks. Evaluation of the quality of the fit for the 24 moment conditions is produced by a Mata auxiliary routine. This model captures many of the features of these financial markets during a turbulent period in the recent history of the single currency. We find that systemic volatility shocks increase returns on "virtuous" borrowers' CDS while reducing returns on the most troubled countries' obligations.
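
A schematic example of how moment conditions are passed to gmm (this is not the authors' specification; rv, shock, and the instruments are hypothetical series in an xtset panel):

xtset country week
gmm (rv - {b0} - {b1}*L.rv - {b2}*shock), instruments(L.rv shock) twostep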

Additional information:
Baum_uk16.pdf

Christopher F. Baum
Boston College and DIW Berlin
Paola Zerilli
University of York
3:15–3:30
xtdcce2: Estimating dynamic common correlated effects in Stata
Abstract: This presentation introduces a new Stata command, xtdcce, to estimate a dynamic common correlated effects model with heterogeneous coefficients. The estimation procedure mainly follows Chudik and Pesaran (2015); in addition, the common correlated effects estimator (Pesaran 2006) as well as the mean group (Pesaran and Smith 1995) and pooled mean group (Pesaran, Shin, and Smith 1999) estimators are supported. Coefficients are allowed to be heterogeneous or homogeneous. In addition, instrumental-variable regressions and unbalanced panels are supported. The cross-sectional dependence test (CD test) is automatically calculated and presented in the estimation output. Examples of empirical applications of all the estimation methods mentioned above are given.

References:

Chudik, A., and M. H. Pesaran. 2015. Large panel data models with cross-sectional dependence: A survey. In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, 3–45. New York: Oxford University Press.

Pesaran, M. H. 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74: 967–1012.

Pesaran, M. H., and R. Smith. 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79–113.

Pesaran, M. H., Y. Shin, and R. P. Smith. 1999. Pooled mean group estimation of dynamic heterogeneous panels. Journal of the American Statistical Association 94: 621–634.


Additional information:
Ditzen_uk16.pdf

Jan Ditzen
Spatial Economics and Econometrics Centre, Heriot-Watt University, Edinburgh
4:00–4:30
Analyzing repeated measurements while accounting for derivative tracking, varying within-subject variance, and autocorrelation: the xtiou command
Abstract: Linear mixed-effects models are commonly used for the analysis of longitudinal biomarkers of disease. Taylor, Cumberland, and Sy (1994) proposed modeling biomarkers with a linear mixed-effects model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed-effects IOU model). This allows for autocorrelation, changing within-subject variance, and the incorporation of derivative tracking, that is, how much a subject tends to maintain the same trajectory for extended periods of time. Taylor, Cumberland, and Sy argued that the covariance structure induced by the stochastic process in this model was interpretable and more biologically plausible than the standard linear mixed-effects model. However, their model is rarely used, partly because of the lack of available software. We present a new Stata command, xtiou, that fits the linear mixed-effects IOU model and its special case, the linear mixed-effects Brownian motion model. The model can be fit to balanced and unbalanced data, using restricted maximum-likelihood estimation, where the optimization algorithm is either the Newton–Raphson, Fisher scoring, or average information algorithm, or any combination of these. To aid convergence, the command allows the user to change the method for deriving the starting values for optimization, the optimization algorithm, and the parameterization of the IOU process. We also provide a predict command to generate predictions under the model. We illustrate xtiou and predict with an example of repeated biomarker measurements from HIV-positive patients.

Reference:

Taylor, J., W. Cumberland, and J. Sy. 1994. A stochastic model for analysis of longitudinal AIDS data. Journal of the American Statistical Association 89: 727–736.


Additional information:
Hughes_uk16.pdf

Rachael A. Hughes, Jonathan A.C. Sterne, and Kate Tilling
University of Bristol
Michael G. Kenward
Luton
4:30–5:00
statacpp: An interface between Stata and C++, with big data and machine-learning applications
Abstract: Stata and Mata are very powerful and flexible for data processing and analysis, but some problems can be solved faster or more easily in a lower-level programming language. statacpp is a command that allows users to write a C++ program, have Stata insert their data, matrices, or globals into it, compile it to an executable program, run it, and return the results to Stata as new variables, matrices, or globals, all from within a do-file. The most important use cases are likely to be around big data and MapReduce (where data can be filtered and processed according to parameters from Stata and reduced results passed into Stata) and machine learning (where existing powerful libraries such as TensorFlow can be utilized). Short examples will be shown of both these aspects. Future directions for development will also be outlined, in particular calling Stata from C++ (useful for real-time responsive analysis) and calling CUDA from Stata (useful for massively parallel processing on GPU chips).

Work in progress at https://github.com/robertgrant/statacpp

Additional information:
Grant_uk16.pdf

Robert L. Grant
Kingston and St George's, London
5:00–5:30
Using pattern mixture modeling to account for informative attrition in the Whitehall II study: A simulation study
Abstract: Attrition is one potential source of bias in longitudinal studies: it occurs when participants drop out, and it is informative when the reason for attrition is associated with the study outcome. However, this is impossible to check, because the data we would need to confirm informative attrition are missing. When data are missing at random (MAR), that is, when the probability of missingness is not associated with the missing values conditional on the observed data, one appropriate approach for handling missing data is multiple imputation (MI). However, when attrition results in data that are missing not at random (MNAR), the probability of missingness is associated with the missing values themselves, so we cannot use MI directly. An alternative approach is pattern mixture modeling, which specifies separate distributions for the observed data, which we know, and the missing data, which we do not. We can estimate the missing-data model using assumptions about the data and combine the estimates from the two models using MI. Many longitudinal clinical trials have a monotone missing pattern (once participants drop out, they do not return), which simplifies MI, and pattern mixture modeling is used as a sensitivity analysis. In observational studies, however, data are missing because of both nonresponse and attrition, which makes handling attrition more complex than in clinical trials.

For this study, we used data from the Whitehall II study. Data were first collected on over 10,000 civil servants in 1985, and data collection phases are repeated every 2–3 years. Participants complete a health and lifestyle questionnaire and, at alternate, odd-numbered phases, attend a screening clinic.

Over the past 30 years, many epidemiological studies have used these data. One study investigated how smoking status at baseline (Phase 5) was associated with 10-year cognitive decline, using a mixed model with random intercept and slope. In these analyses, the authors replaced missing values in nonresponders with last observed values. However, participants with reduced cognitive function may be unable to continue participation in the Whitehall II study, which may bias the statistical analysis.

Using Stata, we will simulate 1,000 datasets with the same distributions and associations as Whitehall II to perform the statistical analysis described above. First, we will develop a MAR missingness mechanism (conditional on previously observed values) and set cognitive function values to missing. Next, for attrition, we will use a MNAR missingness mechanism (conditional on measurements at the same phase). For both missingness mechanisms, we will compare the bias and precision of three analyses: an analysis of the simulated datasets without any missing data, a complete-case analysis, and an analysis of data imputed using MI; additionally, for the MNAR mechanism, we will use pattern mixture modeling. We will use the twofold fully conditional specification (FCS) algorithm to impute missing values for nonresponders and to average estimates when using pattern mixture modeling. The twofold FCS algorithm imputes each phase sequentially, conditional on observed information at adjacent phases, so it is a suitable approach for imputing missing values in longitudinal data. The user-written package for this approach, twofold, is available from the Statistical Software Components (SSC) archive. We will present the methods used to perform the study and the results of these comparisons.
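
A greatly simplified sketch of one simulated dataset (a toy, not the authors' code; it omits the longitudinal structure and the twofold algorithm): impose MAR missingness that depends only on an observed baseline value, impute, and fit the analysis model.

clear
set obs 1000
set seed 1
generate baseline = rnormal()
generate followup = 0.8*baseline + rnormal()
generate miss = runiform() < invlogit(-1 - baseline)   // MAR: depends only on observed baseline
replace followup = . if miss
mi set wide
mi register imputed followup
mi impute regress followup baseline, add(20) rseed(2)
mi estimate: regress followup baseline   // compare with full-data and complete-case fits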

Additional information:
Welch_uk16.pdf

Catherine Welch, Martin Shipley, Eric Brunner, and Mika Kivimäki
Research Department of Epidemiology and Public Health, UCL
Séverine Sabia
INSERM U1018, Centre for Research in Epidemiology and Population Health, Villejuif, France
9:30–10:00
xtdpdqml: Quasi-maximum likelihood estimation of linear dynamic short-T panel-data models
Abstract: In this presentation, I discuss the new Stata command xtdpdqml, which implements the unconditional quasi-maximum likelihood estimators of Bhargava and Sargan (1983, Econometrica 51: 1635–1659) for linear dynamic panel models with random effects and of Hsiao, Pesaran, and Tahmiscioglu (2002, Journal of Econometrics 109: 107–150) for linear dynamic panel models with fixed effects when the number of cross-sections is large and the time dimension is fixed.

The marginal distribution of the initial observations is modeled as a function of the observed variables to circumvent a short-T dynamic panel-data bias. Robust standard errors are available following the arguments of Hayakawa and Pesaran (2015, Journal of Econometrics 188: 111–134). xtdpdqml also supports standard postestimation commands, including suest, which can be used for a generalized Hausman test to discriminate between the dynamic random-effects and the dynamic fixed-effects model.
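
A schematic call using the Arellano–Bond example data; the option names reflect my reading of the command's documentation and should be checked against its help file.

webuse abdata, clear
xtset id year
xtdpdqml n w k, fe vce(robust)   // fixed-effects QML (Hsiao, Pesaran, and Tahmiscioglu 2002)
estimates store fe
xtdpdqml n w k                   // random-effects QML (Bhargava and Sargan 1983)
estimates store re
suest fe re                      // basis for a generalized Hausman test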

Additional information:
Kripfganz_uk16.pdf

Sebastian Kripfganz
University of Exeter Business School
10:00–10:30
Distribution regression made easy
Abstract: Incorporating covariates in (income or wage) distribution analysis typically involves estimating conditional distribution models, that is, models for the cumulative distribution of the outcome of interest conditional on the values of a set of covariates. A simple strategy is to estimate a series of binary outcome regression models for \(F(z|x_i)= {\rm Pr}(y_i \le z |x_i)\) for a grid of values for \(z\) (Peracchi and Foresi, 1995, Journal of the American Statistical Association; Chernozhukov et al., 2013, Econometrica). This approach, now often referred to as "distribution regression", is attractive and easy to implement. This talk illustrates how the Stata commands margins and suest can be useful for inference here and suggests various tips and tricks to speed up the process and solve potential computational issues. It also shows how to use conditional distribution model estimates to analyze various aspects of unconditional distributions.
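
A minimal sketch with the nlswork data (thresholds and covariates are arbitrary): each pass fits a binary model for the outcome lying below a threshold z, and margins averages the predicted probabilities, giving the distribution-regression estimate of the unconditional F(z).

webuse nlswork, clear
foreach z of numlist 2 4 6 8 10 {
    generate byte below`z' = (ln_wage <= ln(`z')) if !missing(ln_wage)
    quietly logit below`z' collgrad ttl_exp i.race
    margins    // average predicted Pr(wage <= `z'): estimate of F(`z')
}
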
Philippe Van Kerm
Luxembourg Institute of Socio-Economic Research
10:30–10:45
sdmxuse: Program to import statistical data within Stata using the SDMX standard
Abstract: SDMX, which stands for Statistical Data and Metadata eXchange, is a standard developed by seven international organizations (BIS, ECB, Eurostat, IMF, OECD, the United Nations, and the World Bank) to facilitate the exchange of statistical data (https://sdmx.org/). The package sdmxuse aims to help Stata users download SDMX data directly from within their favorite software. The program builds and sends a query to the statistical agency (using RESTful web services), then imports and formats the downloaded dataset (in XML format). Some initiatives, notably the SDMX connector by Attilio Mattiocco at the Bank of Italy (https://github.com/amattioc/SDMX), have already been implemented to facilitate the use of SDMX data by external users, but they all rely on the Java programming language. Formatting the data directly within Stata has proved to be quicker for large datasets, and it also offers users a simpler way to address potential bugs. The latter point is particularly important for a standard that is evolving relatively quickly.

The presentation will include an explanation of how the sdmxuse program works as well as an illustration of its usefulness in the context of macroeconomic forecasting. Since the seminal work of Stock and Watson (2002), factor models have become widely used to compute early estimates (nowcasting) of macroeconomic series (for example, gross domestic product). More recent work (for example, Angelini et al. 2011) has shown that regressions on factors extracted from a large panel of time series outperform traditional bridge equations. But this trend has increased the need for datasets with many time series (often more than 100) that are updated immediately after new releases are made available (that is, almost daily). The package sdmxuse should be of interest to users who want to work on the development of such models.

References:

Angelini, E., G. Camba-Mendez, D. Giannone, L. Reichlin, and G. Rünstler. 2011. Short-term forecasts of euro area GDP growth. Econometrics Journal 14: 25–44.

Stock, J. H., and M. W. Watson. 2002. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–1179.


Additional information:
Fontenay_uk16.pdf

Sébastien Fontenay
Institut de Recherches Économiques et Sociales, Université catholique de Louvain
11:15–12:15
Joint modeling of longitudinal and survival data
Abstract: Joint modeling of longitudinal and survival-time data has been gaining more and more attention in recent years. Many studies collect both longitudinal and survival-time data. Longitudinal, panel, or repeated-measures data record measurements taken repeatedly at different time points. Survival-time or event history data record times to an event of interest such as death or onset of a disease. The longitudinal and survival-time outcomes are often related and should thus be analyzed jointly. Three types of joint analysis may be considered: 1) evaluation of the effects of time-dependent covariates on the survival time; 2) adjustment for informative dropout in the analysis of longitudinal data; and 3) joint assessment of the effects of baseline covariates on the two types of outcomes. In this presentation, I will provide a brief introduction to the methodology and demonstrate how to perform these three types of joint analysis in Stata.
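
A schematic sketch of the third type of joint analysis using gsem (variable names are hypothetical; the repeated biomarker records and the single survival record per subject are assumed to be stacked in long form): the two equations share the subject-level random intercept M1[id].

gsem (biomarker <- time x1 M1[id]) ///
     (stime <- x1 M1[id], family(weibull, failure(died)))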

Additional information:
Marchenko_uk16.pdf

Yulia Marchenko
StataCorp
12:15–12:45
stpm2cr: A Stata module for direct likelihood inference on the cause-specific cumulative incidence function within the flexible parametric modeling framework
Abstract: Modeling within competing risks is increasing in prominence as researchers become more interested in real-world probabilities of a patient's risk of dying from a disease while also being at risk of dying from other causes. Interest lies in the cause-specific cumulative incidence function (CIF), which can be calculated (1) by transforming the cause-specific hazards (CSH) or (2) through its direct relationship with the subdistribution hazards (SDH).

We expand on current competing-risks methodology within the flexible parametric survival modeling framework and focus on approach (2), which is more useful for addressing questions of prognosis. These models can be parameterized through direct likelihood inference on the cause-specific CIF (Jeong and Fine 2006), which offers a number of advantages over the more popular Fine and Gray (1999) modeling approach. Models have also been adapted to incorporate cure, using an approach similar to that described by Andersson et al. (2011) for flexible parametric relative survival models.

We have written an estimation command, stpm2cr, that models all cause-specific CIFs simultaneously. Using SEER data, we compare and contrast our approach with standard methods and show that many useful out-of-sample predictions, for example, CIF ratios and CSH, can be made after fitting a flexible parametric SDH model. Alternative link functions, such as the logit link leading to proportional-odds models, may also be incorporated, and models can easily be extended to include time-dependent effects. We also show that an advantage of our approach is that it is less computationally intensive, which is important, particularly when analyzing larger datasets.

References:

Andersson, T. M.-L., P. W. Dickman, S. Eloranta, and P. C. Lambert. 2011. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Medical Research Methodology 11(1): 96. doi: 10.1186/1471-2288-11-96.

Fine, J. P., and R. J. Gray. 1999. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94: 496–509.

Jeong, J-H., and J. P. Fine. 2006. Direct parametric inference for the cumulative incidence function. Applied Statistics 55: 187–200.


Additional information:
Islam_uk16.pdf

Sarwar Islam and Mark J. Rutherford
University of Leicester
Paul C. Lambert
University of Leicester and Karolinska Institutet, Stockholm
1:45–3:00
Using simulation studies to evaluate statistical methods in Stata: A tutorial
Abstract: Simulation studies are an invaluable tool for statistical research, particularly for the evaluation of a new method or comparison of competing methods. Simulations are well used by methodologists but often conducted or reported poorly, and are underused by applied statisticians. It's easy to execute a simulation study in Stata, but it's at least as easy to do it wrong.

We will describe a systematic approach to getting it right, visiting the following:
  • Types of simulation study
  • An approach to planning yours
  • Setting seeds and storing states
  • Saving estimates with simulate and postfile
  • Preparing for failed runs and trapping errors
  • The three types of dataset involved in simulations
  • Analysis of simulation studies
  • Presentation of results (including Monte Carlo error)
This tutorial will visit concepts, code, tips, tricks, and potholes, with the aim of giving the uninitiated the necessary understanding to start tackling simulation studies.
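
As a minimal sketch of the mechanics (a toy design, not a template for any particular study): an rclass program generates one dataset and returns its estimates, and simulate runs it repeatedly under a stored seed.

capture program drop simrun
program define simrun, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    generate y = 1 + 0.5*x + rnormal()
    regress y x
    return scalar b  = _b[x]
    return scalar se = _se[x]
end

set seed 20160908
simulate b=r(b) se=r(se), reps(1000): simrun
summarize b se   // bias = mean(b) - 0.5; report the Monte Carlo error alongside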

Additional information:
Morris_uk16.pdf

Tim Morris
MRC Clinical Trials Unit at UCL
Ian White
MRC Biostatistics Unit, Cambridge
Michael Crowther
University of Leicester
3:00–3:30
Reference-based multiple imputation for sensitivity analysis of clinical trials with missing data
Abstract: The statistical analysis of longitudinal randomized clinical trials is frequently complicated by the occurrence of protocol deviations that result in incomplete datasets for analysis. However one approaches the analysis, an untestable assumption about the distribution of the unobserved postdeviation data must be made. In such circumstances, it is important to assess the robustness of the results of the primary analysis to different credible assumptions about the distribution of the unobserved data.

Reference-based multiple-imputation procedures allow trialists to assess the impact of contextually relevant, qualitative missing-data assumptions (Carpenter, Roger, and Kenward 2013). For example, in a trial of an active versus a placebo treatment, missing data for active patients can be imputed following the distribution of the data in the placebo arm. I present the mimix command, which implements the reference-based multiple-imputation procedures in Stata, enabling relevant, accessible sensitivity analyses of trial datasets.

Reference:

Carpenter, J. R., J. H. Roger, and M. G. Kenward. 2013. Analysis of longitudinal trials with protocol deviation: A framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics 23: 1352–1371.


Additional information:
Cro_uk16.pdf

Suzie Cro
MRC Clinical Trials Unit at UCL and London School of Hygiene and Tropical Medicine
4:00–4:30
Parallel computing in Stata: Making the most out of your desktop
Abstract: Parallel computing has promised to deliver faster computing for everyone using off-the-shelf multicore computers. Despite proprietary implementation of new routines in Stata/MP, the time required to conduct computationally intensive tasks such as bootstrapping, simulation, and multiple imputation hasn't dramatically improved.

One strategy to speed up computationally intensive tasks is to use distributed high-performance computing (HPC) clusters. Using HPC clusters typically involves a divide-and-conquer approach: repetitive tasks are divided and distributed across multiple processors, and the independent results are combined at the end of the process.

The ability to access such clusters is limited; however, a similar system can be implemented on your desktop PC using the user-written command qsub.

qsub provides a wrapper that writes, submits, and monitors jobs submitted to your desktop PC and may dramatically improve the speed with which frequent, computationally intensive tasks are completed.
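
A schematic illustration of the divide-and-conquer idea, independent of qsub (the batch executable name and worker.do are assumptions, and the call differs on Windows): each background batch process runs a share of the replications and saves its own results file, which are appended once all jobs finish.

forvalues j = 1/4 {
    shell stata-mp -b do worker.do `j' 250 &
}
* worker.do reads its two arguments (job number and number of replications),
* sets a distinct seed, runs its share of the work, and saves a job-specific
* results file for appending afterwards
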
Adrian Sayers
Musculoskeletal Research Unit, University of Bristol
4:30–close
Wishes and grumbles
StataCorp

Organizers

Scientific committee

Nicholas J. Cox
Durham University

Patrick Royston
MRC Clinical Trials Unit at UCL

Tim Morris
MRC Clinical Trials Unit at UCL

Logistics organizer

The logistics organizer for the 2016 London Stata Users Group meeting is Timberlake Consultants, the distributor of Stata in the UK and Ireland.

View the proceedings of previous Stata Users Group meetings.