Proceedings
9:30–9:45 |
Abstract:
Scientists frequently work with pairs of alternative variables intended
to measure the same quantity. Examples include measured and predicted
disease prevalences in primary-care practices and marks awarded to student
exam scripts by two different teachers. Statistical methods developed for
use with such pairs of variables (A and B) may aim to measure components
of disagreement between the variables (like discordance, bias, and scale
differential), or they may aim to estimate one variable from the other
(calibration). The Bland–Altman plot is the standard way of presenting a
pair of alternative measures and allows us to visualize discordance, bias,
and scale differential at the same time. However, it lacks parameters with
confidence limits. The SSC packages somersd, scsomersd, and rcentile
can be used to estimate rank parameters. They can measure discordance using Kendall's τa
between A and B, bias using the mean sign and percentiles of A-B, and scale
differential using Kendall's τa between A-B and A+B.
For calibration (predicting A from B),
we can use the SSC packages wridit and polyspline
to define a ridit spline of A with respect to B.
We can then plot the observed B and the predicted A (with confidence limits)
against the ridit of B
to create a continuous alternative to the standard decile plot commonly used for calibration.
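A minimal sketch of the rank-parameter step, assuming somersd is installed from SSC (A and B are placeholder variable names; the wridit/polyspline calibration step is not shown):

    ssc install somersd
    * Discordance: Kendall's tau-a between the two measures, with confidence limits
    somersd A B, taua
    * Scale differential: Kendall's tau-a between the difference and the sum
    generate AminusB = A - B
    generate AplusB  = A + B
    somersd AminusB AplusB, taua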
Additional information: UK19_Newson.pdf
Roger Newson
Department of Primary Care and Public Health, Imperial College London
|
9:45–10:00 |
Abstract:
Joint longitudinal-survival models are now increasingly utilized to quantify the association
between a repeatedly measured biomarker and time-to-event outcome.
Whereas separate analyses of each outcome ignore the dependency between the biomarker and the time-to-event outcome,
joint models describe the association
while accounting for possible measurement error and the intermittent nature of observations.
Furthermore, extensions to these models can allow estimation of survival probabilities
that are conditional on measurements to date and individual characteristic information.
These probabilities give an up-to-date risk estimate for event occurrence tailored to the individual.
Currently, two commands are available in Stata that are designed to fit these models. The command stjm was first on the scene and was written specifically to fit joint models. However, as the new kid on the block, merlin has greater flexibility than its predecessor. As a fairly recently established command, however, its postestimation options are still a work in progress. The aim is to establish a command, using both ado and Mata programming, that can produce a graphical illustration of individualized conditional survival probabilities. In this presentation, I will talk about my coding journey to this end. Additional information: UK19_Ashra.pdf
Nuzhat B. Ashra
Michael J. Crowther
Biostatistics Research Group, University of Leicester
|
10:00–10:15 |
Abstract:
There are many economic variables such as prices or wages that exhibit infrequent or lumpy adjustments.
These outcomes occur when there are costs associated with making such changes,
which lead agents to adopt an (S,s) decision rule.
These rules are characterized by a band of inaction,
where agents tolerate some deviation from an optimal frictionless outcome,
provided that the deviation is within the (S,s) interval thresholds.
The purpose of this presentation is to describe a new command, xtss,
that estimates the parameters of a simple (S,s) rule model for panel-data applications.
This extends the specification developed by Dhyne et al. (2011) for modelling sticky prices
by allowing the thresholds to have truncated Normal distributions
and depend on regressors that vary over time and across individuals.
Dhyne, E., C. Fuss, M.H. Pesaran, and P. Sevestre. (2011). Lumpy price adjustments: A microeconometric analysis. Journal of Business & Economic Statistics, 29: 529–540. Additional information: UK19_Vincent.pdf
David Vincent
Independent (self-employed)
|
10:15–10:30 |
Abstract:
The emergence of GIS data offers a plethora of analytical approaches to
investigate societal phenomena or policies in a spatial context. However, not
all policies are implemented on the level of clearly delineated administrative
areas. Some interventions might be active in imprecisely specified or only
partially known geographic sectors. As a direct consequence, the resulting
uncertainty regarding the area-of-effect (AOE) affects estimates of the
effectiveness of a related policy.
In this research, I present a new Stata tool to investigate the robustness of area-specific effectiveness estimates when the observed area might suffer from unknown degrees of misspecification relative to the actual AOE of a policy. In this regard, uncertainty about the observed AOE relates to potential misspecification of the intervention area in three dimensions: its position, orientation, and scale. The impact of these forms of area misspecification can be assessed with the aoeplacebo program, either by generating a number of AOE placebo test diagnostics or by conducting AOE permutation simulations. Project webpage: https://sites.google.com/site/weisserresearch/home/research-work/aoeplacebo Additional information: UK19_Weisser.pdf
Reinhard A. Weisser
Nottingham Business School, Nottingham Trent University
|
10:30–10:45 |
Abstract:
Large healthcare databases are increasingly used for research investigating the effects of medications.
However, adequate adjustment for confounding remains a key issue
because incorrect conclusions can be drawn amid concerns of residual or unmeasured confounding.
The high-dimensional propensity score (hd-PS) has been proposed as a solution to improve confounder adjustment and was developed in the context of US claims data by Schneeweiss et al. (2009). This approach treats information, stored as codes, within healthcare databases as proxies for key underlying confounders. Some proxies are likely to be strongly correlated with the variables typically included in a traditional propensity score or multivariable analysis, and others may represent information about patients that is otherwise unmeasured, such as frailty. By including many of these proxies in the analysis, the hd-PS aims to adjust for both measured and unmeasured confounding. I present hdps, a command implementing this approach in Stata. Once the user has defined the data dimensions and the level of code truncation, hdps allows several tuning parameters to be set: the number of codes to retain per dimension (d), the prespecified time frame, and the number of variables to include in the final model (k). The command generates proxy variables and performs a variable selection step to identify important variables for confounder adjustment. I illustrate hdps using a study from the Clinical Practice Research Datalink (CPRD). Additional information: UK19_Tazare.pdf
John Tazare
Ian Douglas
Elizabeth Williamson
London School of Hygiene and Tropical Medicine
|
10:45–11:00 |
Abstract:
Stata provides putdocx, an excellent suite of commands for creating Word (.docx) documents.
Often in clinical trials there is a fair amount of summary statistics and frequency tables
when producing final study documents or data monitoring committee reports.
Using putdocx commands works reasonably well at producing these tables
but requires many lines of code to produce a reasonable table
and often requires every cell of the table to be specified by the user.
I will introduce a new command, report,
that takes the pain out of producing summary statistics tables and frequency tables.
This should ease the burden on statisticians who have to do this work
and can also help avoid the cut-and-paste culture of producing table outputs.
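For context, a small sketch of what a hand-built putdocx table involves (official Stata 15+ syntax; the dataset and file name are purely illustrative):

    sysuse auto, clear
    putdocx begin
    putdocx paragraph, style(Heading1)
    putdocx text ("Summary statistics")
    putdocx table t1 = (2, 3)
    putdocx table t1(1,1) = ("Variable")
    putdocx table t1(1,2) = ("Mean")
    putdocx table t1(1,3) = ("SD")
    quietly summarize mpg
    putdocx table t1(2,1) = ("Mileage (mpg)")
    putdocx table t1(2,2) = (r(mean)), nformat(%5.1f)
    putdocx table t1(2,3) = (r(sd)), nformat(%5.1f)
    putdocx save summary.docx, replace

Even this two-row table takes a dozen commands, which is the burden report aims to remove.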
Additional information: UK19_Mander.pptx
Adrian Mander
Cardiff University
|
11:30–12:00 |
Abstract:
There are two broad approaches to coding a simulation study in Stata.
The first is to write an rclass program that simulates and analyzes data
before using the simulate command to repeat the process and store summaries of results.
The second is to loop through repetitions and use the postfile family to store results.
One of us favors the simulate approach because the code is much cleaner, making it easier to spot mistakes.
The other favors the postfile approach because it delivers a superior dataset
summarizing the simulation results. Both are good reasons.
During yet another argument,
we spotted a third approach that is unambiguously right
because it uses cleanly structured code and delivers a useful dataset.
This presentation will describe the issues with the simulate and postfile approaches
before showing the correct approach.
Simulation studies are an important element of statistical research, but they can be derailed,
sometimes badly, by coding errors.
The approach that gives both clean code and a usable dataset
is worthwhile for all but the simplest simulation studies.
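For reference, bare-bones versions of the two approaches under discussion (the data-generating model and names are illustrative, not the code from the talk):

    * Approach 1: an rclass program run under simulate
    capture program drop simonce
    program define simonce, rclass
        drop _all
        set obs 100
        generate x = rnormal()
        generate y = 1 + 2*x + rnormal()
        regress y x
        return scalar b_x = _b[x]
    end
    simulate b_x = r(b_x), reps(1000) seed(2019): simonce

    * Approach 2: a loop over repetitions with postfile
    tempname h
    postfile `h' int rep double b_x using results, replace
    forvalues r = 1/1000 {
        quietly simonce
        post `h' (`r') (r(b_x))
    }
    postclose `h'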
Additional information: UK19_Morris.pptx
Tim P. Morris
MRC Clinical Trials Unit, University College London
Michael J. Crowther
Department of Health Sciences, University of Leicester
|
12:00–12:20 |
Abstract:
Nearly 40,000 people in the U.S. die from firearm-related causes annually.
Of these, about 1% are intentionally shot and killed while at work;
work-related homicides account for about 10% of all workplace fatalities.
While firearm policies have remained essentially unchanged at the national level,
there is far more variation in gun control legislation at the state level.
Moreover, the gun control landscape between and within states has changed considerably over the past 10 years.
Little recent work has focused on determinants or epidemiology of workplace homicide.
The purpose of this study is to test whether changes in state-level gun control policies
are associated with changes in state-level workplace homicide rates.
Our analysis shows that stronger gun-control policies,
particularly around concealed carry permitting, background checks,
and domestic violence, may be effective means of reducing work-related homicide.
Additional information: UK19_Baum.pdf
Kit Baum
Erika Sabbath
Summer Sherburne Hawkins
Boston College
|
12:20–12:40 |
Abstract:
dbnomics provides a suite of tools to search, browse, and import time-series data from DBnomics,
the world's economic database (https://db.nomics.world).
DBnomics is a web-based platform that aggregates and maintains time-series data
from various statistical agencies across the world.
dbnomics works only with Stata 14 or higher,
because it relies on the secure HTTP protocol (https).
dbnomics provides an interface to DBnomics' RESTful API allowing for advanced filtering of data using Stata's native options syntax. To achieve this, the command relies on Erik Lindsley's libjson backend (ssc install libjson). Additional information: UK19_Signore.pdf
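A minimal usage sketch, assuming dbnomics and libjson are installed from SSC (the provider and dataset codes are placeholders, and the exact subcommand and option names should be checked against the package help):

    ssc install libjson
    ssc install dbnomics
    dbnomics providers, clear
    dbnomics import, provider(AMECO) dataset(ZUTN) clear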
Simone Signore
European Investment Fund (European Investment Bank Group)
|
12:40–1:00 |
Abstract:
Many procedures in statistical science benefit from working on a
transformed scale, either with or without a later return to the original
scale. Using a logarithmic axis scale for a graph and taking logarithms
of a response or predictor are common if not elementary examples.
Transformations provide a theme for reviewing small Stata tips and
tricks and larger Stata commands for using a transformation known to be
a good idea or choosing a transformation that might be a good idea.
Terrain covered includes (1) using and labeling standard and not-so-standard graph scales: not just logarithms, but also root, cube root, reciprocal, neglog, asinh, logit, and other folded transformations; (2) log-ratio transformations for compositional data; (3) density estimation on transformed scales; (4) user-chosen link functions for generalized linear models; (5) choice of transformations given distributions and relationships. Some recent and new Stata commands will be among the illustrations. Additional information: UK19_Cox.pptx
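Two of the elementary ingredients, using only official commands (the auto data are purely illustrative):

    sysuse auto, clear
    * A logarithmic axis scale, labeled on the original scale
    scatter price weight, yscale(log) ylabel(4000 8000 16000)
    * A transformed response, with a later return to the original scale
    generate lnprice = ln(price)
    regress lnprice weight
    predict lnprice_hat, xb
    generate price_hat = exp(lnprice_hat)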
Nicholas J. Cox
Department of Geography, University of Durham
|
2:00–2:15 |
Abstract:
I present seven quality of life improvements for everyday Stata usage.
The first three send messages to your smartphone,
for example, to tell you that the do-file encountered an error or reached the end of its journey.
The fourth allows for low-level task parallelization, which saves effort, frustration, and time.
The fifth is a straightforward single-line timer.
The sixth lets you write do-files in a highly organized way with minimal effort
(and it writes code, which is both amazing and a little scary).
Finally, the seventh one makes it easy to access the US Census API.
Additional information: UK19_Wursten.pdf
Jesse Wursten
KU Leuven FEB
|
2:15–2:30 |
Abstract:
I set out to describe the origins, development, and current status of a Stata program suite
I have developed to handle requests for up-to-date tables and graphs showing the
demographic distribution and outcomes of registry data.
Stata's tabulation and graphical features continue to develop and become more flexible, and with the putdocx functions making it straightforward to generate reports, it is easier than ever to create publication-quality output. However, it is also important to make sure when creating graphs and tables that the headings, axis labels, legend, etc. match the content. As a statistician with the British Society of Blood and Marrow Transplantation (BSBMT), I face many demands on my time, including specific retrospective studies. In these cases, data are double-checked, cleaned, and returned to me at a prespecified time point. Other analysis requests also increasingly include "up-to-date" reports on the whole registry or large subsections of it. These frequently involve repetitive graphs, tables, or both, for instance, cycling over diagnosis or over centers where the procedures were performed. This drove the creation of the suite of programs I will describe to generate tables of demographics, outcomes, and graphs (mostly survival curves). Additional information: UK19_Pearce.pptx
Rachel Pearce
BSBMT Statistician, Guy's Hospital
|
2:30–3:00 |
Abstract:
This presentation explains how to estimate long-run coefficients and bootstrap standard errors
in a dynamic panel with heterogeneous coefficients,
common factors, and many observations over cross-sectional units and time periods.
The common factors cause cross-sectional dependence,
which is approximated by cross-sectional averages.
Heterogeneity of the coefficients is accounted for by taking the unweighted averages
of the unit-specific estimates.
Following Chudik, Mohaddes, Pesaran, and Raissi (2016, Advances in Econometrics 36: 85–135),
I consider three models to estimate long-run coefficients:
a simple dynamic model (CS-DL),
an error-correction model,
and an ARDL model (CS-ARDL).
I explain how to estimate all three models using the Stata community-contributed command xtdcce2.
I then compare the nonparametric standard errors with bootstrapped standard errors.
The bootstrap follows along the lines of Goncalves and Perron (2016)
and the community-contributed command boottest by Roodman, Nielsen, Webb, and MacKinnon (2018).
The challenges are to maintain the error structure across time and cross-sectional units
and to encompass the dynamic structure of the model.
Additional information: UK19_Ditzen.pdf
Jan Ditzen
Heriot-Watt University
|
3:00–3:15 |
Abstract:
At the 2017 Spanish Stata Users Group meeting, held in Madrid on 19 October,
we introduced some functions for generating random samples from continuous and
discrete distributions using Stata 13.
In this presentation, I will show new extensions of these functions, updated for Stata 15. I will describe their syntax and show different examples of use. I will also compare the newly developed functions with the built-in Stata ones and with the function rsample. The goodness of the generated samples will be checked using the mean squared error (MSE) of the differences between the sample frequencies and the theoretically expected ones. I will also provide bar charts that allow the user to compare the sample graphically with the exact distribution function of the random distribution being sampled.
Graphics capabilities are included in the newly developed functions so that the distribution of the generated sample can be displayed.
This is useful for teaching and learning in subjects that deal with statistics.
Specifically, this educational approach has been used when teaching statistics
in the Health Engineering degree at the University of Málaga (Spain).
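For comparison, the corresponding route with built-in generators looks like this (a sketch using official functions, not the new functions presented here):

    clear
    set obs 10000
    set seed 20190906
    generate x = rbinomial(10, 0.3)     // a discrete example
    generate y = rnormal(0, 1)          // a continuous example
    * Compare sample frequencies with one theoretical probability
    tabulate x
    display "theoretical P(X = 3) = " binomialp(10, 3, 0.3)
    * Graphical check against the normal density
    histogram y, normal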
Aguilera-Venegas, G., J.L. Galán-García, M.A. Galán-García, Y. Padilla-Domínguez, P. Rodríguez-Cielos, R. Rodríguez-Cielos. 2017. Random samples generation with Stata from continuous and discrete distributions. 2017 Spanish Stata Users Group meeting, Madrid (Spain). Lukácsy, K. 2011. Generating random samples from user-defined distributions. Stata Journal 11: 299–304. Additional information: UK19_Galan.pdf
Gabriel Aguilera-Venegas
José Luis Galán-García
María Ángeles Galán-García
Yolanda Padilla-Domínguez
Pedro Rodríguez-Cielos
Departamento de Matemática Aplicada, Universidad de Málaga
|
3:15–3:30 |
Abstract:
Background: Analysis of pre- and post-intervention change in observational studies using
Patient Reported Outcome Measures (PROMs) is often believed to be a trivial exercise,
and guidance developed for the analysis of data from randomized controlled trials is often applied.
This is often inappropriate, and analysis of change scores may be preferable.
However, it is unclear if this is suitable in outcomes with floor and ceiling effects.
I investigate the association between body mass index (BMI) and the efficacy of primary hip replacement.
Methods: Using a Monte Carlo simulation study and data from a national joint replacement register (162,513 patients with pre- and post-surgery PROMs), I investigate simple approaches for the analysis of outcomes with floor and ceiling effects that are measured on two occasions: linear and tobit regression (baseline-adjusted ANCOVA, change-score analysis, post-score analysis), in addition to linear and multilevel tobit models.
Results: Analysis of data with floor and ceiling effects using models that fail to account for these features induces substantial bias. Single-level tobit models correct for floor or ceiling effects only when the exposure of interest is not associated with the baseline score. In observational data scenarios, only multilevel tobit models are capable of providing unbiased inferences.
Conclusions: Inferences from pre- and post-studies that fail to account for floor and ceiling effects may yield spurious associations with a substantial risk of bias. Multilevel tobit models indicate that the efficacy of total hip replacement is independent of BMI. Restricting access to total hip replacement based on a patient's BMI cannot be supported by the data.
Adrian Sayers
Michael R. Whitehouse
Andrew Judge
Musculoskeletal Research Unit, University of Bristol
Alexander MacGregor
Musculoskeletal Medicine Research Group, University of East Anglia
Ashley W. Blom
Musculoskeletal Research Unit, University of Bristol
Yoav Ben-Shlomo
Bristol Population Health Science Institute, University of Bristol
|
4:00–4:30 |
Abstract:
In this presentation,
I will go through the workflow of creating an interactive presentation in Stata
(a .smcl presentation) with smclpres based on a small example presentation.
Some talks are primarily on how to do things in Stata, like a lecture on graphs in Stata or a talk at a Stata Users' Group meeting. In those cases, a .smcl presentation can be useful. A .smcl presentation is a series of linked .smcl files that open in the viewer inside Stata (like help files). The strength of a .smcl presentation is that it can contain links that execute examples, open help files, open do-files, etc. A .smcl presentation is all about illustrating how to do something in Stata, so preparing for such a talk typically starts with preparing a set of examples in a do-file. By adding specific comments to that do-file, for example, to indicate when a slide starts and when it ends, what the title of the slide is, etc., the smclpres command can turn that do-file into a .smcl presentation. Moreover, the pres2html command can turn that .smcl presentation into an HTML handout so that participants can easily access the content after the presentation. Additional information: UK19_Buis.zip
Maarten Buis
Department of History and Sociology, University of Konstanz
|
4:30–5:30 |
Abstract:
In dynamic models with unobserved group-specific effects,
the lagged dependent variable is an endogenous regressor by construction.
The conventional fixed-effects estimator is biased and inconsistent under fixed-T asymptotics.
To deal with this problem, "difference GMM" and "system GMM" estimators
in the spirit of Arellano and Bond (1991, Review of Economic Studies),
Arellano and Bover (1995, Journal of Econometrics),
and Blundell and Bond (1998, Journal of Econometrics)
are predominantly applied in practice.
While Stata has the official commands xtabond and xtdpdsys—both are wrappers for
xtdpd—the Stata community widely associates these methods
with the xtabond2 command provided by Roodman (2009, Stata Journal).
Ten years after Roodman's award-winning Stata Journal article, this presentation revisits the GMM estimation of dynamic panel-data models in Stata. I present the new command, xtdpdgmm, which addresses some shortcomings of xtabond2 and adds further flexibility to the specification of the estimators. In particular, it allows the user to incorporate the Ahn and Schmidt (1995, Journal of Econometrics) nonlinear moment conditions that can improve the efficiency and robustness of the estimation. Besides the familiar one-step and two-step estimators, xtdpdgmm also provides the Hansen, Heaton, and Yaron (1996, Journal of Business & Economic Statistics) iterated GMM estimator. While it can be pedagogically useful to think about "system GMM" as a system of a level equation and an equation in first differences or forward-orthogonal deviations, I explain that the resulting estimator can still be regarded as a "level GMM" estimator with a set of transformed instruments. These transformed instruments can be obtained as a postestimation feature and used for subsequent specification tests, for example with the ivreg2 command suite of Baum, Schaffer, and Stillman (2003 and 2007, Stata Journal). I further address common pitfalls and frequently asked questions about the estimation of linear dynamic panel-data models. Additional information: UK19_Kripfganz.pdf
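For orientation, the official estimators mentioned above run on the classic Arellano–Bond data as follows (xtdpdgmm has its own syntax; see its help file):

    webuse abdata, clear
    xtabond n w k, lags(1) vce(robust)      // "difference GMM"
    xtdpdsys n w k, lags(1) vce(robust)     // "system GMM"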
Sebastian Kripfganz
University of Exeter Business School
|
9:00–10:00 |
Abstract:
Meta-analysis combines results of multiple similar studies to provide an estimate of the overall effect.
This overall estimate may not always be representative of a true effect.
Often, studies report results that vary in magnitude and even direction of the effect,
which leads to between-study heterogeneity.
And sometimes the actual studies selected in a meta-analysis are not representative of the population of interest,
which happens, for instance, in the presence of publication bias.
Meta-analysis provides the tools to investigate and address these complications.
Stata has a long history of meta-analysis methods contributed by Stata researchers.
In my presentation, I will introduce Stata's new suite of commands, meta,
and demonstrate it using real-world examples.
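A small sketch of the basic workflow with the new suite (Stata 16; the variable names are illustrative):

    * Declare the effect sizes and their standard errors, then summarize and plot
    meta set effect_size std_err, studylabel(study)
    meta summarize
    meta forestplot
    meta funnelplot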
Additional information: UK19_Marchenko.pdf
Yulia Marchenko
StataCorp
|
10:00–10:15 |
Abstract:
ultimatch implements various score- and distance-based matching methods,
for example, nearest neighbor, radius, coarsened exact, percentile rank, and Mahalanobis distance matching.
It implements an efficient method for distance-based matching such as Mahalanobis matching,
avoiding a quadratic increase in runtime.
Matched observations are marked individually, allowing interactions between treated observations and counterfactuals.
Different methods can be combined to improve the results or to impose external requirements on the matched sample.
Among other control variables, it creates the mandatory weights needed to provide balanced matching results,
preventing distortions caused by skewed counterfactual candidate distributions,
for instance, an overabundance of candidates with the same score or within the same coarsened group.
Additional information: UK19_Doherr.pptx
Thorsten Doherr
Leibniz Centre for European Economic Research
|
10:15–10:30 |
Abstract:
The Instrumental Variable (IV) method is a standard econometric approach to address endogeneity issues
(for example, when an explanatory variable is correlated with the error term).
It relies on finding an instrument, excluded from the outcome equation (second stage),
but which is a determinant of the endogenous variable of interest (first stage).
Many instruments rely on cross-sectional variation produced by a dummy variable,
which is discretized from a continuous variable.
There might be several reasons for converting a continuous variable into a binary instrument.
First, continuous instruments recoded as dummies have been shown to provide
a parsimonious nonparametric model for the underlying first-stage relation
(Angrist and Pischke 2009).
Second, it provides a simple tool to evaluate the IV strategy and the identification assumptions.
Unfortunately, the construction of the binary instrument often appears to be arbitrary,
which may raise concerns about the robustness of the second-stage results.
I propose a data-driven procedure to build this discrete instrument, implemented in a command called discretize. The boundaries of the discrete variable are chosen to maximize the F-statistic in the first stage. This procedure has two main advantages. First, it minimizes the weak-instrument problem, which can arise in the case of an incorrect functional specification in the first stage. Second, it offers a transparent, data-driven procedure to select an instrument that does not depend on arbitrary decisions made by the researcher. Several options are available with the command to check graphically the robustness of the first- and second-stage parameters.
The presentation includes an explanation of how the discretize command works
and an illustration of its usefulness with an example
that relates the rise of violent crime in city centers to the process of suburbanization.
The endogenous relationship is addressed by using lead poisoning as an instrument.
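The underlying first-stage check can be sketched with official commands (variable names are hypothetical; discretize searches over the threshold automatically):

    scalar cthresh = 10                             // a candidate threshold (placeholder value)
    generate z_bin = (z_cont >= cthresh)
    ivregress 2sls y (x_endog = z_bin), vce(robust)
    estat firststage                                // first-stage F-statistic for this cutoff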
Angrist, J. D., and J.-S. Pischke. 2009. Mostly harmless econometrics: An empiricist's companion. Princeton: Princeton University Press. Additional information: UK19_Fontenay.pdf
Sébastian Fontenay
ECON-IRES, Université Catholique de Louvain
|
10:30–10:45 |
Abstract:
A well-known result is that exactly identified IV has no moments, even in
the ideal case of an experimental design (that is, a randomized controlled trial
with imperfect compliance). However, this result no longer holds when the sign
of the first stage is known. I describe a Stata implementation of an unbiased
estimator for instrumental-variable models with a single endogenous regressor
where the sign of one or more first-stage coefficients is known (Andrews and
Armstrong 2017) and examine its finite-sample properties under alternative error
structures.
Additional information: UK19_Nichols.pdf
Austin Nichols
Abt Associates
|
10:45–11:00 |
Abstract:
A prototype command, d3, that makes interactive graphs with the JavaScript
library D3 (Data-Driven Documents) was presented at the 2014 London
Stata Users' Group meeting.
Since then, the SVG graph export format in Stata versions 14 and up has made this task much simpler. I present a new version of d3 and its supporting commands in a package called Stata2D3, which takes advantage of the innate link between SVG and web browsers. Together, they export any Stata graph to SVG, wrap it in an HTML file, tag components of the graph such as markers and lines, append data from other variables of interest, and use the D3 library to add interactivity. The result is a familiar Stata graph in the web browser, the style of which can be controlled in the usual ways, but with the option of interactivity such as pop-up information when the mouse hovers over a marker or line, highlighting one line on click or hover, or tickboxes to show and hide groups of data. Any Stata graph is immediately available, and different forms of interactivity can be added. Additional information: UK19_Grant.pdf
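The Stata side of that pipeline starts from the official export step (Stata 14 and up); the wrapping, tagging, and interactivity are then added by the package:

    sysuse auto, clear
    scatter mpg weight
    graph export myscatter.svg, replace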
Robert Grant
BayesCamp
|
11:30–11:50 |
Abstract:
Network meta-analysis is a statistical approach to combining evidence from multiple studies comparing multiple treatments.
It may be "two-stage",
where treatment effects and their variances are estimated separately for each study and then combined using a normal approximation,
or "one-stage", where summary statistics at treatment-group level (for example, the number of events and the number of individuals) are analyzed directly.
My network suite currently provides various tools for exploring network meta-analysis data
and analyzing them in a two-stage frequentist approach
(C. White 2015, The Stata Journal 15: 1–34).
I will describe arguments for preferring a one-stage Bayesian approach
and recent work implementing it.
The one-stage approach amounts to fitting a generalized linear mixed model,
but I was unable to achieve adequate mixing using bayes: meglm.
I will describe my alternative approach of automating the writing and running of a WinBUGS program.
This process is implemented in the new network bayes command and allows substantial modelling flexibility,
including normal or binomial data; various contrast-based and arm-based models;
various heterogeneity structures; and the option to sample from the prior.
Features not yet implemented are inconsistency models and meta-regression.
Additional information: UK19_White.pptx
Ian R. White
MRC Clinical Trials Unit at University College London
|
11:50–12:10 |
Abstract:
In this presentation, I will present two new Stata commands to produce heat plots.
Generally speaking, a heat plot is a graph in which one of the dimensions of the data is visualized using a color gradient.
One example of such a plot is a two-dimensional histogram
in which the frequencies of combinations of binned X and Y are displayed
as rectangular (or hexagonal) fields using a color gradient.
Another example is a plot of a trivariate distribution
where the color gradient is used to visualize the (average) value of Z within bins of X and Y.
Yet another example is a plot that displays the contents of a matrix,
say, a correlation matrix or a spatial weights matrix, using a color gradient.
The two commands I will present are called heatplot and hexplot.
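A sketch of the intended usage, assuming the heatplot package is installed from SSC (z, y, and x are placeholders; the exact syntax should be checked in the help files):

    ssc install heatplot
    heatplot y x          // two-dimensional histogram of y against x
    heatplot z y x        // average of z within bins of x and y
    hexplot y x           // hexagonal rather than rectangular fields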
Additional information: UK19_Jann.pdf
Ben Jann
University of Bern
|
12:10–1:00 |
Abstract:
The increasing availability of high-dimensional data and increasing interest
in more realistic functional forms have sparked a renewed interest in
automated methods for selecting the covariates to include in a model.
I discuss the promises and perils of model selection and pay special attention
to estimators that provide reliable inference after model selection.
I will demonstrate how to use Stata 16's new features for double selection,
partialing out, and cross-fit partialing out to estimate the effects
of variables of interest while using lasso methods to select control variables.
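In outline, the three Stata 16 estimators look like this (y, d, and the candidate controls x1-x100 are hypothetical):

    dsregress  y d, controls(x1-x100)     // double selection
    poregress  y d, controls(x1-x100)     // partialing out
    xporegress y d, controls(x1-x100)     // cross-fit partialing out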
Additional information: UK19_Drukker.pdf
David Drukker
StataCorp
|
2:00–2:30 |
Abstract:
I present the commands twexp and twgravity, which implement the
estimators developed in Jochmans (2017) for exponential regression
models with two-way fixed effects. twexp is applicable to generic n × m
panel data. twgravity is written for the special case where the data are
a cross-section on dyadic interactions between n agents. A prime example
of the latter is cross-sectional bilateral trade data, where the model
of interest is a gravity equation with importer and exporter effects.
Both twexp and twgravity can deal with data where n and m are large,
that is, the case of many fixed effects.
Idea: The pseudo-Poisson approach suffers from two drawbacks. The first is a numerical one. Indeed, the large number of fixed effects implies that a simple approach that combines, say, poisson with n + m dummy variables will be infeasible in many datasets. The routines poi2hdfe (Guimaraes 2016) and ppmlhdfe (Correia et al. 2019) are designed especially to deal with this problem and are useful alternatives here. The second drawback is that the plug-in estimator of the covariance matrix of the underlying moment conditions is severely biased. The origin of the problem is the estimation of the incidental parameters. Additional information: UK19_Jochmans.pdf
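For concreteness, the two pseudo-Poisson routes being contrasted, with hypothetical dyadic trade variables (twexp and twgravity have their own syntax):

    * Explicit dummies: infeasible once n + m gets large
    poisson trade lndist contig i.imp i.exp, vce(robust)
    * High-dimensional fixed-effects alternative (Correia et al. 2019)
    ppmlhdfe trade lndist contig, absorb(imp exp) vce(robust)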
Koen Jochmans
Faculty of Economics, University of Cambridge
Vincenzo Verardi
FUNDP, Université Libre de Bruxelles
|
2:30–3:30 |
Abstract:
This presentation starts with a general introduction to quantile regression
(see qreg and related commands)
and then addresses two topics from recent research,
specifically quantile regression with time-invariant individual ("fixed") effects,
and structural quantile function estimation. After summarizing the main results in these areas,
I present the approach to these problems proposed by Machado and Santos Silva
(Quantiles via moments, Journal of Econometrics 2019, forthcoming),
and illustrate the use of the corresponding Stata commands xtqreg and ivqreg2
(downloadable from SSC).
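The introductory part can be followed with the official command alone (the auto data are purely illustrative):

    sysuse auto, clear
    qreg price weight length, quantile(0.75)    // a conditional 75th percentile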
Additional information: UK19_Santos.pdf
João Santos Silva
School of Economics, University of Surrey
|
4:00–5:15 |
Report to users and open panel discussion with Stata developers
David Drukker, William Gould, Yulia Marchenko, and Alan Riley
|
Scientific committee
University of Leicester
London School of Economics and Political Science
Imperial College London
Logistics organizer
The logistics organizer for the 2019 London Stata Conference is Timberlake Consultants, the Stata distributors to the United Kingdom and Ireland, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.