The Nordic and Baltic Stata Users Group meeting was held on 12 September 2018 at the Oslo Cancer Cluster Innovation Park, but you can view the program and presentation slides below.
Proceedings
Session 1
| |
9:05–9:40 |
Abstract:
During the last decade, there have been several attempts
to integrate comments and statistical outputs in Stata
indicating the importance of this with respect to, for instance,
literate programming.
I present a later development based on three integrated packages:
log2markup, basetable, and matrixtools.
log2markup transforms a commented log file into a document based on a markup language of the user's choice, such as LaTeX, HTML, or Markdown. One of the features of log2markup is that it reads output from Stata commands as part of the markup language itself. One command where this is beneficial is basetable, which is one of several interactive commands in which it is easy to build the typical first or base table for data summaries, for example, in articles. The output can be set to have the style of the markup language used in the comments. I briefly demonstrate its usability.
Another set of Stata commands I will present is in the Stata package matrixtools. Here, the basic command matprint makes it easy to print the matrix content in the wanted markup style. Several other matrixtools commands use matprint, such as sumat, which is an extension of the Stata command summarize that offers additional summary statistics, including new ones like "unique values". sumat returns all results in a matrix (also for text variables). It is possible to group statistics by a categorical variable. Another such command is crossmat, which is a wrapper for the Stata command tabulate, returning all outputs in matrices. Further, there is the command metadata, which collects metadata from the current dataset, a noncurrent dataset, or all datasets in a folder (if requested, including subfolders as well).
Additional information: nordic-and-baltic18_Bruun.pdf nordic-and-baltic18_toWord.do.pdf nordic-and-baltic18_toWord.docx
Niels Henrik Bruun
Aarhus University
|
9:40–10:15 |
Abstract:
Well-known instrumental variables (IV) estimators identify
treatment effects in settings with selection on levels. In
settings that also exhibit selection on gains, the treatment
effects for the compliers identified by IV might be very
different from other populations of interest. Under stronger
separability assumptions, the marginal treatment effects (MTE)
framework allows us to estimate the whole distribution of
treatment effects. I introduce the framework and theory behind
MTE, and I introduce the new package mtefe, which uses several estimation
methods to fit MTE models in Stata. This package provides
important improvements and flexibility over existing packages
such as margte (Brave and Walstrum 2014) and calculates
various treatment-effect parameters based on the results.
Additional information: nordic-and-baltic18_Andresen.pdf
Martin Eckhoff Andresen
Statistics Norway
|
10:15–10:40 |
Abstract:
In recent years, more attention has been focused
on the effects of economic growth and inequality changes
on income polarization, as well as on the changes in the
middle income class fraction. Most of the literature that
deals with this issue is focused on polarization indices.
However, the polarization indices proposed by researchers
allow only for an assessment of polarization in the whole
population and do not actually explain the reasons for the
decline of middle class fractions in certain countries.
This presentation proposes a class of median relative
polarization (MRP) partial indices, which allows for a
comprehensive assessment of income distribution changes
(its polarization or convergence) in any given sub-population,
particularly the lower-, middle-, and upper-income class
groups. Moreover, a class of proposed indices is further
generalized to allow for assessment of polarization in
certain cohort groups while operating on panel-data sources.
I wrote a new Stata command that operationalizes
the proposed polarization indices. Polarization indices for
lower-, middle-, and upper-income groups in the 2005–2015 period
have been calculated using panel data for Poland (Social
Diagnosis Panel Survey Dataset). It has been shown that
despite the lack of polarization in the whole population,
there was a slight convergence of incomes in the lower- and
middle-income groups and a significant polarization of incomes
in the upper-income group. This means that on average, incomes
of the lowest and middle earners tend to converge toward the
median, while the incomes of the richest
part of the population are growing even higher.
Additional information: nordic-and-baltic18_Zwierzchowski.pptx
Jan Zwierzchowski
SGH Warsaw School of Economics
|
10:55–11:35 |
Abstract:
Calibration is a method for adjusting the sampling weights,
often to account for nonresponse and underrepresented groups
in the population. Another benefit of calibration is smaller
variance estimates compared with estimates using unadjusted weights.
Stata implements two methods for calibration: the raking-ratio
method and the generalized regression method. Stata supports
calibration for the estimation of totals, ratios, and regression
models. Calibration is also supported by each survey variance
estimation method implemented in Stata.
In this presentation, I will show how to use calibration in
survey data analysis using Stata.
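As a rough illustration of the raking-ratio idea mentioned above, the following Python sketch rescales a set of design weights until their weighted totals match known population totals on two margins. It is not Stata's implementation; the function name `rake`, the six-respondent sample, and the population totals are all hypothetical.

```python
def rake(weights, groups_a, groups_b, totals_a, totals_b, iterations=50):
    """Raking-ratio calibration: rescale weights until the weighted
    margins match the known population totals (hypothetical helper)."""
    w = list(weights)
    for _ in range(iterations):
        for groups, totals in ((groups_a, totals_a), (groups_b, totals_b)):
            # Current weighted total in each category of this margin
            current = {}
            for wi, g in zip(w, groups):
                current[g] = current.get(g, 0.0) + wi
            # Scale each weight by (known total / current total) of its category
            w = [wi * totals[g] / current[g] for wi, g in zip(w, groups)]
    return w


# Hypothetical sample: six respondents, each with a design weight of 10
sex = ["f", "f", "f", "m", "m", "m"]
region = ["n", "s", "s", "n", "n", "s"]
calibrated = rake([10.0] * 6, sex, region,
                  totals_a={"f": 40.0, "m": 30.0},
                  totals_b={"n": 35.0, "s": 35.0})
```

After raking, the calibrated weights reproduce both sets of totals (up to numerical tolerance), even though the original weights summed to only 60.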
Additional information: nordic-and-baltic18_Pitblado.pdf
Jeff Pitblado
StataCorp
|
11:35–12:15 |
Abstract:
Stata is the software of choice for many analysts of household
surveys, in particular for poverty and inequality analysis. No dedicated
suite of commands comes bundled with the software, but many community-contributed
commands are freely available for the estimation of various types of
indices. This presentation introduces a set of new tools that complement and
significantly upgrade some existing packages. The key feature of the new
packages is their ability to leverage Stata's built-in capacity for
dealing with survey design features (via the svy prefix), resampling
methods (via the bootstrap, jackknife, or permute prefix),
multiply imputed data (via mi), and various postestimation commands for testing
purposes.
Additional information: nordic-and-baltic18_Van_Kerm.pdf
Philippe Van Kerm
Luxembourg Institute for Social and Economic Research
|
Session 2
| |
1:15–2:15 |
Abstract:
Bayesian analysis has become a popular tool for many
statistical applications. Yet many data analysts have
little training in the theory of Bayesian analysis and
software used to fit Bayesian models. This presentation
will provide an intuitive introduction to the concepts of
Bayesian analysis and demonstrate how to fit Bayesian models
using Stata. No prior knowledge of Bayesian analysis is
necessary. Specific topics will include the relationship
between likelihood functions and prior and posterior distributions,
Markov chain Monte Carlo (MCMC) using the Metropolis–Hastings
algorithm, and how to use Stata's bayes prefix to fit Bayesian models.
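To make the Metropolis–Hastings idea concrete, here is a minimal Python sketch of a random-walk sampler for a binomial proportion with a Beta prior. It is an illustration only and not what Stata does internally; the data (7 successes in 20 trials), the flat Beta(1, 1) prior, the step size, and all function names are assumptions chosen for the example.

```python
import math
import random


def log_posterior(theta, successes, trials, a=1.0, b=1.0):
    """Log of (binomial likelihood x Beta(a, b) prior), up to a constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((successes + a - 1.0) * math.log(theta)
            + (trials - successes + b - 1.0) * math.log(1.0 - theta))


def metropolis_hastings(successes, trials, draws=20000, burn_in=2000,
                        step=0.1, seed=1):
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for i in range(draws + burn_in):
        # Symmetric random-walk proposal around the current value
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


# Hypothetical data: 7 successes in 20 trials
samples = metropolis_hastings(successes=7, trials=20)
posterior_mean = sum(samples) / len(samples)
```

Because the Beta prior is conjugate here, the exact posterior is Beta(8, 14) with mean 8/22, which gives a simple check on the simulated draws.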
Additional information: nordic-and-baltic18_Huber.pptx
Chuck Huber
StataCorp
|
2:30–3:30 |
Abstract:
merlin can do a lot of things: linear regression,
a Weibull survival model, a three-level logistic model,
a multivariate joint model of multiple longitudinal outcomes,
a recurrent event, and survival. merlin can do things I
haven't even thought of yet. I will take a single dataset,
attempt to show you the full range of capabilities of merlin,
and present some of the new features following its rise from
the ashes of megenreg. There will even be some surprises.
Additional information: nordic-and-baltic18_Crowther.pdf
Michael J. Crowther
University of Leicester
|
3:30–3:55 |
Abstract:
Period analysis is a method used in survival analysis that uses delayed
entry techniques in order to include only the most recent data. Period
analysis has been shown to produce more up-to-date survival predictions
compared with using the standard method of cohort analysis. However, using
period analysis reduces the sample size, which leads to greater uncertainty
in the parameter estimates.
Temporal recalibration combines the advantages of cohort and period analysis. A cohort model is fit and then recalibrated using a period analysis sample. The parameter estimates are constrained to be the same, but the baseline hazard function can vary, which allows any improvements in survival to be captured. Therefore, this method could be useful for prognostic models because it enables more up-to-date survival predictions to be produced.
In this presentation, I'll show the differences between the cohort, recalibrated,
and period-analysis models and compare the produced survival estimates.
This involves using stset to define the period analysis
sample and stpm2 to fit and recalibrate flexible parametric survival models.
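The delayed-entry step can be sketched in a few lines of Python. The abstract uses stset for this; the code below only illustrates the arithmetic of restricting follow-up to a period window, and the function name, the calendar-year time scale, and the example subjects are all hypothetical.

```python
def period_window(diagnosis, exit_time, died, window_start, window_end):
    """Restrict one subject's follow-up to the period window using delayed entry.

    All arguments are on the same calendar scale (for example, years).
    Returns (entry, exit, event) measured from diagnosis, or None if the
    subject contributes no follow-up inside the window.
    """
    entry = max(diagnosis, window_start)
    stop = min(exit_time, window_end)
    if stop <= entry:
        return None
    # A death counts only if it happens inside the window
    event = died and exit_time <= window_end
    return (entry - diagnosis, stop - diagnosis, event)


# Hypothetical subjects and a 2010-2015 period window
early_case = period_window(2003.0, 2012.0, True, 2010.0, 2015.0)
recent_case = period_window(2012.0, 2018.0, False, 2010.0, 2015.0)
before_window = period_window(2000.0, 2008.0, True, 2010.0, 2015.0)
late_death = period_window(2011.0, 2017.0, True, 2010.0, 2015.0)
```

A subject diagnosed in 2003 who dies in 2012 enters the risk set 7 years after diagnosis, so only the follow-up between years 7 and 9 contributes. A subject whose follow-up ends before the window opens is dropped, which is why the period sample is smaller than the cohort sample.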
Brenner, H., and O. Gefeller. 1996. An alternative approach to monitoring cancer patient survival. Cancer 78: 2004–2010.
Brenner, H., B. Söderman, and T. Hakulinen. 2002. Use of period analysis for providing more up-to-date estimates of long-term survival rates: empirical evaluation among 370,000 cancer patients in Finland. International Journal of Epidemiology 31: 456–462.
Additional information: nordic-and-baltic18_Booth.pdf
Sarah Booth
University of Leicester
|
4:10–4:35 |
Abstract:
In a typical survival analysis, the time to an event of
interest is studied. For example, in cancer studies,
researchers often wish to analyze a patient's time to
death since diagnosis. Similar applications also exist
in economics and engineering. In such analyses, different causes of
the event of interest are often not distinguished.
Although this may sometimes be useful, in many situations
it does not paint the entire picture and restricts the analysis.
More commonly, the event may occur because of different causes,
which better reflects real-world scenarios. For instance, if
the event of interest is death due to cancer, it is also possible
for the patient to die because of other causes. This means that the
time at which the patient would have died because of cancer is never
observed. These are known as competing causes of death or
competing risks.
In a competing risks analysis, interest lies in the cause-specific cumulative incidence function (CIF). This can be calculated by either
(1) transforming on (all) cause-specific hazards, or
(2) using its direct relationship with the subdistribution hazards.
Obtaining cause-specific CIFs within the flexible parametric modeling framework by adopting approach (1) is possible by using the stpm2 postestimation command, stpm2cif. Alternatively, since competing risks is a special case of a multistate model, an equivalent model can be fitted using the multistate package. To estimate cause-specific CIFs using approach (2), stpm2 can be used by applying time-dependent censoring weights, which are calculated on restructured data using stcrprep.
The above methods involve some form of data augmentation. Instead, estimation on individual-level data may be preferred because of computational advantages. This is possible using either approach (1) or (2) with stpm2cr. In this presentation, I provide an overview of these various tools, and I discuss which of these to use and when.
Additional information: nordic-and-baltic18_Mozumder.pdf
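As a small illustration of approach (1), the Python sketch below computes the CIF for each cause by numerically integrating the all-cause survival function times the cause-specific hazard. It is not how stpm2cif or stpm2cr work internally; the function name, the midpoint integration rule, and the constant hazards are assumptions made for the example.

```python
import math


def cumulative_incidence(hazards, t, steps=10000):
    """CIF for each cause at time t from cause-specific hazard functions.

    CIF_k(t) is the integral over (0, t) of S(u) * h_k(u) du, where S(u)
    is the all-cause survival function, exp(-cumulative all-cause hazard).
    Both integrals are approximated with a midpoint rule.
    """
    du = t / steps
    cumulative_hazard = 0.0
    cif = [0.0] * len(hazards)
    for i in range(steps):
        u = (i + 0.5) * du
        rates = [h(u) for h in hazards]
        # All-cause survival at the midpoint of this step
        survival = math.exp(-(cumulative_hazard + 0.5 * sum(rates) * du))
        for k, rate in enumerate(rates):
            cif[k] += survival * rate * du
        cumulative_hazard += sum(rates) * du
    return cif


# Hypothetical constant hazards: 0.10 per year for cancer, 0.05 for other causes
cancer, other = cumulative_incidence([lambda u: 0.10, lambda u: 0.05], t=5.0)
```

The cancer CIF depends on both hazards. It is smaller than the probability obtained by treating deaths from other causes as censored, because some patients die of other causes before they can die of cancer.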
Sarwar Islam Mozumder
University of Leicester
|
4:35–5:00 |
Abstract:
In observational studies with time-to-event outcomes, we
expect that there will be confounding and would usually
adjust for confounders in a survival model. From
such models, an adjusted hazard ratio comparing exposed
and unexposed subjects is often reported. This is fine,
but hazard ratios can be difficult to interpret and are not
collapsible, and further problems arise when trying to
interpret them as causal effects. Risks are much
easier to interpret than rates, so quantifying the
difference on the survival scale can be desirable.
In Stata, stcurve gives survival curves after fitting a model where certain covariates can be given specific values, but those not specified are given mean values. Thus, it gives a prediction for an individual who happens to have the mean values of each covariate and may not reflect the average survival in the population. An alternative is to use standardization to estimate marginal effects, where the regression model is used to predict the survival curve for unexposed and exposed subjects at all combinations of other covariates included in the model. These predictions are then averaged to give marginal effects.
I present stpm2_standsurv to obtain various standardized measures after fitting a flexible parametric survival model. As well as estimating standardized survival curves, the command can estimate the marginal hazard function, the standardized restricted mean survival time, and centiles of the standardized survival curve. Contrasts can be made between any of these measures (differences, ratios). A user-defined function can be given for more complex contrasts.
Additional information: nordic-and-baltic18_Lambert.pdf
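The distinction between prediction at the mean covariate values and standardization can be sketched in Python. To keep the example short, it swaps the flexible parametric model for a simple exponential proportional-hazards model; the coefficients, the ages, and the function names are all hypothetical and are not taken from the presentation.

```python
import math


def survival(t, exposed, age, b_exposed=0.5, b_age=0.04, baseline=0.002):
    """Survival under a hypothetical exponential proportional-hazards model."""
    hazard = baseline * math.exp(b_exposed * exposed + b_age * age)
    return math.exp(-hazard * t)


def standardized_survival(t, exposed, ages):
    """Average the predicted survival over the observed covariate values."""
    return sum(survival(t, exposed, a) for a in ages) / len(ages)


ages = [30, 40, 50, 60, 70, 80]  # hypothetical confounder values
mean_age = sum(ages) / len(ages)

# Prediction for one individual who has the mean age
at_mean = survival(10.0, 1, mean_age)
# Marginal (standardized) survival: average of the individual predictions
marginal = standardized_survival(10.0, 1, ages)
# Standardized survival difference, exposed minus unexposed, at 10 years
difference = (standardized_survival(10.0, 1, ages)
              - standardized_survival(10.0, 0, ages))
```

Because survival is a nonlinear function of the covariates, the prediction at the mean age and the average of the individual predictions do not coincide. Only the latter estimates average survival in the population.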
Paul C. Lambert
University of Leicester and Karolinska Institutet
|
5:00–5:30 |
Abstract:
Stata developers present will carefully and cautiously
consider wishes and grumbles from Stata users in the audience.
Questions, and possibly answers, may concern reports of
present bugs and limitations or requests for new features in
future releases of the software.
StataCorp personnel
StataCorp
|