2015 Portuguese Stata Users Group meeting
18 September 2015
Nova School of Business and Economics
Campus de Campolide
1099-032 Lisboa
Portugal
Proceedings
Big Data in Stata
Paulo Guimarães
Bank of Portugal
Datasets are becoming increasingly large, and their use poses new
challenges. In this presentation, I draw on my experience with managing and
analyzing large datasets to offer some advice for Stata users.
Besides providing some practical tips, I also discuss several recent
user-written commands that are particularly suited for dealing with
large datasets. Finally, I will also talk about issues regarding
estimation of high-dimensional models.
Additional information
portugal15_guimaraes.pdf
Impact of credit ratings in crisis-hit countries: An application with Markov chains
Nicoletta Rosati
University of Lisbon
Vasco Oliveira
University of Lisbon
Credit ratings have been widely discussed in recent years, primarily
because of the possible impacts they have on the economy. After the
financial crisis of 2008, and with no autonomy to pursue an expansionary
monetary policy, crisis-hit countries such as Portugal and Spain are
still struggling to control their public debt and revive the economy
simultaneously while trying to be upgraded in their sovereign credit
ratings. In this presentation, we propose a different approach to analyzing the impact of
changes in sovereign credit ratings on stock markets. We study the
evolution of a segmented form of the stock market index for several
crisis-hit countries, including both European and Asian markets. Such
evolution is initially modeled by a homogeneous Markov chain, where the
transition probabilities from one starting level of the index to a new
(lower or higher) level in the next period depend on some explanatory
variables, which include the country's rating, GDP, and interest rate,
through an ordered probit model. We then inspect the model's reaction to
changes in credit ratings at different percentiles of their
distribution. Finally, we suggest some possible extensions of research
and applications.
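As a minimal sketch of the idea described above, the Python snippet below builds one transition-matrix row per starting level of the index from an ordered probit, so that the probability of each next-period level depends on a covariate such as the country's rating. The cutpoints, level effects, and rating coefficient are hypothetical values chosen only for illustration; they are not taken from the presentation, and the authors' actual specification may differ.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_row(xb, cutpoints):
    """Return P(next level = j) for j = 0..len(cutpoints), given linear index xb.

    P(j) = Phi(c_j - xb) - Phi(c_{j-1} - xb), with c_{-1} = -inf and c_J = +inf.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - xb) for b in bounds]
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]


# Hypothetical parameters (assumptions for illustration only).
CUTPOINTS = [-0.5, 0.5]          # two cutpoints give three index levels
LEVEL_EFFECT = [-0.8, 0.0, 0.8]  # shift in the linear index for each starting level
BETA_RATING = 0.3                # assumed coefficient on the credit rating


def transition_matrix(rating):
    """Build one ordered-probit row per starting level of the segmented index."""
    return [
        ordered_probit_row(effect + BETA_RATING * rating, CUTPOINTS)
        for effect in LEVEL_EFFECT
    ]
```

Evaluating `transition_matrix` at two different rating values shows how a rating change shifts probability mass between lower and higher levels of the index.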
Additional information
portugal15_rosati.pdf
eurouse: A Stata command to import data from the Eurostat bulk facility
David Leite Neves
University of Lisbon
Isabel Porença
University of Lisbon
The Eurostat bulk facility contains about 5,800 datasets from more than 30
European countries. Some datasets also include the United States and Japan. The
datasets are reported to Eurostat by the national statistical offices and include
monetary and financial statistics, national accounts, labor market statistics, social statistics,
etc. Eurostat updates the datasets twice a day. In this presentation, I will present a command
that I developed to automatically download and import these datasets into Stata. The
user only needs to type the dataset code in the command line, and eurouse will
automatically build a panel with the latest records from all the countries that report to
Eurostat. The motivating example comes from the need to build a panel dataset
for European Union countries and to be able to efficiently (1) identify the data, (2) have
access to their description and meta-information, and (3) feed the database with the
latest updates. The command eurouse does all of these automatically.
Using ODBC with Stata
Rita Sousa
Bank of Portugal
Open Database Connectivity (ODBC) is a standardized set of function
calls that can be used to access data stored in database management
systems. Stata's odbc command allows us to load, write, and view data
from ODBC sources. My presentation will be based on general
considerations of issues related to the management of large datasets and on
practical examples using ODBC.
Additional information
portugal15_sousa.pdf
Two powerful tools: gsem and margins
Isabel Canette
StataCorp
gsem is a versatile command that fits generalized structural
equation models, and it can be used to fit customized models without the
need for programming. I will introduce the different aspects of
generalized structural equation models: family and link, latent
variables, and random effects. These elements can be combined to build
complex models that might not otherwise be available as a stand-alone
command. Another useful tool is margins, which allows us to
compute marginal means and marginal effects, among other statistics. We
will discuss how to use these features to interpret a nonlinear model,
and we will also discuss a feature introduced in Stata 14, marginal
predictions on the random effects for random-effects models.
Additional information
portugal15_canette.pdf
Stata in the everyday life of health economists
Pedro Pita Barros
Nova School of Business & Economics, Universidade Nova de Lisboa
I am a health economist, and my activities with data and Stata cover data
management (small and large datasets), simple estimation,
production of graphs and figures, estimation of standard and nonstandard
models, and writing both scientific papers and a blog. I will cover
how I use the features of Stata for these activities, highlighting both
the commands I find most useful and a wish list of things for which I would like
someone to build commands.
Additional information
portugal15_barros.pdf
Lerman: A Stata module to decompose inequality using sampling weights
Bruno Damásio
University of Lisbon
David Leite Neves
University of Lisbon
The Gini index is the most widely used measure of income inequality. Lerman and Yitzhaki (1985)
proposed a method to decompose the Gini index and compute the marginal impact of each
income source on it.
López-Feldman (2006) presented a Stata module to operationalize Lerman and Yitzhaki's method;
however, it does not allow the use of sampling weights, which considerably narrows its
application to household surveys.
In this presentation, we will present lerman, a user-written command that incorporates
sampling weights in the Lerman and Yitzhaki (1985) methodology. To illustrate the
usefulness of the command in income
inequality studies, we will provide an empirical application to the USA, using data from the Panel
Study of Income Dynamics.
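The decomposition referred to above writes the Gini index as G = sum over sources k of S_k * G_k * R_k, where S_k is the share of source k in total income, G_k is the Gini of source k, and R_k is the Gini correlation between source k and the rank of total income. The Python sketch below illustrates one way sampling weights could enter that formula, using weighted means, weighted covariances, and weighted mid-ranks. It is an illustration under stated assumptions, not the implementation used by the lerman command: all function names are hypothetical, total income is assumed to have a positive weighted mean, and every source is assumed to vary across observations.

```python
def wmean(x, w):
    """Weighted mean of x with sampling weights w."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def wcov(x, y, w):
    """Weighted (population-style) covariance of x and y."""
    mx, my = wmean(x, w), wmean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w)) / sum(w)


def weighted_rank(values, weights):
    """Weighted fractional mid-rank of each value; tied values share one rank."""
    total = sum(weights)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    cum, pos = 0.0, 0
    while pos < len(order):
        end, group_w = pos, 0.0
        while end < len(order) and values[order[end]] == values[order[pos]]:
            group_w += weights[order[end]]
            end += 1
        mid = (cum + group_w / 2.0) / total
        for k in range(pos, end):
            ranks[order[k]] = mid
        cum += group_w
        pos = end
    return ranks


def weighted_gini(y, w):
    """Covariance form of the Gini index: 2 * cov(y, F(y)) / mean(y)."""
    return 2.0 * wcov(y, weighted_rank(y, w), w) / wmean(y, w)


def lerman_yitzhaki(sources, weights):
    """Decompose the Gini of total income by source.

    sources maps a source name to a list of values, one per observation.
    Returns (total Gini, {name: share, gini, corr, contribution, marginal}).
    """
    n = len(weights)
    total = [sum(values[i] for values in sources.values()) for i in range(n)]
    mu = wmean(total, weights)
    f_total = weighted_rank(total, weights)
    g_total = 2.0 * wcov(total, f_total, weights) / mu
    parts = {}
    for name, yk in sources.items():
        s_k = wmean(yk, weights) / mu
        g_k = weighted_gini(yk, weights)
        # Gini correlation: cov with the total-income rank over cov with own rank.
        r_k = wcov(yk, f_total, weights) / wcov(yk, weighted_rank(yk, weights), weights)
        contribution = s_k * g_k * r_k
        parts[name] = {
            "share": s_k,
            "gini": g_k,
            "corr": r_k,
            "contribution": contribution,
            # Relative change in G from a small proportional change in source k.
            "marginal": contribution / g_total - s_k,
        }
    return g_total, parts
```

By linearity of the covariance, the source contributions add up to the Gini of total income, and the marginal effects sum to zero; both properties are convenient checks on any weighted implementation.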
Additional information
portugal15_damasio.pdf
Technology, skills, and job duration
Hugo Castro Silva
University of Lisbon
Francisco Lima
University of Lisbon
We study technology-skill complementarities in manufacturing and their
influence on job duration by analyzing hazard functions for different
levels of technology intensity. Using a Portuguese matched
employer-employee longitudinal dataset and a robust identification
strategy of displaced workers, we estimate discrete-time duration models
allowing for unobserved heterogeneity. We find that the accumulation of
specific human capital plays a stronger role in reducing the hazard of
job separation in more technology-intensive sectors. Also, the returns to
firm-specific skills and to general human capital increase with
technology intensity. Our results suggest that technology-skill
complementarity is observable in terms of job duration.
Additional information
portugal15_silva.pdf
Stata in health research: From everyday questions to major studies
Sofia Baptista
Porto University
Stata has several advantages in comparison with its direct competitors: it is better oriented toward health sciences research, and it is robust and versatile software. Stata is easy to use and allows user-written commands. The price is competitive, and access to documentation and help is good. Stata has been used in major clinical and experimental studies, as shown before. However, my point today is that Stata can be a powerful tool for everyday clinical questions, for those doctors who are not researchers but aim to understand statistics to improve their practice and to understand trends in their patients' diseases and treatments. The truth is that doctors nowadays have, a click away, the most important thing needed to start research: large databases.
Additional information
portugal15_baptista.pdf
Using pointers and structures in Stata to estimate panel-data models with attrition
Pierre Hoonhout
University of Lisbon
Panel datasets usually have missing data: some of the units that are
approached in the first wave fail to respond in later waves. It is well
known that this panel-data attrition leads to unreliable inferences.
Hoonhout and Ridder (2016) show that the sequential additively
nonignorable (SAN) attrition model nonparametrically just-identifies
the population distribution if refreshment samples are
available. Hoonhout (2016) proposes a weighted GMM-estimator for
this problem. The estimator corrects for the potentially biasing
effects of nonignorable attrition. This presentation will focus on the
implementation of this estimator in Stata. In particular, it will use
this context to highlight the potential benefits of using structures and
pointers in Mata.
Additional information
portugal15_hoonhout.pdf
Wishes and grumbles
Bill Rising & Isabel Canette
StataCorp
StataCorp staff will be happy to
receive wishes for developments in Stata and almost as happy to
receive grumbles about the software.
Scientific organizers
Pedro Pita Barros, Universidade Nova de Lisboa
João Cerejeira, Universidade do Minho
Anabela Carneiro, Universidade do Porto
Miguel Portela, Universidade do Minho
Paulo Guimarães, Bank of Portugal
Pierre Hoonhout, Universidade de Lisboa
Nicoletta Rosati, Universidade de Lisboa
Logistics organizers
Timberlake Consultores,
the official distributor of Stata in Portugal.