Epidemiological time-series regression with Stata
Speakers: Aurelio Tobías and Mike J. Campbell
Introduction
Time-series regression models are especially suitable in epidemiology for
evaluating short-term effects of time-varying exposures. In epidemiological
time-series studies, a single population is assessed with reference to
changes over time in the rate of a health outcome and the corresponding
changes in the exposure factors during the same period. Time-series regression
has been applied in a wide range of situations. Examples include the
study of the short-term effects of air pollution on health [1, 2]; sudden
infant death syndrome and environmental temperature [3]; or infectious
gastrointestinal illness related to drinking water [4].
The Stata manuals are ordered alphabetically by command name rather than by
topic, so a topic-based review of commands can be useful for users. Moreover,
specific commands for time-series regression of counts are not available in
Stata by default. Ado-files usually come from three sources: official Stata
commands, the Stata Technical Bulletin, and the Boston College Department of
Economics website. We present a review of useful commands developed by
different users to deal with this topic. These commands can be divided into
four categories: data management, graphics, statistical analysis, and model
fit.
Data management
The first step is to check for duplicate records (dups). In a time-series
analysis we should also generate sinusoidal terms (gensin), lags, and moving
summary statistics of variables (movsumm). Once the analysis has been done, we
can express the results from regression models as the increase in relative
risk for k units of the x variable (getrr), and also keep the parameter
estimates in a new data set.
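
The transformations described above can be illustrated with a minimal Python
sketch. It does not reproduce the syntax or exact behaviour of gensin, movsumm,
or getrr; the function names, the 365.25-day period, the trailing window, and
the example values are all assumptions made for illustration. The last function
assumes a log-linear model, so that a coefficient beta corresponds to a
relative risk of exp(k * beta) for k units of x.

```python
import math

def sinusoidal_terms(t, period=365.25, harmonics=1):
    """Sine and cosine terms for time index t (assumed to be in days)."""
    terms = []
    for h in range(1, harmonics + 1):
        angle = 2 * math.pi * h * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

def lag(series, k):
    """Shift a series by k >= 0 observations; the first k values are missing."""
    n = len(series)
    return [None] * min(k, n) + list(series[: max(n - k, 0)])

def moving_average(series, window):
    """Trailing mean over the current and previous window-1 values."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def percent_increase_rr(beta, k=1.0):
    """Percentage increase in relative risk for k units of x."""
    return 100.0 * (math.exp(k * beta) - 1.0)

# Hypothetical daily exposure values, for illustration only.
exposure = [20, 30, 40, 50]
print(lag(exposure, 1))             # [None, 20, 30, 40]
print(moving_average(exposure, 2))  # [None, 25.0, 35.0, 45.0]
print(round(percent_increase_rr(0.001, k=10), 3))  # 1.005
```

Missing values are represented by None here; a real analysis would need to
decide how the regression handles the first observations lost to lagging.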
Graphics
The graph command forms the core of Stata
graphics. To produce scatterplots of y versus multiple x variables, or of
multiple y versus multiple x variables, the muxplot and muxyplot commands are
available. We can study the distribution of a variable over time with tsplot,
or with the more powerful sssplot. The cross-correlation plot (xcorr) can be
used to study lag structures.
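
To make the idea behind a cross-correlation plot concrete, the sketch below
computes the sample cross-correlation between an exposure series x and an
outcome series y at a range of lags. It is a hedged illustration in Python, not
the implementation of xcorr: the function name, the divisor n, the sign
convention (positive k pairs x at time t with y at time t + k), and the toy
data are assumptions.

```python
import math

def cross_correlation(x, y, k):
    """Sample cross-correlation between x[t] and y[t + k]; k may be negative."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("cross-correlation is undefined for a constant series")
    # Only pairs where both indices fall inside the series contribute.
    cov = sum(
        (x[t] - mean_x) * (y[t + k] - mean_y)
        for t in range(n)
        if 0 <= t + k < n
    ) / n
    return cov / (sd_x * sd_y)

# Toy data: the outcome peaks two time points after the exposure.
x = [0, 1, 0, 0, 0, 0]
y = [0, 0, 0, 1, 0, 0]
by_lag = {k: cross_correlation(x, y, k) for k in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag)  # 2
```

In this toy example the largest cross-correlation appears at lag 2, which is
the kind of pattern one would look for when choosing exposure lags.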
Statistical analysis
Time-series data usually contain autocorrelation between observations. We must
check graphically for residual autocorrelation through the ACF (ac) and PACF
(pac) plots. Another problem is overdispersion, which can be assessed by
calculating the overdispersion parameter from the sum of the chi-square
residuals (odp). The solution to both problems proposed in the APHEA project
[1] was to include a specification of the autocorrelation in the model. The
command arpois fits a log-linear model allowing for autocorrelation and
overdispersion using iteratively reweighted least squares. This ado-file is
based on Schwartz's SAS macro. Generalised Additive Models [5] have been
suggested as a better alternative for analysing epidemiological time-series
data [6]. The gam command is based on the GAMFIT program [7]. Finally, robust
regression methods can also be fitted: rglm calculates a Huber (sandwich)
estimate of the variance-covariance matrix of the estimates.
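
The two diagnostics mentioned above can be sketched as follows. This is an
illustrative Python example under stated assumptions, not the code of odp, ac,
or arpois: it assumes a Poisson variance function (variance equal to the fitted
mean), takes the overdispersion parameter as the sum of squared Pearson
residuals divided by the residual degrees of freedom, and uses the usual sample
autocorrelation. Function names and the example numbers are hypothetical.

```python
import math

def pearson_residuals(observed, fitted):
    """Pearson residuals under a Poisson variance function (variance = mean)."""
    return [(y - mu) / math.sqrt(mu) for y, mu in zip(observed, fitted)]

def overdispersion_parameter(observed, fitted, n_params):
    """Sum of squared Pearson residuals over residual degrees of freedom."""
    resid = pearson_residuals(observed, fitted)
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    return sum(r * r for r in resid) / df

def autocorrelation(series, k):
    """Sample autocorrelation of a series at lag k >= 0."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
    return num / denom

# Hypothetical daily counts and fitted means from a two-parameter model.
observed = [4, 6, 10, 12]
fitted = [5.0, 5.0, 11.0, 11.0]
phi = overdispersion_parameter(observed, fitted, n_params=2)
print(round(phi, 4))  # 0.2909

# Lag-1 autocorrelation of the Pearson residuals.
resid = pearson_residuals(observed, fitted)
print(autocorrelation(resid, 1))
```

A value of the parameter well above 1 would point to overdispersion, and
residual autocorrelations clearly different from zero at short lags would
suggest that the model needs an explicit autocorrelation structure.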
Model fit
For nested models, the likelihood-ratio test is preferable (lrtest or
lrtest2), whilst for non-nested models Akaike's Information Criterion is
suggested (mlfit). Alternatively, for any maximum likelihood estimation Stata
provides the pseudo-R².
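
The quantities behind these comparisons are simple functions of the model
log-likelihoods, as the following Python sketch shows. It is not the code of
lrtest, lrtest2, or mlfit; the function names and the log-likelihood values are
hypothetical, the p-value helper covers only the one-degree-of-freedom case,
and the pseudo-R² is assumed to take the McFadden form (one minus the ratio of
the model log-likelihood to that of the constant-only model).

```python
import math

def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_restricted)

def lr_pvalue_1df(statistic):
    """Upper-tail chi-square probability, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(statistic / 2.0))

def aic(loglik, n_params):
    """Akaike's Information Criterion; smaller values indicate better fit."""
    return -2.0 * loglik + 2.0 * n_params

def pseudo_r2(loglik_model, loglik_null):
    """McFadden-type pseudo-R2 (an assumed form, see the text above)."""
    return 1.0 - loglik_model / loglik_null

# Hypothetical log-likelihoods: the full model adds one term to the restricted one.
ll_restricted, ll_full = -520.0, -515.0
stat = lr_statistic(ll_restricted, ll_full)
print(stat)                      # 10.0
print(lr_pvalue_1df(stat))       # p-value on 1 degree of freedom
print(aic(ll_full, n_params=4))  # 1038.0
print(pseudo_r2(ll_full, ll_restricted))
```

For nested models that differ by more than one parameter, the p-value would
need the chi-square distribution with the corresponding degrees of freedom,
which this sketch leaves out.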
References
1. Schwartz, J., C. Spix, G. Touloumi, et al. 1996. Methodological issues in
   studies of air pollution and daily counts of deaths or hospital admissions.
   J Epidemiol Community Health 50 (suppl 1): S3–S11.
2. Buchdahl, R., A. Parker, T. Stebbings, et al. 1996. Association between air
   pollution and acute childhood wheezy episodes: prospective observational
   study. Br Med J 312: 661–665.
3. Campbell, M. J. 1994. Time series regression for counts: an investigation
   into the relationship between Sudden Infant Death Syndrome and
   environmental temperature. J Royal Stat Soc A 157: 191–208.
4. Schwartz, J., R. Levin, K. Hodge. 1997. Drinking water turbidity and
   pediatric hospital use for gastrointestinal illness in Philadelphia.
   Epidemiology 8: 615–620.
5. Hastie, T. J., R. J. Tibshirani. 1990. Generalized Additive Models. London:
   Chapman and Hall.
6. Schwartz, J. 1994. Non-parametric smoothing in the analysis of air pollution
   and respiratory illness. Can J Stat 4: 471–487.
7. Hastie, T. J., R. J. Tibshirani. GAMFIT software.
   (http://lib.stat.cmu.edu/general/).