
Epidemiological tables
Want to analyze data from a prospective study, cohort study,
case–control study, or matched case–control study? Stata's tables for
epidemiologists make it easy to summarize your data and compute statistics
such as incidence-rate ratios, incidence-rate differences, risk ratios, risk
differences, odds ratios, and attributable fractions. You can analyze
stratified data too—compute Mantel–Haenszel combined estimates, perform
tests of homogeneity, and standardize estimates. If you have an ordinal
rather than binary exposure, you can perform a test for a trend.
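To make these measures concrete, here is a minimal Python sketch that computes them by hand. It is an illustration of the textbook formulas, not Stata code or Stata's implementation. All counts and the names `stratified_tables` and `mantel_haenszel_or` are hypothetical and were invented for this example.

```python
# Hypothetical 2x2 cohort table (counts invented for illustration).
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90

risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)          # 0.30
risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)  # 0.10

risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed
odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
# Attributable fraction among the exposed: (RR - 1) / RR
attr_frac_exposed = (risk_ratio - 1) / risk_ratio


def mantel_haenszel_or(tables):
    """Mantel-Haenszel combined odds ratio.

    Each table is (a, b, c, d): exposed cases, exposed noncases,
    unexposed cases, unexposed noncases within one stratum.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Two hypothetical strata, each with a stratum-specific odds ratio of 4.
stratified_tables = [(10, 20, 5, 40), (20, 10, 10, 20)]
combined_or = mantel_haenszel_or(stratified_tables)

print(f"risk ratio        {risk_ratio:.2f}")
print(f"risk difference   {risk_difference:.2f}")
print(f"odds ratio        {odds_ratio:.2f}")
print(f"attr. fraction    {attr_frac_exposed:.2f}")
print(f"M-H combined OR   {combined_or:.2f}")
```

The sketch gives point estimates only. Confidence intervals and tests of homogeneity are left out.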
Survival analysis
Analyze duration outcomes—outcomes measuring the time to an event
such as failure or death—using Stata's specialized tools for
survival analysis. Account for the complications inherent in survival
data, such as sometimes not observing the event (right-, left-, and
interval-censoring), individuals entering the study at differing times
(delayed entry), and individuals who are not continuously observed
throughout the study (gaps). You can estimate and plot the probability
of survival over time. Or model survival as a function of covariates
using Cox, Weibull, lognormal, and other regression models. Predict
hazard ratios, mean survival time, and survival probabilities. Do you
have groups of individuals in your study? Adjust for within-group
correlation with a random-effects or shared-frailty model. When you have
interval-censored multiple-event data, you can fit a marginal Cox model.
If you have many potential covariates, use lasso cox and elasticnet cox
for model selection and prediction.
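As an illustration of what "the probability of survival over time" means with right-censored data, the following Python sketch computes a Kaplan-Meier estimate by hand. The choice of Kaplan-Meier is an assumption made for this example, since the text does not name the estimator. The function name `kaplan_meier` and the data are hypothetical, and this is not Stata code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: six subjects, two of them right-censored.
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)

for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects count in the risk set up to their censoring time but never trigger a drop in the curve. The sketch handles right-censoring only. It does not handle delayed entry or gaps.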
Linear, binary, and count regressions
Fit classical ANOVA and linear regression models of the relationship
between a continuous outcome, such as weight, and the determinants of
weight, such as height, diet, and level of exercise. If your response
is binary, ordinal, categorical, or count, don't worry. Stata has estimators
for these types of outcomes too. Use logistic regression to adjust odds
ratios for confounding variables. Estimate incidence rates using a Poisson
model. Analyze matched case–control data with conditional logistic
regression. A vast array of tools is available after fitting such models.
Predict outcomes and their confidence intervals. Test equality of parameters.
Compute linear and nonlinear combinations of parameters.
Survey methods
Whether your data require a simple weighted adjustment because of differential
sampling rates or you have data from a complex multistage survey, Stata's
survey features can provide you with correct standard errors and confidence
intervals for your inferences. Simply specify the relevant characteristics of
your sampling design, such as sampling weights (including weights at multiple
stages), clustering (at one, two, or more stages), stratification, and
poststratification. After that, most of Stata's estimation commands can adjust
their estimates to correct for your sampling design.
Marginal means, contrasts, and interactions
Marginal means and contrasts let you analyze the relationships between
your outcome variable and your predictors, even when your outcome is binary,
count, ordinal, or categorical. For instance, after you fit a logistic
regression of a disease on an exposure variable and other covariates,
your marginal means may be population-averaged risks. Or you can set the
covariates to interesting values to compute adjusted risks and then use
contrasts to get adjusted risk differences. After
fitting almost any model in Stata, you can analyze the effect of covariate
interactions and easily create plots to visualize those interactions.
Power, precision, and sample size
Before you conduct your experiment, determine the sample size needed to detect
meaningful effects without wasting resources. Do you intend to compute CIs for
means or variances or perform tests for proportions or correlations? Do you
plan to fit a Cox proportional hazards model or compare survivor functions
using a log-rank test? Do you want to use a Cochran–Mantel–Haenszel test of
association or a Cochran–Armitage trend test? Use Stata's power command to
compute power and sample size, create customized tables, and automatically
graph the relationships between power, sample size, and effect size for your
planned study. Or use the ciwidth command to do the same for CIs instead
of hypothesis tests by computing the sample size required for the desired CI
precision. Or use gsdesign to compute stopping boundaries and the required
sample sizes for group sequential designs. If you prefer not to type commands,
use the interactive Control Panel to perform your analysis.
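To show the kind of calculation involved, here is a minimal Python sketch of the textbook normal-approximation sample size for a two-sided comparison of two means with equal group sizes and a common standard deviation. It is not Stata's power command and need not reproduce its results, which may rest on different distributional assumptions. The function name `n_per_group` and the inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a mean difference `delta`.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2,
    rounded up to the next whole subject.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


# Hypothetical design: detect a 5-unit difference when the SD is 10.
n_80 = n_per_group(delta=5, sd=10, power=0.80)
n_90 = n_per_group(delta=5, sd=10, power=0.90)
print(n_80, n_90)
```

Raising the target power or shrinking the detectable difference increases the required sample size.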
Meta-analysis
Combine results of multiple studies to estimate an overall effect. Use
forest plots to visualize results. Use subgroup analysis and
meta-regression to explore study heterogeneity. Use funnel plots and
formal tests to explore publication bias and small-study effects. Use
trim-and-fill analysis to assess the impact of publication bias on
results. Perform cumulative and leave-one-out meta-analysis. Perform
univariate, multilevel, and multivariate meta-analysis. Use the meta suite, or let the Control Panel interface
guide you through your entire meta-analysis.
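As a sketch of what "combining results" means in the simplest case, the Python example below pools study estimates with inverse-variance weights under a common-effect (fixed-effect) model. This model is an assumption chosen for brevity and is not a description of what the meta suite does by default. The function name `pooled_common_effect` and the study values are hypothetical.

```python
from math import sqrt


def pooled_common_effect(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three studies.
log_odds_ratios = [0.20, 0.50, 0.35]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pooled_common_effect(log_odds_ratios, std_errors)
print(f"pooled log OR = {pooled:.3f}, SE = {pooled_se:.3f}")
```

More precise studies receive more weight. Random-effects models add a between-study variance term to each study's variance before weighting.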
Causal inference
Estimate experimental-style causal effects from observational data. With
Stata's treatment-effects estimators, you can use a potential-outcomes
(counterfactuals) framework to estimate, for instance, the effect of
family structure on child development or the effect of unemployment on
anxiety. Fit models for continuous, binary, count, fractional, and
survival outcomes with binary or multivalued treatments using
inverse-probability weighting (IPW), propensity-score matching,
nearest-neighbor matching, regression adjustment, or doubly robust
estimators. Explore treatment-effect heterogeneity across individuals or
across groups with conditional average treatment effects (CATEs). If the
assignment to a treatment is not independent of the outcome, you can use
an endogenous treatment-effects estimator. In the presence of group and
time effects, you can use difference-in-differences (DID) and
triple-differences (DDD) estimators. In the presence of high-dimensional
covariates, you can use lasso. If causal effects are mediated through
another variable, use causal mediation with mediate to disentangle direct and indirect effects.
Multiple imputation
Account for missing data in your sample using multiple imputation. Choose
from univariate and multivariate methods to impute missing values in
continuous, censored, truncated, binary, ordinal, categorical, and count
variables. Then, in a single step, estimate parameters using the imputed
datasets, and combine results. Fit a linear model, logit model, Poisson model,
multilevel model, survival model, or one of the many other supported models.
Use the mi command, or let the Control Panel interface guide you through your
entire MI analysis.
Multilevel mixed-effects models
Whether the groupings in your data arise in a nested fashion (patients nested
in clinics and clinics nested in regions) or in a nonnested fashion (regions
crossed with occupations), you can fit a multilevel model to account for the
lack of independence within these groups. Fit models for continuous, binary,
count, ordinal, and survival outcomes. Estimate variances of random
intercepts and random coefficients. Compute intraclass correlations. Predict
random effects. Estimate relationships that are population averaged over the
random effects.
Bayesian analysis
Fit Bayesian regression models using one of the Markov chain Monte Carlo
(MCMC) methods. You can choose from various supported models or even
program your own. Extensive tools are available to check convergence,
including multiple chains. Compute posterior mean estimates and credible
intervals for model parameters and functions of model parameters. You
can perform both interval- and model-based hypothesis testing. Compare
models using Bayes factors. Compute model fit using posterior predictive
values and generate predictions. If you want to account for model
uncertainty in your regression model, use Bayesian model averaging.
Use Bayesian variable selection for linear regression to identify predictors
important to your outcome and perform Bayesian inference.
Additive models of relative risk
Determine how exposures interact to put subjects at a higher risk of
experiencing an outcome of interest. For example, you might be
investigating how exposure to cigarette smoke and asbestos interact to
increase the risk of lung cancer. With Stata's reri command, you
can measure two-way interactions in an additive model of relative
risk, while accounting for other risk factors. Choose from various
supported models, such as binomial generalized linear, Poisson, negative
binomial, logistic, Cox, parametric survival, and
interval-censored parametric and semiparametric survival models.
Estimate the relative excess risk due to interaction (RERI),
attributable proportion (AP), and synergy index (SI).
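The three measures have simple closed forms once you have relative risks for each exposure alone and for both together, each measured against the doubly unexposed group. The Python sketch below evaluates the standard formulas by hand. It is not the reri command, which estimates these quantities from a fitted model. The function name `additive_interaction` and the relative risks are hypothetical.

```python
def additive_interaction(rr10, rr01, rr11):
    """Additive-interaction measures from three relative risks.

    rr10 -- relative risk with the first exposure only
    rr01 -- relative risk with the second exposure only
    rr11 -- relative risk with both exposures
    All are relative to subjects with neither exposure.
    """
    reri = rr11 - rr10 - rr01 + 1                 # relative excess risk due to interaction
    ap = reri / rr11                              # attributable proportion
    si = (rr11 - 1) / ((rr10 - 1) + (rr01 - 1))   # synergy index
    return reri, ap, si


# Hypothetical relative risks, e.g. smoking only, asbestos only, both.
reri, ap, si = additive_interaction(rr10=2.0, rr01=3.0, rr11=8.0)
print(f"RERI = {reri:.2f}, AP = {ap:.2f}, SI = {si:.2f}")
```

With no additive interaction, RERI and AP equal 0 and SI equals 1. Values above these suggest that the joint effect exceeds the sum of the separate effects.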
Machine learning
With machine learning via H2O, you can use ensemble decision
trees—random forests and gradient boosting machines—for regression
and classification. Or use lasso for sparse regression and classification.
Or use Bayesian variable selection or Bayesian model averaging to select
predictors in a linear model.
For causal inference with machine learning, use double-selection lasso,
partialing-out lasso, and double machine learning. You can use PCA or
kmeans, kmedians, or hierarchical clustering for unsupervised learning.
And use search to find community-contributed
commands for neural networks, support vector machines, graphical lasso,
text mining, and more.
Automated reporting and customizable tables
Stata is designed for reproducible research, including the ability to
create dynamic documents incorporating your analysis results. Create
Word or PDF files, populate Excel worksheets with results and format
them to your liking, and mix Markdown, HTML, Stata results, and Stata
graphs, all from within Stata. Create tables that compare
regression results or summary statistics, use default styles
or apply your own, and export your tables to Word, PDF, HTML, LaTeX,
Excel, or Markdown and include them in your reports.
Jupyter Notebook with Stata
Jupyter Notebook is widely used by
researchers and scientists to share their ideas and results for collaboration
and innovation. It is an easy-to-use web application that allows you to
combine code, visualizations, mathematical formulas, narrative text, and other
rich media in a single document (a "notebook") for interactive computing and
developing. You can invoke Stata and Mata from Jupyter Notebook with the
IPython (interactive Python) kernel. This
means you can combine the capabilities of both Python and Stata in a single
environment to make your work easily reproducible and shareable with others.
Reproducibility
Stata is the only software for data science and statistical analysis featuring comprehensive integrated versioning that ensures your code continues to run, unaltered, even after updates or new versions are released. There is no need to keep multiple legacy installations around to avoid breaking your system; Stata code from 40 years ago can still be run without modification. Datasets, graphs, scripts, programs, and more are 100% cross-platform and backward compatible.
There is a lot to like about Stata, but for an epidemiologist the ease of use of the svy commands is not matched in any other package.
— George Savva
School of Health Sciences, University of East Anglia
Intuitive and easy to use.
Once you learn the syntax of one estimator, graphics command,
or data manipulation tool, you will effortlessly understand the rest.
Accuracy, reliability, and reproducibility.
Stata is extensively and continually tested.
Stata's tests produce approximately 7.2 million lines of testing code.
Each of those lines is compared against known-to-be-accurate results
across editions of Stata and every operating system Stata supports to
ensure accuracy and reproducibility, including integrated versioning
for backwards compatibility.
One package. No modules.
When you buy Stata, you obtain everything for your statistical,
graphical, and data analysis needs. You do not need to buy separate modules
or import your data to specialized software.
Write your own Stata programs.
You can easily write your own Stata programs and commands. Share them
with others or use them to simplify your work. Use Stata's do-files,
ado-files, and Mata, Stata's own advanced programming language with
direct support for matrix programming. You can also
access and benefit from the thousands of existing Stata
community-contributed programs.
Extensive documentation.
Stata offers 36 manuals with more than 19,000 pages of PDF documentation
containing detailed examples, in-depth discussions, references to relevant literature,
and methods and formulas. Stata's documentation is a great place to learn about
Stata and the statistics, graphics, data manipulation, and data science tools you
are using for your research.
Top-notch technical support.
Stata's technical support is known for its prompt, accurate,
detailed, and clear responses. The people answering your questions have
master's and PhD degrees in relevant areas of research.
Join us for one of our free live webinars. "Ready. Set. Go Stata." shows you how to quickly get started manipulating, graphing, and analyzing your data. Or go deeper in one of our special-topics webinars.
Stata's YouTube channel has over 300 videos, including a dedicated playlist of methodologies important to epidemiologists. The videos are also a convenient teaching aid in the classroom.
Get started using Stata effectively, or learn how to perform rigorous time-series, panel-data, or survival analysis, all from the comfort of your home or office. NetCourses make it easy.
Stata Press offers books with clear, step-by-step examples that make teaching easier and that enable students to learn and epidemiologists to implement the latest best practices in analysis.