Epidemiology: Study Design and Data Analysis, Third Edition
eBook not available for this title
Comment from the Stata technical group

Woodward’s third edition of Epidemiology: Study Design and Data Analysis has two target audiences: researchers who need statistical solutions to epidemiology problems and statisticians who wish to learn how their science applies to epidemiology. This book successfully presents statistical principles in epidemiology in a manner that is neither too theoretical nor too replete with medical jargon. It provides a complete treatment of the topic, from simple contingency tables to meta-analysis. The book uses real data throughout, with more than 20 large datasets cataloged for download, and each chapter ends with exercises. Woodward also makes Stata code for working many of the examples available for download. Topics include basic terminology, causality, descriptive statistics, testing of means, relative risks versus odds ratios, exact tests based on tables, tests for linear and nonlinear trends, confounding and interaction, direct and indirect standardization, cohort designs, case–control studies, intervention studies, power and sample size, linear models (including analysis of variance), logistic and other models for binary responses, survival analysis (including Cox regression), and meta-analysis. The third edition has been expanded to include risk scores and clinical decision rules, bootstrapping, multiple imputation, binomial regression models, competing risk, propensity scoring, and splines.
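To give a flavor of the Stata workflow the book supports, here is a minimal sketch using two of Stata's immediate epitab commands, csi and cci (the book discusses EPITAB commands in Stata in its sections 3.10 and 4.10). The cell counts below are hypothetical, chosen only to illustrate the syntax; they are not taken from the book's datasets or downloadable code.

    * Hypothetical 2 x 2 cohort-study table; the four arguments are
    * exposed cases, unexposed cases, exposed noncases, unexposed noncases.
    * csi reports the risk difference, risk ratio, attributable fraction,
    * and a chi-squared test of association.
    csi 30 20 70 80

    * The same hypothetical counts analysed as a case-control study:
    * cci reports the odds ratio rather than the risk ratio.
    cci 30 20 70 80

The variable-based forms of these commands, cs and cc, work from a dataset in memory rather than typed counts, which is closer to how the book's downloadable datasets would be analysed.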
Table of contents

1 Fundamental issues
1.1 What is epidemiology?
1.2 Case studies: The work of Doll and Hill
1.3 Populations and samples
1.3.1 Populations
1.3.2 Samples
1.4 Measuring disease
1.4.1 Incidence and prevalence
1.5 Measuring the risk factor
1.6 Causality
1.6.1 Association
1.6.2 Problems with establishing causality
1.6.3 Principles of causality
1.7 Studies using routine data
1.7.1 Ecological data
1.7.2 National sources of data on disease
1.7.3 National sources of data on risk factors
1.7.4 International data
1.8 Study design
1.8.1 Intervention studies
1.8.2 Observational studies
1.9 Data analysis
Exercises

2 Basic analytical procedures
2.1 Introduction
2.1.1 Inferential procedures
2.2 Case study
2.2.1 The Scottish Heart Health Study
2.3 Types of variables
2.3.1 Qualitative variables
2.3.2 Quantitative variables
2.3.3 The hierarchy of type
2.4 Tables and charts
2.4.1 Tables in reports
2.4.2 Diagrams in reports
2.5 Inferential techniques for categorical variables
2.5.1 Contingency tables
2.5.2 Binary variables: proportions and percentages
2.5.3 Comparing two proportions or percentages
2.6 Descriptive techniques for quantitative variables
2.6.1 The five-number summary
2.6.2 Quantiles
2.6.3 The two-number summary
2.6.4 Other summary statistics of spread
2.6.5 Assessing symmetry
2.6.6 Investigating shape
2.7 Inferences about means
2.7.1 Checking normality
2.7.2 Inferences for a single mean
2.7.3 Comparing two means
2.7.4 Paired data
2.8 Inferential techniques for non-normal data
2.8.1 Transformations
2.8.2 Nonparametric tests
2.8.3 Confidence intervals for medians
2.9 Measuring agreement
2.9.1 Quantitative variables
2.9.2 Categorical variables
2.9.3 Ordered categorical variables
2.9.4 Internal consistency
2.10 Assessing diagnostic tests
2.10.1 Accounting for sensitivity and specificity
Exercises

3 Assessing risk factors
3.1 Risk and relative risk
3.2 Odds and odds ratio
3.3 Relative risk or odds ratio?
3.4 Prevalence studies
3.5 Testing association
3.5.1 Equivalent tests
3.5.2 One-sided tests
3.5.3 Continuity corrections
3.5.4 Fisher's exact test
3.5.5 Limitations of tests
3.6 Risk factors measured at several levels
3.6.1 Continuous risk factors
3.6.2 A test for linear trend
3.6.3 A test for nonlinearity
3.7 Attributable risk
3.8 Rate and relative rate
3.8.1 The general epidemiological rate
3.9 Measures of difference
3.10 EPITAB commands in Stata
Exercises

4 Confounding and interaction
4.1 Introduction
4.2 The concept of confounding
4.3 Identification of confounders
4.3.1 A strategy for selection
4.4 Assessing confounding
4.4.1 Using estimation
4.4.2 Using hypothesis tests
4.4.3 Dealing with several confounding variables
4.5 Standardisation
4.5.1 Direct standardisation of event rates
4.5.2 Indirect standardisation of event rates
4.5.3 Standardisation of risks
4.6 Mantel–Haenszel methods
4.6.1 The Mantel–Haenszel relative risk
4.6.2 The Cochran–Mantel–Haenszel test
4.6.3 Further comments
4.7 The concept of interaction
4.8 Testing for interaction
4.8.1 Using the relative risk
4.8.2 Using the odds ratio
4.8.3 Using the risk difference
4.8.4 Which type of interaction to use?
4.8.5 Which interactions to test?
4.9 Dealing with interaction
4.10 EPITAB commands in Stata
Exercises

5 Cohort studies
5.1 Design considerations
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Alternative designs with economic advantages
5.1.4 Studies with a single baseline sample
5.2 Analytical considerations
5.2.1 Concurrent follow-up
5.2.2 Moving baseline dates
5.2.3 Varying follow-up durations
5.2.4 Withdrawals
5.3 Cohort life tables
5.3.1 Allowing for sampling variation
5.3.2 Allowing for censoring
5.3.3 Comparison of two life tables
5.3.4 Limitations
5.4 Kaplan–Meier estimation
5.4.1 An empirical comparison
5.5 Comparison of two sets of survival probabilities
5.5.1 Mantel–Haenszel methods
5.5.2 The log-rank test
5.5.3 Weighted log-rank tests
5.5.4 Allowing for confounding variables
5.5.5 Comparing three or more groups
5.6 Competing risk
5.7 The person-years method
5.7.1 Age-specific rates
5.7.2 Summarisation of rates
5.7.3 Comparison of two SERs
5.7.4 Mantel–Haenszel methods
5.7.5 Further comments
5.8 Period-cohort analysis
5.8.1 Period-specific rates
Exercises

6 Case–control studies
6.1 Basic design concepts
6.1.1 Advantages
6.1.2 Disadvantages
6.2 Basic methods of analysis
6.2.1 Dichotomous exposure
6.2.2 Polytomous exposure
6.2.3 Confounding and interaction
6.2.4 Attributable risk
6.3 Selection of cases
6.3.1 Definition
6.3.2 Inclusion and exclusion criteria
6.3.3 Incident or prevalent?
6.3.4 Source
6.3.5 Consideration of bias
6.4 Selection of controls
6.4.1 General principles
6.4.2 Hospital controls
6.4.3 Community controls
6.4.4 Other sources
6.4.5 How many?
6.5 Matching
6.5.1 Advantages
6.5.2 Disadvantages
6.5.3 One-to-many matching
6.5.4 Matching in other study designs
6.6 The analysis of matched studies
6.6.1 1 : 1 matching
6.6.2 1 : c matching
6.6.3 1 : variable matching
6.6.4 Many : many matching
6.6.5 A modelling approach
6.7 Nested case–control studies
6.7.1 Matched studies
6.7.2 Counter-matched studies
6.8 Case-cohort studies
6.9 Case-crossover studies
Exercises

7 Intervention studies
7.1 Introduction
7.1.1 Advantages
7.1.2 Disadvantages
7.2 Ethical considerations
7.2.1 The protocol
7.3 Avoidance of bias
7.3.1 Use of a control group
7.3.2 Blindness
7.3.3 Randomisation
7.3.4 Consent before randomisation
7.3.5 Analysis by intention-to-treat
7.4 Parallel group studies
7.4.1 Number needed to treat
7.4.2 Cluster randomised trials
7.4.3 Stepped wedge trials
7.4.4 Non-inferiority trials
7.5 Cross-over studies
7.5.1 Graphical analysis
7.5.2 Comparing means
7.5.3 Analysing preferences
7.5.4 Analysing binary data
7.6 Sequential studies
7.6.1 The Haybittle–Peto stopping rule
7.6.2 Adaptive designs
7.7 Allocation to treatment group
7.7.1 Global randomisation
7.7.2 Stratified randomisation
7.7.3 Implementation
7.8 Trials as cohorts
Exercises

8 Sample size determination
8.1 Introduction
8.2 Power
8.2.1 Choice of alternative hypothesis
8.3 Testing a mean value
8.3.1 Common choices for power and significance level
8.3.2 Using a table of sample sizes
8.3.3 The minimum detectable difference
8.3.4 The assumption of known standard deviation
8.4 Testing a difference between means
8.4.1 Using a table of sample sizes
8.4.2 Power and minimum detectable difference
8.4.3 Optimum distribution of the sample
8.4.4 Paired data
8.5 Testing a proportion
8.5.1 Using a table of sample sizes
8.6 Testing a relative risk
8.6.1 Using a table of sample sizes
8.6.2 Power and minimum detectable relative risk
8.7 Case–control studies
8.7.1 Using a table of sample sizes
8.7.2 Power and minimum detectable relative risk
8.7.3 Comparison with cohort studies
8.7.4 Matched studies
8.8 Complex sampling designs
8.9 Concluding remarks
Exercises

9 Modelling quantitative outcome variables
9.1 Statistical models
9.2 One categorical explanatory variable
9.2.1 The hypotheses to be tested
9.2.2 Construction of the ANOVA table
9.2.3 How the ANOVA table is used
9.2.4 Estimation of group means
9.2.5 Comparison of group means
9.2.6 Fitted values
9.2.7 Using computer packages
9.3 One quantitative explanatory variable
9.3.1 Simple linear regression
9.3.2 Correlation
9.3.3 Nonlinear regression
9.4 Two categorical explanatory variables
9.4.1 Model specification
9.4.2 Model fitting
9.4.3 Balanced data
9.4.4 Unbalanced data
9.4.5 Fitted values
9.4.6 Least squares means
9.4.7 Interaction
9.5 Model building
9.6 General linear models
9.7 Several explanatory variables
9.7.1 Information criteria
9.7.2 Boosted regression
9.8 Model checking
9.9 Confounding
9.9.1 Adjustment using residuals
9.10 Splines
9.10.1 Choice of knots
9.10.2 Other types of splines
9.11 Panel data
9.12 Non-normal alternatives
Exercises

10 Modelling binary outcome data
10.1 Introduction
10.2 Problems with standard regression models
10.2.1 The r-x relationship may well not be linear
10.2.2 Predicted values of the risk may be outside the valid range
10.2.3 The error distribution is not normal
10.3 Logistic regression
10.4 Interpretation of logistic regression coefficients
10.4.1 Binary risk factors
10.4.2 Quantitative risk factors
10.4.3 Categorical risk factors
10.4.4 Ordinal risk factors
10.4.5 Floating absolute risks
10.5 Generic data
10.6 Multiple logistic regression models
10.7 Tests of hypotheses
10.7.1 Goodness of fit for grouped data
10.7.2 Goodness of fit for generic data
10.7.3 Effect of a risk factor
10.7.4 Information criteria
10.7.5 Tests for linearity and nonlinearity
10.7.6 Tests based upon estimates and their standard errors
10.7.7 Problems with missing values
10.8 Confounding
10.9 Interaction
10.9.1 Between two categorical variables
10.9.2 Between a quantitative and categorical variable
10.9.3 Between two quantitative variables
10.10 Dealing with a quantitative explanatory variable
10.10.1 Linear form
10.10.2 Categorical form
10.10.3 Linear spline form
10.10.4 Generalisations
10.11 Model checking
10.11.1 Residuals
10.11.2 Influential observations
10.12 Measurement error
10.12.1 Regression to the mean
10.12.2 Correcting for regression dilution
10.13 Case–control studies
10.13.1 Unmatched studies
10.13.2 Matched studies
10.14 Outcomes with several levels
10.14.1 The proportional odds assumption
10.14.2 The proportional odds model
10.14.3 Multinomial regression
10.15 Longitudinal data
10.16 Binomial regression
10.16.1 Adjusted risks
10.16.2 Risk differences
10.16.3 Problems with binomial models
10.17 Propensity scoring
10.17.1 Pair-matched propensity scores
10.17.2 Stratified propensity scores
10.17.3 Weighting by the inverse propensity score
10.17.4 Adjusting for the propensity score
10.17.5 Deriving the propensity score
10.17.6 Propensity score outliers
10.17.7 Conduct of the matched design
10.17.8 Analysis of the matched design
10.17.9 Case studies
10.17.10 Interpretation of effects
10.17.11 Problems with estimating uncertainty
10.17.12 Propensity scores in practice
Exercises

11 Modelling follow-up data
11.1 Introduction
11.1.1 Models for survival data
11.2 Basic functions of survival time
11.2.1 The survival function
11.2.2 The hazard function
11.3 Estimating the hazard function
11.3.1 Kaplan–Meier estimation
11.3.2 Person-time estimation
11.3.3 Actuarial estimation
11.3.4 The cumulative hazard
11.4 Probability models
11.4.1 The probability density and cumulative distribution functions
11.4.2 Choosing a model
11.4.3 The exponential distribution
11.4.4 The Weibull distribution
11.4.5 Other probability models
11.5 Proportional hazards regression models
11.5.1 Comparing two groups
11.5.2 Comparing several groups
11.5.3 Modelling with a quantitative variable
11.5.4 Modelling with several variables
11.5.5 Left-censoring
11.6 The Cox proportional hazards model
11.6.1 Time-dependent covariates
11.6.2 Recurrent events
11.7 The Weibull proportional hazards model
11.8 Model checking
11.8.1 Log cumulative hazard plots
11.8.2 An objective test of proportional hazards for the Cox model
11.8.3 An objective test of proportional hazards for the Weibull model
11.8.4 Residuals and influence
11.8.5 Nonproportional hazards
11.9 Competing risk
11.9.1 Joint modelling of longitudinal and survival data
11.10 Poisson regression
11.10.1 Simple regression
11.10.2 Multiple regression
11.10.3 Comparison of standardised event ratios
11.10.4 Routine or registration data
11.10.5 Generic data
11.10.6 Model checking
11.11 Pooled logistic regression
Exercises

12 Meta-analysis
12.1 Reviewing evidence
12.1.1 The Cochrane Collaboration
12.2 Systematic review
12.2.1 Designing a systematic review
12.2.2 Study quality
12.3 A general approach to pooling
12.3.1 Inverse variance weighting
12.3.2 Fixed effect and random effects
12.3.3 Quantifying heterogeneity
12.3.4 Estimating the between-study variance
12.3.5 Calculating inverse variance weights
12.3.6 Calculating standard errors from confidence intervals
12.3.7 Case studies
12.3.8 Pooling risk differences
12.3.9 Pooling differences in mean values
12.3.10 Other quantities
12.3.11 Pooling mixed quantities
12.3.12 Dose-response meta-analysis
12.4 Investigating heterogeneity
12.4.1 Forest plots
12.4.2 Influence plots
12.4.3 Sensitivity analyses
12.4.4 Meta-regression
12.5 Pooling tabular data
12.5.1 Inverse variance weighting
12.5.2 Mantel–Haenszel methods
12.5.3 The Peto method
12.5.4 Dealing with zeros
12.5.5 Advantages and disadvantages of using tabular data
12.6 Individual participant data
12.7 Dealing with aspects of study quality
12.8 Publication bias
12.8.1 The funnel plot
12.8.2 Consequences of publication bias
12.8.3 Correcting for publication bias
12.8.4 Other causes of asymmetry in funnel plots
12.9 Advantages and limitations of meta-analysis
Exercises

13 Risk scores and clinical decision rules
13.1 Introduction
13.1.1 Individual and population level interventions
13.1.2 Scope of this chapter
13.2 Association and prognosis
13.2.1 The concept of discrimination
13.2.2 Risk factor thresholds
13.2.3 Risk thresholds
13.2.4 Odds ratios and discrimination
13.3 Risk scores from statistical models
13.3.1 Logistic regression
13.3.2 Multiple variable risk scores
13.3.3 Cox regression
13.3.4 Risk thresholds
13.3.5 Multiple thresholds
13.4 Quantifying discrimination
13.4.1 The area under the curve
13.4.2 Comparing AUCs
13.4.3 Survival data
13.4.4 The standardised mean effect size
13.4.5 Other measures of discrimination
13.5 Calibration
13.5.1 Overall calibration
13.5.2 Mean calibration
13.5.3 Grouped calibration
13.5.4 Calibration plots
13.6 Recalibration
13.6.1 Recalibration of the mean
13.6.2 Recalibration of scores in a fixed cohort
13.6.3 Recalibration of parameters from a Cox model
13.6.4 Recalibration and discrimination
13.7 The accuracy of predictions
13.7.1 The Brier score
13.7.2 Comparison of Brier scores
13.8 Assessing an extraneous prognostic variable
13.9 Reclassification
13.9.1 The integrated discrimination improvement from a fixed cohort
13.9.2 The net reclassification improvement from a fixed cohort
13.9.3 The integrated discrimination improvement from a variable cohort
13.9.4 The net reclassification improvement from a variable cohort
13.9.5 Software
13.10 Validation
13.11 Presentation of risk scores
13.11.1 Point scoring
13.12 Impact studies
Exercises

14 Computer-intensive methods
14.1 Rationale
14.2 The bootstrap
14.2.1 Bootstrap distributions
14.3 Bootstrap confidence intervals
14.3.1 Bootstrap normal intervals
14.3.2 Bootstrap percentile intervals
14.3.3 Bootstrap bias-corrected intervals
14.3.4 Bootstrap bias-corrected and accelerated intervals
14.3.5 Overview of the worked example
14.3.6 Choice of bootstrap interval
14.4 Practical issues when bootstrapping
14.4.1 Software
14.4.2 How many replications should be used?
14.4.3 Sensible strategies
14.5 Further examples of bootstrapping
14.5.1 Complex bootstrap samples
14.6 Bootstrap hypothesis testing
14.7 Limitations of bootstrapping
14.8 Permutation tests
14.8.1 Monte Carlo permutation tests
14.8.2 Limitations
14.9 Missing values
14.9.1 Dealing with missing values
14.9.2 Types of missingness
14.9.3 Complete case analyses
14.10 Naive imputation methods
14.10.1 Mean imputation
14.10.2 Conditional mean and regression imputation
14.10.3 Hot deck imputation and predictive mean matching
14.10.4 Longitudinal data
14.11 Univariate multiple imputation
14.11.1 Multiple imputation by regression
14.11.2 The three-step process in MI
14.11.3 Imputer's and analyst's models
14.11.4 Rubin's equations
14.11.5 Imputation diagnostics
14.11.6 Skewed continuous data
14.11.7 Other types of variables
14.11.8 How many imputations?
14.12 Multivariate multiple imputation
14.12.1 Monotone imputation
14.12.2 Data augmentation
14.12.3 Categorical variables
14.12.4 What to do when DA fails
14.12.5 Chained equations
14.12.6 Longitudinal data
14.13 When is it worth imputing?
Exercises

Appendix A Materials available on the website for this book
Appendix B Statistical tables
Appendix C Additional datasets for exercises
References
Index