Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data, Second Edition
eBook not available for this title
Review of the first edition from the Stata Journal
Comment from the Stata technical group
William Dupont’s Statistical Modeling for Biomedical Researchers, Second Edition is ideal for a one-semester graduate course in biostatistics and epidemiology. Dupont assumes only a basic knowledge of statistics, such as that obtained from a standard introductory statistics course. Stata is used extensively throughout the text, making it possible to introduce computationally complex methods with little or no higher-level mathematics. As a result, Dupont focuses on concepts and model assumptions rather than on the underlying mathematics. The text covers linear regression, logistic regression, Poisson regression, survival analysis, and analysis of variance. Two chapters are devoted to each topic: an introductory chapter that uses simple data to develop the concept and a more advanced chapter that explains more complex models, case studies, diagnostic measures, etc.
Dupont pays equal attention to the methods and to using Stata to apply them. When Stata output is displayed, the most important elements of the output are highlighted and explained in notes that follow the output. These notes help the reader make sense of the output by providing the appropriate focus for the problem at hand. The notes also include instructions for reproducing the analysis via Stata’s point-and-click user interface. The text, replete with examples featuring real medical data, uses Stata graphics extensively, providing ample explanation and detail for reproduction.
Table of contents
1 Introduction
1.1 Algebraic notation
1.2 Descriptive statistics
1.2.1 Dot plot
1.2.2 Sample mean
1.2.3 Residual
1.2.4 Sample variance
1.2.5 Sample standard deviation
1.2.6 Percentile and median
1.2.7 Box plot
1.2.8 Histogram
1.2.9 Scatter plot
1.3 The Stata Statistical Software Package
1.3.1 Downloading data from my website
1.3.2 Creating histograms with Stata
1.3.3 Stata command syntax
1.3.4 Obtaining interactive help from Stata
1.3.5 Stata log files
1.3.6 Stata graphics and schemes
1.3.7 Stata do files
1.3.8 Stata pulldown menus
1.3.9 Displaying other descriptive statistics with Stata
1.4 Inferential statistics
1.4.1 Probability density function
1.4.2 Mean, variance, and standard deviation
1.4.3 Normal distribution
1.4.4 Expected value
1.4.5 Standard error
1.4.6 Null hypothesis, alternative hypothesis, and P-value
1.4.7 95% confidence interval
1.4.8 Statistical power
1.4.9 The z and Student’s t distributions
1.4.10 Paired t test
1.4.11 Performing paired t tests with Stata
1.4.12 Independent t test using a pooled standard error estimate
1.4.13 Independent t test using separate standard error estimates
1.4.14 Independent t tests using Stata
1.4.15 The chi-squared distribution
1.5 Overview of methods discussed in this text
1.5.1 Models with one response per patient
1.5.2 Models with multiple responses per patient
1.6 Additional reading
1.7 Exercises
2 Simple linear regression
2.1 Sample covariance
2.2 Sample correlation coefficient
2.3 Population covariance and correlation coefficient
2.4 Conditional expectation
2.5 Simple linear regression model
2.6 Fitting the linear regression model
2.7 Historical trivia: origin of the term regression
2.8 Determining the accuracy of linear regression estimates
2.9 Ethylene glycol poisoning example
2.10 95% confidence interval for y[x] = α + βx evaluated at x
2.11 95% prediction interval for the response of a new patient
2.12 Simple linear regression with Stata
2.13 Lowess regression
2.14 Plotting a lowess regression curve in Stata
2.15 Residual analyses
2.16 Studentized residual analysis using Stata
2.17 Transforming the x and y variables
2.17.1 Stabilizing the variance
2.17.2 Correcting for non-linearity
2.17.3 Example: research funding and morbidity for 29 diseases
2.18 Analyzing transformed data with Stata
2.19 Testing the equality of regression slopes
2.19.1 Example: the Framingham Heart Study
2.20 Comparing slope estimates with Stata
2.21 Density-distribution sunflower plots
2.22 Creating density-distribution sunflower plots with Stata
2.23 Additional reading
2.24 Exercises
3 Multiple linear regression
3.1 The model
3.2 Confounding variables
3.3 Estimating the parameters for a multiple linear regression model
3.4 R² statistic for multiple regression models
3.5 Expected response in the multiple regression model
3.6 The accuracy of multiple regression parameter estimates
3.7 Hypothesis tests
3.8 Leverage
3.9 95% confidence interval for ŷi
3.10 95% prediction intervals
3.11 Example: the Framingham Heart Study
3.11.1 Preliminary univariate analyses
3.12 Scatter plot matrix graphs
3.12.1 Producing scatter plot matrix graphs with Stata
3.13 Modeling interaction in multiple linear regression
3.13.1 The Framingham example
3.14 Multiple regression modeling of the Framingham data
3.15 Intuitive understanding of a multiple regression model
3.15.1 The Framingham example
3.16 Calculating 95% confidence and prediction intervals
3.17 Multiple linear regression with Stata
3.18 Automatic methods of model selection
3.18.1 Forward selection using Stata
3.18.2 Backward selection
3.18.3 Forward stepwise selection
3.18.4 Backward stepwise selection
3.18.5 Pros and cons of automated model selection
3.19 Collinearity
3.20 Residual analyses
3.21 Influence
3.21.1 Δβ_hat influence statistic
3.21.2 Cook’s distance
3.21.3 The Framingham example
3.22 Residual and influence analyses using Stata
3.23 Using multiple linear regression for non-linear models
3.24 Building non-linear models with restricted cubic splines
3.24.1 Choosing the knots for a restricted cubic spline model
3.25 The SUPPORT Study of hospitalized patients
3.25.1 Modeling length-of-stay and MAP using restricted cubic splines
3.25.2 Using Stata for non-linear models with restricted cubic splines
3.26 Additional reading
3.27 Exercises
4 Simple logistic regression
4.1 Example: APACHE score and mortality in patients with sepsis
4.2 Sigmoidal family of logistic regression curves
4.3 The log odds of death given a logistic probability function
4.4 The binomial distribution
4.5 Simple logistic regression model
4.6 Generalized linear model
4.7 Contrast between logistic and linear regression
4.8 Maximum likelihood estimation
4.8.1 Variance of maximum likelihood parameter estimates
4.9 Statistical tests and confidence intervals
4.9.1 Likelihood ratio tests
4.9.2 Quadratic approximations to the log likelihood ratio function
4.9.3 Score tests
4.9.4 Wald tests and confidence intervals
4.9.5 Which test should you use?
4.10 Sepsis example
4.11 Logistic regression with Stata
4.12 Odds ratios and the logistic regression model
4.13 95% confidence interval for the odds ratio associated with a unit increase in x
4.13.1 Calculating this odds ratio with Stata
4.14 Logistic regression with grouped response data
4.15 95% confidence interval for π[x]
4.16 Exact 100(1 − α)% confidence intervals for proportions
4.17 Example: the Ibuprofen in Sepsis Study
4.18 Logistic regression with grouped data using Stata
4.19 Simple 2 × 2 case–control studies
4.19.1 Example: the Ille-et-Vilaine study of esophageal cancer and alcohol
4.19.2 Review of classical case–control theory
4.19.3 95% confidence interval for the odds ratio: Woolf’s method
4.19.4 Test of the null hypothesis that the odds ratio equals one
4.19.5 Test of the null hypothesis that two proportions are equal
4.20 Logistic regression models for 2 × 2 contingency tables
4.20.1 Nuisance parameters
4.20.2 95% confidence interval for the odds ratio: logistic regression
4.21 Creating a Stata data file
4.22 Analyzing case–control data with Stata
4.23 Regressing disease against exposure
4.24 Additional reading
4.25 Exercises
5 Multiple logistic regression
5.1 Mantel–Haenszel estimate of an age-adjusted odds ratio
5.2 Mantel–Haenszel χ² statistic for multiple 2 × 2 tables
5.3 95% confidence interval for the age-adjusted odds ratio
5.4 Breslow–Day–Tarone test for homogeneity
5.5 Calculating the Mantel–Haenszel odds ratio using Stata
5.6 Multiple logistic regression model
5.6.1 Likelihood ratio test of the influence of the covariates on the response variable
5.7 95% confidence interval for an adjusted odds ratio
5.8 Logistic regression for multiple 2 × 2 contingency tables
5.9 Analyzing multiple 2 × 2 tables with Stata
5.10 Handling categorical variables in Stata
5.11 Effect of dose of alcohol on esophageal cancer risk
5.11.1 Analyzing model (5.25) with Stata
5.12 Effect of dose of tobacco on esophageal cancer risk
5.13 Deriving odds ratios from multiple parameters
5.14 The standard error of a weighted sum of regression coefficients
5.15 Confidence intervals for weighted sums of coefficients
5.16 Hypothesis tests for weighted sums of coefficients
5.17 The estimated variance–covariance matrix
5.18 Multiplicative models of two risk factors
5.19 Multiplicative model of smoking, alcohol, and esophageal cancer
5.20 Fitting a multiplicative model with Stata
5.21 Model of two risk factors with interaction
5.22 Model of alcohol, tobacco, and esophageal cancer with interaction terms
5.23 Fitting a model with interaction using Stata
5.24 Model fitting: nested models and model deviance
5.25 Effect modifiers and confounding variables
5.26 Goodness-of-fit tests
5.26.1 The Pearson χ² goodness-of-fit statistic
5.27 Hosmer–Lemeshow goodness-of-fit test
5.27.1 An example: the Ille-et-Vilaine cancer data set
5.28 Residual and influence analysis
5.28.1 Standardized Pearson residual
5.28.2 Δβ_hatj influence statistic
5.28.3 Residual plots of the Ille-et-Vilaine data on esophageal cancer
5.29 Using Stata for goodness-of-fit tests and residual analyses
5.30 Frequency matched case–control studies
5.31 Conditional logistic regression
5.32 Analyzing data with missing values
5.32.1 Imputing data that is missing at random
5.32.2 Cardiac output in the Ibuprofen in Sepsis Study
5.32.3 Modeling missing values with Stata
5.33 Logistic regression using restricted cubic splines
5.33.1 Odds ratios from restricted cubic spline models
5.33.2 95% confidence intervals for ψ_hat[x]
5.34 Modeling hospital mortality in the SUPPORT Study
5.35 Using Stata for logistic regression with restricted cubic splines
5.36 Regression methods with a categorical response variable
5.36.1 Proportional odds logistic regression
5.36.2 Polytomous logistic regression
5.37 Additional reading
5.38 Exercises
6 Introduction to survival analysis
6.1 Survival and cumulative mortality functions
6.2 Right censored data
6.3 Kaplan–Meier survival curves
6.4 An example: genetic risk of recurrent intracerebral hemorrhage
6.5 95% confidence intervals for survival functions
6.6 Cumulative mortality function
6.7 Censoring and bias
6.8 Log-rank test
6.9 Using Stata to derive survival functions and the log-rank test
6.10 Log-rank test for multiple patient groups
6.11 Hazard functions
6.12 Proportional hazards
6.13 Relative risks and hazard ratios
6.14 Proportional hazards regression analysis
6.15 Hazard regression analysis of the intracerebral hemorrhage data
6.16 Proportional hazards regression analysis with Stata
6.17 Tied failure times
6.18 Additional reading
6.19 Exercises
7 Hazard regression analysis
7.1 Proportional hazards model
7.2 Relative risks and hazard ratios
7.3 95% confidence intervals and hypothesis tests
7.4 Nested models and model deviance
7.5 An example: the Framingham Heart Study
7.5.1 Kaplan–Meier survival curves for DBP
7.5.2 Simple hazard regression model for CHD risk and DBP
7.5.3 Restricted cubic spline model of CHD risk and DBP
7.5.4 Categorical hazard regression model of CHD risk and DBP
7.5.5 Simple hazard regression model of CHD risk and gender
7.5.6 Multiplicative model of DBP and gender on risk of CHD
7.5.7 Using interaction terms to model the effects of gender and DBP on CHD
7.5.8 Adjusting for confounding variables
7.5.9 Interpretation
7.5.10 Alternative models
7.6 Proportional hazards regression analysis using Stata
7.7 Stratified proportional hazards models
7.8 Survival analysis with ragged study entry
7.8.1 Kaplan–Meier survival curve and the log-rank test with ragged entry
7.8.2 Age, sex, and CHD in the Framingham Heart Study
7.8.3 Proportional hazards regression analysis with ragged entry
7.8.4 Survival analysis with ragged entry using Stata
7.9 Predicted survival, log–log plots, and the proportional hazards assumption
7.9.1 Evaluating the proportional hazards assumption with Stata
7.10 Hazard regression models with time-dependent covariates
7.10.1 Testing the proportional hazards assumption
7.10.2 Modeling time-dependent covariates with Stata
7.11 Additional reading
7.12 Exercises
8 Introduction to Poisson regression: inferences on morbidity and mortality rates
8.1 Elementary statistics involving rates
8.2 Calculating relative risks from incidence data using Stata
8.3 The binomial and Poisson distributions
8.4 Simple Poisson regression for 2 × 2 tables
8.5 Poisson regression and the generalized linear model
8.6 Contrast between Poisson, logistic, and linear regression
8.7 Simple Poisson regression with Stata
8.8 Poisson regression and survival analysis
8.8.1 Recoding survival data on patients as patient–year data
8.8.2 Converting survival records to person–years of follow-up using Stata
8.9 Converting the Framingham survival data set to person–time data
8.10 Simple Poisson regression with multiple data records
8.11 Poisson regression with a classification variable
8.12 Applying simple Poisson regression to the Framingham data
8.13 Additional reading
8.14 Exercises
9 Multiple Poisson regression
9.1 Multiple Poisson regression model
9.2 An example: the Framingham Heart Study
9.2.1 A multiplicative model of gender, age, and coronary heart disease
9.2.2 A model of age, gender, and CHD with interaction terms
9.2.3 Adding confounding variables to the model
9.3 Using Stata to perform Poisson regression
9.4 Residual analyses for Poisson regression models
9.4.1 Deviance residuals
9.5 Residual analysis of Poisson regression models using Stata
9.6 Additional reading
9.7 Exercises
10 Fixed effects analysis of variance
10.1 One-way analysis of variance
10.2 Multiple comparisons
10.3 Reformulating analysis of variance as a linear regression model
10.4 Non-parametric methods
10.5 Kruskal–Wallis test
10.6 Example: a polymorphism in the estrogen receptor gene
10.7 User contributed software in Stata
10.8 One-way analyses of variance using Stata
10.9 Two-way analysis of variance, analysis of covariance, and other models
10.10 Additional reading
10.11 Exercises
11 Repeated-measures analysis of variance
11.1 Example: effect of race and dose of isoproterenol on blood flow
11.2 Exploratory analysis of repeated measures data using Stata
11.3 Response feature analysis
11.4 Example: the isoproterenol data set
11.5 Response feature analysis using Stata
11.6 The area-under-the-curve response feature
11.7 Generalized estimating equations
11.8 Common correlation structures
11.9 GEE analysis and the Huber–White sandwich estimator
11.10 Example: analyzing the isoproterenol data with GEE
11.11 Using Stata to analyze the isoproterenol data set using GEE
11.12 GEE analyses with logistic or Poisson models
11.13 Additional reading
11.14 Exercises
Appendices
A Summary of statistical models discussed in this text
A.1 Models for continuous response variables with one response per patient
A.2 Models for dichotomous or categorical response variables with one response per patient
A.3 Models for survival data (follow-up time plus fate at exit observed on each patient)
A.4 Models for response variables that are event rates or the number of events during a specified number of patient–years of follow-up. The event must be rare
A.5 Models with multiple observations per patient or matched or clustered patients
B Summary of Stata commands used in this text
B.1 Data manipulation and description
B.2 Analysis commands
B.3 Graph commands
B.4 Common options for graph commands (insert after comma)
B.5 Post-estimation commands (affected by preceding regression-type command)
B.6 Command prefixes
B.7 Command qualifiers (insert before comma)
B.8 Logical and relational operators and system variables (see Stata User’s Guide)
B.9 Functions (see Stata Data Management Manual)
References
Index
© Copyright 1996–2025 StataCorp LLC. All rights reserved.