Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables
eBook not available for this title
Review of this book from the Stata Journal
Comment from the Stata technical group
Selecting an appropriate model from a large class of candidate models is difficult. One must balance the (sometimes contradictory) goals of model interpretability, parsimony, good predictive performance, robustness to minor variations in the data, and applicability to other data. This text presents a well-rounded, practical approach to model selection. Most of the book covers two topics: general variable selection, by stepwise procedures or by other methods, and the selection of functional forms for continuous variables. For functional forms, the authors give particular attention to fractional polynomials and splines, drawing on their extensive research in these areas. Readers looking for a tutorial on fractional polynomials will find the text especially useful. The methods apply widely, but the examples come mainly from the health sciences. The models used most often are logistic regression, Cox regression, and generalized linear models.
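The comment highlights fractional polynomials (FPs). As a rough illustration, the Python sketch below builds FP basis terms. It is not code from the book or from any Stata command. It assumes the conventional power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, in which power 0 stands for log(x) and a repeated power p adds the term x^p log(x). It also assumes x > 0. The helper names `fp_term`, `fp_basis`, and `candidate_powers` are hypothetical. Model fitting is left out: the sketch only enumerates the candidate power vectors (8 for FP1 and 36 for FP2) that a function selection procedure would compare.

```python
import math
from itertools import combinations_with_replacement

# Conventional FP power set; power 0 denotes log(x) (assumption: default set).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, p: float) -> float:
    """Box-Tidwell-style transform x^(p), where x^(0) means log(x). Needs x > 0."""
    if x <= 0:
        raise ValueError("FP transformations require x > 0; shift the origin first")
    return math.log(x) if p == 0 else x ** p


def fp_basis(x: float, powers: tuple) -> list:
    """Basis terms of an FP function with non-decreasing powers.

    A repeated power multiplies the previous term by log(x), so powers (p, p)
    give x^(p) and x^(p) * log(x).
    """
    terms = []
    prev = None
    h = 0.0
    for p in powers:
        if p == prev:
            h = h * math.log(x)  # repeated power: previous term times log(x)
        else:
            h = fp_term(x, p)  # new power: plain transform
        terms.append(h)
        prev = p
    return terms


def candidate_powers(degree: int) -> list:
    """All power vectors of an FP model of the given degree, repeats allowed."""
    return list(combinations_with_replacement(FP_POWERS, degree))


if __name__ == "__main__":
    print(len(candidate_powers(1)), len(candidate_powers(2)))  # FP1 and FP2 counts
    print(fp_basis(4.0, (-0.5, 1)))  # terms x^-0.5 and x at x = 4
    print(fp_basis(2.0, (0, 0)))  # terms log(x) and log(x)^2 at x = 2
```

An FP2 model with powers (p1, p2) then uses the two basis terms as covariates in an ordinary regression, for example b0 + b1*H1(x) + b2*H2(x). Treating a repeated power as a log multiplier keeps the family closed under repeats without adding new power values.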
Table of contents
Preface
1 Introduction
1.1 Real-Life Problems as Motivation for Model Building
1.1.1 Many Candidate Models
1.1.2 Functional Form for Continuous Predictors
1.1.3 Example 1: Continuous Response
1.1.4 Example 2: Multivariable Model for Survival Data
1.2 Issues in Modelling Continuous Predictors
1.2.1 Effects of Assumptions
1.2.2 Global versus Local Influence Models
1.2.3 Disadvantages of Fractional Polynomial Modelling
1.2.4 Controlling Model Complexity
1.3 Types of Regression Model Considered
1.3.1 Normal-Errors Regression
1.3.2 Logistic Regression
1.3.3 Cox Regression
1.3.4 Generalized Linear Models
1.3.5 Linear and Additive Predictors
1.4 Role of Residuals
1.4.1 Uses of Residuals
1.4.2 Graphical Analysis of Residuals
1.5 Role of Subject-Matter Knowledge in Model Development
1.6 Scope of Model Building in our Book
1.7 Modelling Preferences
1.7.1 General Issues
1.7.2 Criteria for a Good Model
1.7.3 Personal Preferences
1.8 General Notation
2 Selection of Variables
2.1 Introduction
2.2 Background
2.3 Preliminaries for a Multivariable Analysis
2.4 Aims of Multivariable Models
2.5 Prediction: Summary Statistics and Comparisons
2.6 Procedures for Selecting Variables
2.6.1 Strength of Predictors
2.6.2 Stepwise Procedures
2.6.3 All-Subsets Model Selection Using Information Criteria
2.6.4 Further Considerations
2.7 Comparison of Selection Strategies in Examples
2.7.1 Myeloma Study
2.7.2 Educational Body-Fat Data
2.7.3 Glioma Study
2.8 Selection and Shrinkage
2.8.1 Selection Bias
2.8.2 Simulation Study
2.8.3 Shrinkage to Correct for Selection Bias
2.8.4 Post-estimation Shrinkage
2.8.5 Reducing Selection Bias
2.8.6 Example
2.9 Discussion
2.9.1 Model Building in Small Datasets
2.9.2 Full, Pre-specified or Selected Model?
2.9.3 Comparison of Selection Procedures
2.9.4 Complexity, Stability and Interpretability
2.9.5 Conclusions and Outlook
3 Handling Categorical and Continuous Predictors
3.1 Introduction
3.2 Types of Predictor
3.2.1 Binary
3.2.2 Nominal
3.2.3 Ordinal, Counting, Continuous
3.2.4 Derived
3.3 Handling Ordinal Predictors
3.3.1 Coding Schemes
3.3.2 Effect of Coding Schemes on Variable Selection
3.4 Handling Counting and Continuous Predictors: Categorization
3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
3.4.2 Other Ways of Choosing a Cutpoint
3.5 Example: Issues in Model Building with Categorized Variables
3.5.1 One Ordinal Variable
3.5.2 Several Ordinal Variables
3.6 Handling Counting and Continuous Predictors: Functional Form
3.6.1 Beyond Linearity
3.6.2 Does Nonlinearity Matter?
3.6.3 Simple versus Complex Functions
3.6.4 Interpretability and Transportability
3.7 Empirical Curve Fitting
3.7.1 General Approaches to Smoothing
3.7.2 Critique of Local and Global Influence Models
3.8 Discussion
3.8.1 Sparse Categories
3.8.2 Choice of Coding Scheme
3.8.3 Categorizing Continuous Variables
3.8.4 Handling Continuous Variables
4 Fractional Polynomials for One Variable
4.1 Introduction
4.2 Background
4.2.1 Genesis
4.2.2 Types of Model
4.2.3 Relation to Box–Tidwell and Exponential Functions
4.3 Definition and Notation
4.3.1 Fractional Polynomials
4.3.2 First Derivative
4.4 Characteristics
4.4.1 FP1 and FP2 Functions
4.4.2 Maximum or Minimum of a FP2 Function
4.5 Examples of Curve Shapes with FP1 and FP2 Functions
4.6 Choice of Powers
4.7 Choice of Origin
4.8 Model Fitting and Estimation
4.9 Inference
4.9.1 Hypothesis Testing
4.9.2 Interval Estimation
4.10 Function Selection Procedure
4.10.1 Choice of Default Function
4.10.2 Closed Test Procedure for Function Selection
4.10.3 Example
4.10.4 Sequential Procedure
4.10.5 Type I Error and Power of the Function Selection Procedure
4.11 Scaling and Centering
4.11.1 Computational Aspects
4.11.2 Examples
4.12 FP Powers as Approximations to Continuous Powers
4.12.1 Box–Tidwell and Fractional Polynomial Models
4.12.2 Example
4.13 Presentation of Fractional Polynomial Functions
4.13.1 Graphical
4.13.2 Tabular
4.14 Worked Example
4.14.1 Details of all Fractional Polynomial Models
4.14.2 Function Selection
4.14.3 Details of the Fitted Model
4.14.4 Standard Error of a Fitted Value
4.14.5 Fitted Odds Ratio and its Confidence Interval
4.15 Modelling Covariates with a Spike at Zero
4.16 Power of Fractional Polynomial Analysis
4.16.1 Underlying Function Linear
4.16.2 Underlying Function FP1 or FP2
4.16.3 Comment
4.17 Discussion
5 Some Issues with Univariate Fractional Polynomial Models
5.1 Introduction
5.2 Susceptibility to Influential Covariate Observations
5.3 A Diagnostic Plot for Influential Points in FP Models
5.3.1 Example 1: Educational Body-Fat Data
5.3.2 Example 2: Primary Biliary Cirrhosis Data
5.4 Dependence on Choice of Origin
5.5 Improving Robustness by Preliminary Transformation
5.5.1 Example 1: Educational Body-Fat Data
5.5.2 Example 2: PBC Data
5.5.3 Practical Use of the Pre-transformation gδ(x)
5.6 Improving Fit by Preliminary Transformation
5.6.1 Lack of Fit of Fractional Polynomial Models
5.6.2 Negative Exponential Pre-transformation
5.7 Higher Order Fractional Polynomials
5.7.1 Example 1: Nerve Conduction Data
5.7.2 Example 2: Triceps Skinfold Thickness
5.8 When Fractional Polynomial Models are Unsuitable
5.8.1 Not all Curves are Fractional Polynomials
5.8.2 Example: Kidney Cancer
5.9 Discussion
6 MFP: Multivariable Model-building with Fractional Polynomials
6.1 Introduction
6.2 Motivation
6.3 The MFP Algorithm
6.3.1 Remarks
6.3.2 Example
6.4 Presenting the Model
6.4.1 Parameter Estimates
6.4.2 Function Plots
6.4.3 Effect Estimates
6.5 Model Criticism
6.5.1 Function Plots
6.5.2 Graphical Analysis of Residuals
6.5.3 Assessing Fit by Adding More Complex Functions
6.5.4 Consistency with Subject-Matter Knowledge
6.6 Further Topics
6.6.1 Interval Estimation
6.6.2 Importance of the Nominal Significance Level
6.6.3 The Full MFP Model
6.6.4 A Single Predictor of Interest
6.6.5 Contribution of Individual Variables to the Model Fit
6.6.6 Predictive Value of Additional Variables
6.7 Further Examples
6.7.1 Example 1: Oral Cancer
6.7.2 Example 2: Diabetes
6.7.3 Example 3: Whitehall I
6.8 Simple Versus Complex Fractional Polynomial Models
6.8.1 Complexity and Modelling Aims
6.8.2 Example: GBSG Breast Cancer Data
6.9 Discussion
6.9.1 Philosophy of MFP
6.9.2 Function Complexity, Sample Size and Subject-Matter Knowledge
6.9.3 Improving Robustness by Preliminary Covariate Transformation
6.9.4 Conclusion and Future
7 Interactions
7.1 Introduction
7.2 Background
7.3 General Considerations
7.3.1 Effect of Type of Predictor
7.3.2 Power
7.3.3 Randomized Trials and Observational Studies
7.3.4 Predefined Hypothesis or Hypothesis Generation
7.3.5 Interactions Caused by Mismodelling Main Effects
7.3.6 The ‘Treatment–Effect’ Plot
7.3.7 Graphical Checks, Sensitivity and Stability Analyses
7.3.8 Cautious Interpretation is Essential
7.4 The MFPI Procedure
7.4.1 Model Simplifications
7.4.2 Check of the Results and Sensitivity Analysis
7.5 Example 1: Advanced Prostate Cancer
7.5.1 The Fitted Model
7.5.2 Check of the Interactions
7.5.3 Final Model
7.5.4 Further Comments and Interpretation
7.5.5 FP Model Simplification
7.6 Example 2: GBSG Breast Cancer Study
7.6.1 Oestrogen Receptor Positivity as a Predictive Factor
7.6.2 A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction
7.7 Categorization
7.7.1 Interaction with Categorized Variables
7.7.2 Example: GBSG Study
7.8 STEPP
7.9 Example 3: Comparison of STEPP with MFPI
7.9.1 Interaction in the Kidney Cancer Data
7.9.2 Stability Investigation
7.10 Comment on Type I Error of MFPI
7.11 Continuous-by-Continuous Interactions
7.11.1 Mismodelling May Induce Interaction
7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
7.11.3 Examples of MFPIgen
7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
7.11.5 Summary
7.12 Multi-Category Variables
7.13 Discussion
8 Model Stability
8.1 Introduction
8.2 Background
8.3 Using the Bootstrap to Explore Model Stability
8.3.1 Selection of Variables Within a Bootstrap Sample
8.3.2 The Bootstrap Inclusion Frequency and the Importance of a Variable
8.4 Example 1: Glioma Data
8.5 Example 2: Educational Body-Fat Data
8.5.1 Effect of Influential Observations on Model Selection
8.6 Example 3: Breast Cancer Diagnosis
8.7 Model Stability for Functions
8.7.1 Summarizing Variation between Curves
8.7.2 Measures of Curve Instability
8.8 Example 4: GBSG Breast Cancer Data
8.8.1 Interdependencies among Selected Variables and Functions in Subsets
8.8.2 Plots of Functions
8.8.3 Instability Measures
8.8.4 Stability of Functions Depending on Other Variables Included
8.9 Discussion
8.9.1 Relationship between Inclusion Fractions
8.9.2 Stability of Functions
9 Some Comparisons of MFP with Splines
9.1 Introduction
9.2 Background
9.3 MVRS: A Procedure for Model Building with Regression Splines
9.3.1 Restricted Cubic Spline Functions
9.3.2 Function Selection Procedure for Restricted Cubic Splines
9.3.3 The MVRS Algorithm
9.4 MVSS: A Procedure for Model Building with Cubic Smoothing Splines
9.4.1 Cubic Smoothing Splines
9.4.2 Function Selection Procedure for Cubic Smoothing Splines
9.4.3 The MVSS Algorithm
9.5 Example 1: Boston Housing Data
9.5.1 Effect of Reducing the Sample Size
9.5.2 Comparing Predictors
9.6 Example 2: GBSG Breast Cancer Study
9.7 Example 3: Pima Indians
9.8 Example 4: PBC
9.9 Discussion
9.9.1 Splines in General
9.9.2 Complexity of Functions
9.9.3 Optimal Fit or Transferability?
9.9.4 Reporting of Selected Models
9.9.5 Conclusion
10 How to Work with MFP
10.1 Introduction
10.2 The Dataset
10.3 Univariate Analyses
10.4 MFP Analysis
10.5 Model Criticism
10.5.1 Function Plots
10.5.2 Residuals and Lack of Fit
10.5.3 Robustness Transformation and Subject-Matter Knowledge
10.5.4 Diagnostic Plot for Influential Observations
10.5.5 Refined Model
10.5.6 Interactions
10.6 Stability Analysis
10.7 Final Model
10.8 Issues to be Aware of
10.8.1 Selecting the Main-Effects Model
10.8.2 Further Comments on Stability
10.8.3 Searching for Interactions
10.9 Discussion
11 Special Topics Involving Fractional Polynomials
11.1 Time-Varying Hazard Ratios in the Cox Model
11.1.1 The Fractional Polynomial Time Procedure
11.1.2 The MFP Time Procedure
11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
11.1.4 Categorization of Survival Time
11.1.5 Discussion
11.2 Age-specific Reference Intervals
11.2.1 Example: Fetal Growth
11.2.2 Using FP Functions as Smoothers
11.2.3 More Sophisticated Distributional Assumptions
11.2.4 Discussion
11.3 Other Topics
11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
11.3.2 Model Uncertainty for Functions
11.3.3 Relative Survival
11.3.4 Approximating Smooth Functions
11.3.5 Miscellaneous Applications
12 Epilogue
12.1 Introduction
12.2 Towards Recommendations for Practice
12.2.1 Variable Selection Procedure
12.2.2 Functional Form for Continuous Covariates
12.2.3 Extreme Values or Influential Points
12.2.4 Sensitivity Analysis
12.2.5 Check for Model Stability
12.2.6 Complexity of a Predictor
12.2.7 Check for Interactions
12.3 Omitted Topics and Future Directions
12.3.1 Measurement Error in Covariates
12.3.2 Meta-analysis
12.3.3 Multi-level (Hierarchical) Models
12.3.4 Missing Covariate Data
12.3.5 Other Types of Model
12.4 Conclusion
Appendix A: Data and Software Resources
A.1 Summaries of Datasets
A.2 Datasets used more than once
A.2.1 Research Body Fat
A.2.2 GBSG Breast Cancer
A.2.3 Educational Body Fat
A.2.4 Glioma
A.2.5 Prostate Cancer
A.2.6 Whitehall I
A.2.7 PBC
A.2.8 Oral Cancer
A.2.9 Kidney Cancer
A.3 Software
Appendix B: Glossary of Abbreviations
References
Index
© Copyright 1996–2024 StataCorp LLC. All rights reserved.