Stata Bookstore: Applied Survey Data Analysis, Second Edition

Applied Survey Data Analysis, Second Edition

Authors:	Steven G. Heeringa, Brady T. West, and Patricia A. Berglund
Publisher:	CRC Press
Copyright:	2017
ISBN-13:	978-1-4987-6160-4
Pages:	568; hardcover

Comment from the Stata technical group

Applied Survey Data Analysis, Second Edition is an intermediate-level, example-driven treatment of current methods for complex survey data. It will appeal to researchers of all disciplines who work with survey data and have basic knowledge of applied statistical methodology for standard (nonsurvey) data. Most of the examples in this book include corresponding Stata commands, making it a valuable resource for researchers analyzing complex survey data using Stata.

The authors begin with a brief history of applied survey data analysis and then describe several widely used survey datasets, such as the National Health and Nutrition Examination Survey (NHANES). They next introduce the basic concepts of complex sample designs: sampling plans, weights, clustering, stratification and poststratification, design effects, and multistage samples. They then discuss the main methods of variance estimation: Taylor series linearization, jackknife repeated replication, balanced repeated replication (including Fay's method), and the bootstrap.

The middle sections of the book provide in-depth coverage of the types of analyses that can be performed with survey data, including means and proportions, correlations, tables, linear regression, logistic regression, multinomial logistic regression, Poisson regression, and survival analysis (including Cox regression). The final three chapters are devoted to advanced topics, such as analysis of longitudinal data, multiple imputation, Bayesian analysis, and structural equation models. The appendix provides overviews of popular statistical software, including Stata.

Preface

Authors

1. Applied Survey Data Analysis: An Overview

1.1 Introduction
1.2 A Brief History of Applied Survey Data Analysis

1.2.1 Key Theoretical Developments
1.2.2 Key Software Developments

1.3 Example Data Sets and Exercises
1.4 Steps in Applied Survey Data Analysis

2. Getting to Know the Complex Sample Design

2.1 Introduction

2.1.1 Technical Documentation and Supplemental Literature Review

2.2 Classification of Sample Designs

2.2.1 Sampling Plans
2.2.2 Other Types of Study Designs Involving Probability Sampling
2.2.3 Inference from Survey Data

2.3 Target Populations and Survey Populations
2.4 Simple Random Sampling: A Simple Model for Design-Based Inference

2.4.1 Relevance of SRS to Complex Sample Survey Data Analysis
2.4.2 SRS Fundamentals: A Framework for Design-Based Inference
2.4.3 Example of Design-Based Inference under SRS

2.5 Complex Sample Design Effects

2.5.1 Design Effect Ratio
2.5.2 Generalized Design Effects and Effective Sample Sizes

2.6 Complex Samples: Cluster Sampling and Stratification

2.6.1 Cluster Sampling Plans
2.6.2 Stratification
2.6.3 Joint Effects of Sample Stratification and Cluster Sampling

2.7 Weighting in Analysis of Survey Data

2.7.1 Introduction to Weighted Analysis of Survey Data
2.7.2 Weighting for Probabilities of Selection (w_sel)
2.7.3 Nonresponse Adjustment Weights (w_nr)

2.7.3.1 Weighting Class Approach (w_nr,wc)
2.7.3.2 Propensity Cell Adjustment Approach (w_nr,prop)

2.7.4 Poststratification Weight Factors (w_ps)
2.7.5 Design Effects Due to Weighted Analysis

2.8 Multistage Area Probability Sample Designs

2.8.1 Primary Stage Sampling
2.8.2 Secondary Stage Sampling
2.8.3 Third- and Fourth-Stage Sampling of HUs and Eligible Respondents

2.9 Special Types of Sampling Plans Encountered in Surveys

3. Foundations and Techniques for Design-Based Estimation and Inference

3.1 Introduction
3.2 Finite Populations and Superpopulation Models
3.3 CIs for Population Parameters
3.4 Weighted Estimation of Population Parameters
3.5 Probability Distributions and Design-Based Inference

3.5.1 Sampling Distributions of Survey Estimates
3.5.2 Degrees of Freedom for t under Complex Sample Designs

3.6 Variance Estimation

3.6.1 Simplifying Assumptions Employed in Complex Sample Variance Estimation
3.6.2 TSL Method
3.6.3 Replication Methods for Variance Estimation

3.6.3.1 Jackknife Repeated Replication
3.6.3.2 Balanced Repeated Replication
3.6.3.3 Fay's BRR Method
3.6.3.4 Bootstrap (Rao–Wu Rescaling Bootstrap)
3.6.3.5 Construction of Replicate Weights for Replicated Variance Estimation

3.6.4 Example Comparing Results from the TSL, JRR, BRR, and Bootstrap Methods

3.7 Hypothesis Testing in Survey Data Analysis
3.8 TSE and Its Impact on Survey Estimation and Inference

3.8.1 Variable Errors
3.8.2 Biases in Survey Data

4. Preparation for Complex Sample Survey Data Analysis

4.1 Introduction
4.2 Final Survey Weights: Review by the Data User

4.2.1 Identification of the Correct Weight Variable(s) for the Analysis
4.2.2 Determining the Distribution and Scaling of the Weight Variable(s)
4.2.3 Weighting Applications: Sensitivity of Survey Estimates to the Weights

4.3 Understanding and Checking the Sampling Error Calculation Model

4.3.1 Stratum and Cluster Codes in Complex Sample Survey Data Sets
4.3.2 Building the NCS-R Sampling Error Calculation Model
4.3.3 Combining Strata, Randomly Grouping PSUs, and Collapsing Strata
4.3.4 Checking the Sampling Error Calculation Model for the Survey Data Set

4.4 Addressing Item Missing Data in Analysis Variables

4.4.1 Potential Bias due to Ignoring Missing Data
4.4.2 Exploring Rates and Patterns of Missing Data Prior to Analysis

4.5 Preparing to Analyze Data for Sample Subpopulations

4.5.1 Subpopulation Distributions across Sample Design Units
4.5.2 Unconditional Approach for Subclass Analysis
4.5.3 Preparation for Subclass Analyses

4.6 Final Checklist for Data Users

5. Descriptive Analysis for Continuous Variables

5.1 Introduction
5.2 Special Considerations in Descriptive Analysis of Complex Sample Survey Data

5.2.1 Weighted Estimation
5.2.2 Design Effects for Descriptive Statistics
5.2.3 Matching the Method to the Variable Type

5.3 Simple Statistics for Univariate Continuous Distributions

5.3.1 Graphical Tools for Descriptive Analysis of Survey Data
5.3.2 Estimation of Population Totals
5.3.3 Means of Continuous, Binary, or Interval Scale Data
5.3.4 Standard Deviations of Continuous Variables
5.3.5 Estimation of Percentiles, Medians, and Measures of Inequality in Population Distributions for Continuous Variables

5.3.5.1 Estimation of Distribution Quantiles
5.3.5.2 Estimation of Measures of Inequality in Population Distributions

5.4 Bivariate Relationships between Two Continuous Variables

5.4.1 X–Y Scatter Plots
5.4.2 Product Moment Correlation Statistic (r)
5.4.3 Ratios of Two Continuous Variables

5.5 Descriptive Statistics for Subpopulations
5.6 Linear Functions of Descriptive Estimates and Differences of Means

5.6.1 Differences of Means for Two Subpopulations
5.6.2 Comparing Means over Time

6. Categorical Data Analysis

6.1 Introduction
6.2 Framework for Analysis of Categorical Survey Data

6.2.1 Incorporating the Complex Design and Pseudo Maximum Likelihood
6.2.2 Proportions and Percentages
6.2.3 Crosstabulations, Contingency Tables, and Weighted Frequencies

6.3 Univariate Analysis of Categorical Data

6.3.1 Estimation of Proportions for Binary Variables
6.3.2 Estimation of Category Proportions for Multinomial Variables
6.3.3 Testing Hypotheses Concerning a Vector of Population Proportions
6.3.4 Graphical Display for a Single Categorical Variable

6.4 Bivariate Analysis of Categorical Data

6.4.1 Response and Factor Variables
6.4.2 Estimation of Total, Row, and Column Proportions for Two-Way Tables
6.4.3 Estimating and Testing Differences in Subpopulation Proportions
6.4.4 Χ² Tests of Independence of Rows and Columns
6.4.5 Odds Ratios and Relative Risks
6.4.6 Simple Logistic Regression to Estimate the Odds Ratio
6.4.7 Bivariate Graphical Analysis

6.5 Analysis of Multivariate Categorical Data

6.5.1 Cochran–Mantel–Haenszel Test
6.5.2 Log-Linear Models for Contingency Tables

6.6 Summary

7. Linear Regression Models

7.1 Introduction
7.2 Linear Regression Model

7.2.1 Standard Linear Regression Model
7.2.2 Survey Treatment of the Regression Model

7.3 Four Steps in Linear Regression Analysis

7.3.1 Step 1: Specifying and Refining the Model
7.3.2 Step 2: Estimation of Model Parameters

7.3.2.1 Estimation for the Standard Linear Regression Model
7.3.2.2 Linear Regression Estimation for Complex Sample Survey Data

7.3.3 Step 3: Model Evaluation
7.3.4 Step 4: Inference

7.3.4.1 Inference Concerning Model Parameters
7.3.4.2 Prediction Intervals

7.4 Some Practical Considerations and Tools

7.4.1 Distribution of the Dependent Variable
7.4.2 Parameterization and Scaling for Independent Variables
7.4.3 Standardization of the Dependent and Independent Variables
7.4.4 Specification and Interpretation of Interactions and Nonlinear Relationships
7.4.5 Model-Building Strategies

7.5 Application: Modeling Diastolic Blood Pressure with the 2011–2012 NHANES Data

7.5.1 Exploring the Bivariate Relationships
7.5.2 Naïve Analysis: Ignoring Sample Design Features
7.5.3 Weighted Regression Analysis
7.5.4 Appropriate Analysis: Incorporating All Sample Design Features

8. Logistic Regression and Generalized Linear Models for Binary Survey Variables

8.1 Introduction
8.2 GLMs for Binary Survey Responses

8.2.1 Logistic Regression Model
8.2.2 Probit Regression Model
8.2.3 Complementary-Log-Log Model

8.3 Building the Logistic Regression Model: Stage 1—Model Specification
8.4 Building the Logistic Regression Model: Stage 2—Estimation of Model Parameters and Standard Errors
8.5 Building the Logistic Regression Model: Stage 3—Evaluation of the Fitted Model

8.5.1 Wald Tests of Model Parameters
8.5.2 GOF and Logistic Regression Diagnostics

8.6 Building the Logistic Regression Model: Stage 4—Interpretation and Inference
8.7 Analysis Application

8.7.1 Stage 1: Model Specification
8.7.2 Stage 2: Model Estimation
8.7.3 Stage 3: Model Evaluation
8.7.4 Stage 4: Model Interpretation/Inference

8.8 Comparing the Logistic, Probit, and C-L-L GLMs for Binary Dependent Variables

9. Generalized Linear Models for Multinomial, Ordinal, and Count Variables

9.1 Introduction
9.2 Analyzing Survey Data Using Multinomial Logit Regression Models

9.2.1 Multinomial Logit Regression Model
9.2.2 Multinomial Logit Regression Model: Specification Stage
9.2.3 Multinomial Logit Regression Model: Estimation Stage
9.2.4 Multinomial Logit Regression Model: Evaluation Stage
9.2.5 Multinomial Logit Regression Model: Interpretation Stage
9.2.6 Example: Fitting a Multinomial Logit Regression Model to Complex Sample Survey Data

9.3 Logistic Regression Models for Ordinal Survey Data

9.3.1 Cumulative Logit Regression Model
9.3.2 Cumulative Logit Regression Model: Specification Stage
9.3.3 Cumulative Logit Regression Model: Estimation Stage
9.3.4 Cumulative Logit Regression Model: Evaluation Stage
9.3.5 Cumulative Logit Regression Model: Interpretation Stage
9.3.6 Example: Fitting a Cumulative Logit Regression Model to Complex Sample Survey Data

9.4 Regression Models for Count Outcomes

9.4.1 Survey Count Variables and Regression Modeling Alternatives
9.4.2 Generalized Linear Models for Count Variables

9.4.2.1 Poisson Regression Model
9.4.2.2 Negative Binomial Regression Model
9.4.2.3 Two-Part Models: Zero-Inflated Poisson and Negative Binomial Regression Models

9.4.3 Regression Models for Count Data: Specification Stage
9.4.4 Regression Models for Count Data: Estimation Stage
9.4.5 Regression Models for Count Data: Evaluation Stage
9.4.6 Regression Models for Count Data: Interpretation Stage
9.4.7 Example: Fitting Poisson and Negative Binomial Regression Models to Complex Sample Survey Data

10. Survival Analysis of Event History Survey Data

10.1 Introduction
10.2 Basic Theory of Survival Analysis

10.2.1 Survey Measurement of Event History Data
10.2.2 Data for Event History Models
10.2.3 Important Notation and Definitions
10.2.4 Models for Survival Analysis

10.3 (Nonparametric) K–M Estimation of the Survivor Function

10.3.1 K–M Model Specification and Estimation
10.3.2 K–M Estimator: Evaluation and Interpretation
10.3.3 K–M Survival Analysis Example

10.4 The Cox Proportional Hazards (CPH) Model

10.4.1 CPH Model: Specification
10.4.2 CPH Model: Estimation Stage
10.4.3 CPH Model: Evaluation and Diagnostics
10.4.4 CPH Model: Interpretation and Presentation of Results
10.4.5 Example: Fitting a CPH Model to Complex Sample Survey Data

10.5 Discrete Time Survival Models

10.5.1 Discrete Time Logistic Model
10.5.2 Data Preparation for Discrete Time Survival Models
10.5.3 Discrete Time Models: Estimation Stage
10.5.4 Discrete Time Models: Evaluation and Interpretation
10.5.5 Fitting a Discrete Time Model to Complex Sample Survey Data

11. Analysis of Longitudinal Complex Sample Survey Data

11.1 Introduction
11.2 Alternative Analytic Objectives with Longitudinal Survey Data

11.2.1 Objective 1: Descriptive Estimation at a Single Time Point
11.2.2 Objective 2: Estimation of Change across Two Waves
11.2.3 Objective 3: Trajectory Estimation Based on Three or More Waves

11.2.3.1 Approach 1: Weighted Multilevel Modeling
11.2.3.2 Approach 2: Covariance Structure Modeling
11.2.3.3 Approach 3: Weighted GEE Estimation
11.2.3.4 Approach 4: Multiple Imputation Analysis
11.2.3.5 Approach 5: Calibration Adjustment for Respondents with Complete Data

11.3 Alternative Longitudinal Analyses of the HRS Data

11.3.1 Example: Descriptive Estimation at a Single Wave
11.3.2 Example: Change across Two Waves

11.3.2.1 Accounting for Refreshment Samples When Estimating Mean Change

11.3.3 Example: Weighted Multilevel Modeling

11.3.3.1 Example: Veiga et al. (2014)

11.3.4 Example: Weighted GEE Analysis

11.4 Concluding Remarks

12. Imputation of Missing Data: Practical Methods and Applications for Survey Analysts

12.1 Introduction
12.2 Important Missing Data Concepts

12.2.1 Sources and Types of Missing Data
12.2.2 Patterns of Item Missing Data in Surveys
12.2.3 Item Missing Data Mechanisms
12.2.4 Review of Strategies to Address Item Missing Data in Surveys

12.3 Factors to Consider in Choosing an Imputation Method
12.4 Multiple Imputation

12.4.1 Overview of MI and MI Phases
12.4.2 Models for Multiply Imputing Missing Data

12.4.2.1 Choosing the Variables to Include in the Imputation Model
12.4.2.2 Distributional Assumptions for the Imputation Model

12.4.3 Creating the MIs

12.4.3.1 Transforming the Imputation Problem to Monotonic Missing Data
12.4.3.2 Specifying an Explicit Multivariate Model and Applying Exact Bayesian Posterior Simulation Methods
12.4.3.3 SR or "Chained Regressions"

12.4.4 Estimation and Inference for Multiply Imputed Data

12.4.4.1 Estimators for Population Parameters and Associated Variance Estimators
12.4.4.2 Model Evaluation and Inference

12.5 Fractional Imputation

12.5.1 Background
12.5.2 Creating the FIs
12.5.3 Estimation and Inference with Fractionally Imputed Data
12.5.4 FI Software

12.6 Application of MI and FI Methods to the NHANES 2011–2012 Data

12.6.1 Problem Definition
12.6.2 Imputation Models for the NHANES DBP Example
12.6.3 Imputation of the Item Missing Data

12.6.3.1 Multiple Imputation
12.6.3.2 FEFI: Hot Deck Method

12.6.4 Estimation and Inference

12.6.4.1 Multiple Imputation
12.6.4.2 FI Estimation and Inference

12.6.5 Comparison of Example Results from Complete Case Analysis, MI, and FEFI

13. Advanced Topics in the Analysis of Survey Data

13.1 Introduction
13.2 Bayesian Analysis of Complex Sample Survey Data
13.3 GLMMs in Survey Data Analysis

13.3.1 Overview of GLMMs
13.3.2 GLMMs and Complex Sample Survey Data
13.3.3 Alternative Approaches to Fitting GLMMs to Survey Data: The PISA Example

13.4 Fitting Structural Equation Models to Complex Sample Survey Data

13.4.1 SEM Example: Analysis of ESS Data from Belgium

13.5 Small Area Estimation and Complex Sample Survey Data
13.6 Nonparametric Methods for Complex Sample Survey Data

References

Appendix A: Software Overview

Index