Applied Survey Data Analysis, Second Edition
eBook not available for this title
Comment from the Stata technical group

Applied Survey Data Analysis, Second Edition is an intermediate-level, example-driven treatment of current methods for analyzing complex survey data. It will appeal to researchers in any discipline who work with survey data and have a basic knowledge of applied statistical methodology for standard (nonsurvey) data. Most of the examples in the book include the corresponding Stata commands, making it a valuable resource for researchers who analyze complex survey data using Stata.

The authors begin with a brief history of applied survey data analysis and then introduce some widely used survey datasets, such as the National Health and Nutrition Examination Survey (NHANES). They proceed to the basic concepts of survey data: sampling plans, weights, clustering, prestratification and poststratification, design effects, and multistage samples. Next they discuss the types of variance estimators: Taylor linearization, jackknife, bootstrap, and balanced repeated replication.

The middle sections of the book provide in-depth coverage of the analyses that can be performed with survey data, including means and proportions, correlations, tables, linear regression, logistic regression, multinomial logistic regression, Poisson regression, and survival analysis (including Cox regression).

The final three chapters are devoted to advanced topics, such as analysis of longitudinal data, multiple imputation, Bayesian analysis, and structural equation models. The appendix provides overviews of popular statistical software, including Stata.
Table of contents

Preface
Authors
1. Applied Survey Data Analysis: An Overview
1.1 Introduction
1.2 A Brief History of Applied Survey Data Analysis
1.2.1 Key Theoretical Developments
1.2.2 Key Software Developments
1.3 Example Data Sets and Exercises
1.4 Steps in Applied Survey Data Analysis
2. Getting to Know the Complex Sample Design
2.1 Introduction
2.1.1 Technical Documentation and Supplemental Literature Review
2.2 Classification of Sample Designs
2.2.1 Sampling Plans
2.2.2 Other Types of Study Designs Involving Probability Sampling
2.2.3 Inference from Survey Data
2.3 Target Populations and Survey Populations
2.4 Simple Random Sampling: A Simple Model for Design-Based Inference
2.4.1 Relevance of SRS to Complex Sample Survey Data Analysis
2.4.2 SRS Fundamentals: A Framework for Design-Based Inference
2.4.3 Example of Design-Based Inference under SRS
2.5 Complex Sample Design Effects
2.5.1 Design Effect Ratio
2.5.2 Generalized Design Effects and Effective Sample Sizes
2.6 Complex Samples: Cluster Sampling and Stratification
2.6.1 Cluster Sampling Plans
2.6.2 Stratification
2.6.3 Joint Effects of Sample Stratification and Cluster Sampling
2.7 Weighting in Analysis of Survey Data
2.7.1 Introduction to Weighted Analysis of Survey Data
2.7.2 Weighting for Probabilities of Selection (w_sel)
2.7.3 Nonresponse Adjustment Weights (w_nr)
2.7.3.1 Weighting Class Approach (w_nr,wc)
2.7.3.2 Propensity Cell Adjustment Approach (w_nr,prop)
2.7.4 Poststratification Weight Factors (w_ps)
2.7.5 Design Effects Due to Weighted Analysis
2.8 Multistage Area Probability Sample Designs
2.8.1 Primary Stage Sampling
2.8.2 Secondary Stage Sampling
2.8.3 Third- and Fourth-Stage Sampling of HUs and Eligible Respondents
2.9 Special Types of Sampling Plans Encountered in Surveys
3. Foundations and Techniques for Design-Based Estimation and Inference
3.1 Introduction
3.2 Finite Populations and Superpopulation Models
3.3 CIs for Population Parameters
3.4 Weighted Estimation of Population Parameters
3.5 Probability Distributions and Design-Based Inference
3.5.1 Sampling Distributions of Survey Estimates
3.5.2 Degrees of Freedom for t under Complex Sample Designs
3.6 Variance Estimation
3.6.1 Simplifying Assumptions Employed in Complex Sample Variance Estimation
3.6.2 TSL Method
3.6.3 Replication Methods for Variance Estimation
3.6.3.1 Jackknife Repeated Replication
3.6.3.2 Balanced Repeated Replication
3.6.3.3 Fay's BRR Method
3.6.3.4 Bootstrap (Rao–Wu Rescaling Bootstrap)
3.6.3.5 Construction of Replicate Weights for Replicated Variance Estimation
3.6.4 Example Comparing Results from the TSL, JRR, BRR, and Bootstrap Methods
3.7 Hypothesis Testing in Survey Data Analysis
3.8 TSE and Its Impact on Survey Estimation and Inference
3.8.1 Variable Errors
3.8.2 Biases in Survey Data
4. Preparation for Complex Sample Survey Data Analysis
4.1 Introduction
4.2 Final Survey Weights: Review by the Data User
4.2.1 Identification of the Correct Weight Variable(s) for the Analysis
4.2.2 Determining the Distribution and Scaling of the Weight Variable(s)
4.2.3 Weighting Applications: Sensitivity of Survey Estimates to the Weights
4.3 Understanding and Checking the Sampling Error Calculation Model
4.3.1 Stratum and Cluster Codes in Complex Sample Survey Data Sets
4.3.2 Building the NCS-R Sampling Error Calculation Model
4.3.3 Combining Strata, Randomly Grouping PSUs, and Collapsing Strata
4.3.4 Checking the Sampling Error Calculation Model for the Survey Data Set
4.4 Addressing Item Missing Data in Analysis Variables
4.4.1 Potential Bias due to Ignoring Missing Data
4.4.2 Exploring Rates and Patterns of Missing Data Prior to Analysis
4.5 Preparing to Analyze Data for Sample Subpopulations
4.5.1 Subpopulation Distributions across Sample Design Units
4.5.2 Unconditional Approach for Subclass Analysis
4.5.3 Preparation for Subclass Analyses
4.6 Final Checklist for Data Users
5. Descriptive Analysis for Continuous Variables
5.1 Introduction
5.2 Special Considerations in Descriptive Analysis of Complex Sample Survey Data
5.2.1 Weighted Estimation
5.2.2 Design Effects for Descriptive Statistics
5.2.3 Matching the Method to the Variable Type
5.3 Simple Statistics for Univariate Continuous Distributions
5.3.1 Graphical Tools for Descriptive Analysis of Survey Data
5.3.2 Estimation of Population Totals
5.3.3 Means of Continuous, Binary, or Interval Scale Data
5.3.4 Standard Deviations of Continuous Variables
5.3.5 Estimation of Percentiles, Medians, and Measures of Inequality in Population Distributions for Continuous Variables
5.3.5.1 Estimation of Distribution Quantiles
5.3.5.2 Estimation of Measures of Inequality in Population Distributions
5.4 Bivariate Relationships between Two Continuous Variables
5.4.1 X–Y Scatter Plots
5.4.2 Product Moment Correlation Statistic (r)
5.4.3 Ratios of Two Continuous Variables
5.5 Descriptive Statistics for Subpopulations
5.6 Linear Functions of Descriptive Estimates and Differences of Means
5.6.1 Differences of Means for Two Subpopulations
5.6.2 Comparing Means over Time
6. Categorical Data Analysis
6.1 Introduction
6.2 Framework for Analysis of Categorical Survey Data
6.2.1 Incorporating the Complex Design and Pseudo Maximum Likelihood
6.2.2 Proportions and Percentages
6.2.3 Crosstabulations, Contingency Tables, and Weighted Frequencies
6.3 Univariate Analysis of Categorical Data
6.3.1 Estimation of Proportions for Binary Variables
6.3.2 Estimation of Category Proportions for Multinomial Variables
6.3.3 Testing Hypotheses Concerning a Vector of Population Proportions
6.3.4 Graphical Display for a Single Categorical Variable
6.4 Bivariate Analysis of Categorical Data
6.4.1 Response and Factor Variables
6.4.2 Estimation of Total, Row, and Column Proportions for Two-Way Tables
6.4.3 Estimating and Testing Differences in Subpopulation Proportions
6.4.4 χ² Tests of Independence of Rows and Columns
6.4.5 Odds Ratios and Relative Risks
6.4.6 Simple Logistic Regression to Estimate the Odds Ratio
6.4.7 Bivariate Graphical Analysis
6.5 Analysis of Multivariate Categorical Data
6.5.1 Cochran–Mantel–Haenszel Test
6.5.2 Log-Linear Models for Contingency Tables
6.6 Summary
7. Linear Regression Models
7.1 Introduction
7.2 Linear Regression Model
7.2.1 Standard Linear Regression Model
7.2.2 Survey Treatment of the Regression Model
7.3 Four Steps in Linear Regression Analysis
7.3.1 Step 1: Specifying and Refining the Model
7.3.2 Step 2: Estimation of Model Parameters
7.3.2.1 Estimation for the Standard Linear Regression Model
7.3.2.2 Linear Regression Estimation for Complex Sample Survey Data
7.3.3 Step 3: Model Evaluation
7.3.4 Step 4: Inference
7.3.4.1 Inference Concerning Model Parameters
7.3.4.2 Prediction Intervals
7.4 Some Practical Considerations and Tools
7.4.1 Distribution of the Dependent Variable
7.4.2 Parameterization and Scaling for Independent Variables
7.4.3 Standardization of the Dependent and Independent Variables
7.4.4 Specification and Interpretation of Interactions and Nonlinear Relationships
7.4.5 Model-Building Strategies
7.5 Application: Modeling Diastolic Blood Pressure with the 2011–2012 NHANES Data
7.5.1 Exploring the Bivariate Relationships
7.5.2 Naïve Analysis: Ignoring Sample Design Features
7.5.3 Weighted Regression Analysis
7.5.4 Appropriate Analysis: Incorporating All Sample Design Features
8. Logistic Regression and Generalized Linear Models for Binary Survey Variables
8.1 Introduction
8.2 GLMs for Binary Survey Responses
8.2.1 Logistic Regression Model
8.2.2 Probit Regression Model
8.2.3 Complementary-Log-Log Model
8.3 Building the Logistic Regression Model: Stage 1—Model Specification
8.4 Building the Logistic Regression Model: Stage 2—Estimation of Model Parameters and Standard Errors
8.5 Building the Logistic Regression Model: Stage 3—Evaluation of the Fitted Model
8.5.1 Wald Tests of Model Parameters
8.5.2 GOF and Logistic Regression Diagnostics
8.6 Building the Logistic Regression Model: Stage 4—Interpretation and Inference
8.7 Analysis Application
8.7.1 Stage 1: Model Specification
8.7.2 Stage 2: Model Estimation
8.7.3 Stage 3: Model Evaluation
8.7.4 Stage 4: Model Interpretation/Inference
8.8 Comparing the Logistic, Probit, and C-L-L GLMs for Binary Dependent Variables
9. Generalized Linear Models for Multinomial, Ordinal, and Count Variables
9.1 Introduction
9.2 Analyzing Survey Data Using Multinomial Logit Regression Models
9.2.1 Multinomial Logit Regression Model
9.2.2 Multinomial Logit Regression Model: Specification Stage
9.2.3 Multinomial Logit Regression Model: Estimation Stage
9.2.4 Multinomial Logit Regression Model: Evaluation Stage
9.2.5 Multinomial Logit Regression Model: Interpretation Stage
9.2.6 Example: Fitting a Multinomial Logit Regression Model to Complex Sample Survey Data
9.3 Logistic Regression Models for Ordinal Survey Data
9.3.1 Cumulative Logit Regression Model
9.3.2 Cumulative Logit Regression Model: Specification Stage
9.3.3 Cumulative Logit Regression Model: Estimation Stage
9.3.4 Cumulative Logit Regression Model: Evaluation Stage
9.3.5 Cumulative Logit Regression Model: Interpretation Stage
9.3.6 Example: Fitting a Cumulative Logit Regression Model to Complex Sample Survey Data
9.4 Regression Models for Count Outcomes
9.4.1 Survey Count Variables and Regression Modeling Alternatives
9.4.2 Generalized Linear Models for Count Variables
9.4.2.1 Poisson Regression Model
9.4.2.2 Negative Binomial Regression Model
9.4.2.3 Two-Part Models: Zero-Inflated Poisson and Negative Binomial Regression Models
9.4.3 Regression Models for Count Data: Specification Stage
9.4.4 Regression Models for Count Data: Estimation Stage
9.4.5 Regression Models for Count Data: Evaluation Stage
9.4.6 Regression Models for Count Data: Interpretation Stage
9.4.7 Example: Fitting Poisson and Negative Binomial Regression Models to Complex Sample Survey Data
10. Survival Analysis of Event History Survey Data
10.1 Introduction
10.2 Basic Theory of Survival Analysis
10.2.1 Survey Measurement of Event History Data
10.2.2 Data for Event History Models
10.2.3 Important Notation and Definitions
10.2.4 Models for Survival Analysis
10.3 (Nonparametric) K–M Estimation of the Survivor Function
10.3.1 K–M Model Specification and Estimation
10.3.2 K–M Estimator: Evaluation and Interpretation
10.3.3 K–M Survival Analysis Example
10.4 The Cox Proportional Hazards (CPH) Model
10.4.1 CPH Model: Specification
10.4.2 CPH Model: Estimation Stage
10.4.3 CPH Model: Evaluation and Diagnostics
10.4.4 CPH Model: Interpretation and Presentation of Results
10.4.5 Example: Fitting a CPH Model to Complex Sample Survey Data
10.5 Discrete Time Survival Models
10.5.1 Discrete Time Logistic Model
10.5.2 Data Preparation for Discrete Time Survival Models
10.5.3 Discrete Time Models: Estimation Stage
10.5.4 Discrete Time Models: Evaluation and Interpretation
10.5.5 Fitting a Discrete Time Model to Complex Sample Survey Data
11. Analysis of Longitudinal Complex Sample Survey Data
11.1 Introduction
11.2 Alternative Analytic Objectives with Longitudinal Survey Data
11.2.1 Objective 1: Descriptive Estimation at a Single Time Point
11.2.2 Objective 2: Estimation of Change across Two Waves
11.2.3 Objective 3: Trajectory Estimation Based on Three or More Waves
11.2.3.1 Approach 1: Weighted Multilevel Modeling
11.2.3.2 Approach 2: Covariance Structure Modeling
11.2.3.3 Approach 3: Weighted GEE Estimation
11.2.3.4 Approach 4: Multiple Imputation Analysis
11.2.3.5 Approach 5: Calibration Adjustment for Respondents with Complete Data
11.3 Alternative Longitudinal Analyses of the HRS Data
11.3.1 Example: Descriptive Estimation at a Single Wave
11.3.2 Example: Change across Two Waves
11.3.2.1 Accounting for Refreshment Samples When Estimating Mean Change
11.3.3 Example: Weighted Multilevel Modeling
11.3.3.1 Example: Veiga et al. (2014)
11.3.4 Example: Weighted GEE Analysis
11.4 Concluding Remarks
12. Imputation of Missing Data: Practical Methods and Applications for Survey Analysts
12.1 Introduction
12.2 Important Missing Data Concepts
12.2.1 Sources and Types of Missing Data
12.2.2 Patterns of Item Missing Data in Surveys
12.2.3 Item Missing Data Mechanisms
12.2.4 Review of Strategies to Address Item Missing Data in Surveys
12.3 Factors to Consider in Choosing an Imputation Method
12.4 Multiple Imputation
12.4.1 Overview of MI and MI Phases
12.4.2 Models for Multiply Imputing Missing Data
12.4.2.1 Choosing the Variables to Include in the Imputation Model
12.4.2.2 Distributional Assumptions for the Imputation Model
12.4.3 Creating the MIs
12.4.3.1 Transforming the Imputation Problem to Monotonic Missing Data
12.4.3.2 Specifying an Explicit Multivariate Model and Applying Exact Bayesian Posterior Simulation Methods
12.4.3.3 SR or "Chained Regressions"
12.4.4 Estimation and Inference for Multiply Imputed Data
12.4.4.1 Estimators for Population Parameters and Associated Variance Estimators
12.4.4.2 Model Evaluation and Inference
12.5 Fractional Imputation
12.5.1 Background
12.5.2 Creating the FIs
12.5.3 Estimation and Inference with Fractionally Imputed Data
12.5.4 FI Software
12.6 Application of MI and FI Methods to the NHANES 2011–2012 Data
12.6.1 Problem Definition
12.6.2 Imputation Models for the NHANES DBP Example
12.6.3 Imputation of the Item Missing Data
12.6.3.1 Multiple Imputation
12.6.3.2 FEFI: Hot Deck Method
12.6.4 Estimation and Inference
12.6.4.1 Multiple Imputation
12.6.4.2 FI Estimation and Inference
12.6.5 Comparison of Example Results from Complete Case Analysis, MI, and FEFI
13. Advanced Topics in the Analysis of Survey Data
13.1 Introduction
13.2 Bayesian Analysis of Complex Sample Survey Data
13.3 GLMMs in Survey Data Analysis
13.3.1 Overview of GLMMs
13.3.2 GLMMs and Complex Sample Survey Data
13.3.3 Alternative Approaches to Fitting GLMMs to Survey Data: The PISA Example
13.4 Fitting Structural Equation Models to Complex Sample Survey Data
13.4.1 SEM Example: Analysis of ESS Data from Belgium
13.5 Small Area Estimation and Complex Sample Survey Data
13.6 Nonparametric Methods for Complex Sample Survey Data
References
Appendix A: Software Overview
Index
© Copyright 1996–2024 StataCorp LLC. All rights reserved.