Stata's commands are intuitive and easy to learn. Even better, everything you learn about performing a task can be applied to other tasks.
Need to limit your analysis to females? Add if female==1 to almost any command.
Need standard errors that remain valid when common assumptions, such as homoskedasticity, are violated? Add vce(robust) to almost any estimation command.
Need to account for sampling weights, clusters, and stratification? Declare the survey design once with svyset, then add svy: to the beginning of the command.
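A minimal sketch of that survey workflow follows. It assumes the data contain the design variables psu, strata, and finalwgt; these names follow Stata's NHANES II example dataset and are an assumption here, since the text does not name its dataset. The design is declared once, and svy: then applies it to every estimation command. For a subpopulation such as females, svy's subpop() option is generally preferred over an if qualifier because it keeps the full design when computing standard errors.

```stata
// Declare the survey design once (design variable names are assumed)
svyset psu [pweight=finalwgt], strata(strata)

// Survey-adjusted regression of BMI on age and region indicators
svy: regress bmi age i.region

// Restrict estimation to females while keeping the full design
// for variance estimation (female is assumed to be a 0/1 indicator)
svy, subpop(female): regress bmi age i.region
```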
The consistency goes even deeper. What you learn about data management commands often applies to estimation commands, and vice versa.
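As a small illustration, assuming the variables used in the examples below (bmi, age, region, female) are in memory, the same if qualifier and by: prefix work for descriptive, data management, and estimation commands alike:

```stata
// The if qualifier restricts descriptive and data management commands...
summarize bmi if female==1
tabulate region if female==1
// bmi_f is a hypothetical new variable, missing for males
generate bmi_f = bmi if female==1

// ...and by-group processing works the same way for both kinds of commands
bysort female: summarize bmi
bysort female: regress bmi age i.region
```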
There is a full suite of postestimation commands to perform hypothesis tests, form linear and nonlinear combinations, make predictions, form contrasts, and even perform marginal analysis with interaction plots. These commands work the same way after virtually every estimator.
Let's start with linear regression. We fit a variety of models and explore results using the postestimation commands for testing, prediction, and marginal analysis.
// Regression of body mass index (BMI) on age and region indicators
regress bmi age i.region

// Fit the model for females only
regress bmi age i.region if female==1

// Obtain robust standard errors
regress bmi age i.region, vce(robust)

// Include a female indicator and its interaction with age
regress bmi age i.region i.female c.age#i.female

// Perform a joint test of significance for the region indicators
testparm i.region

// Compute the predicted BMI for each person
predict bmi_hat

// Obtain the average prediction (potential outcome), treating
// all individuals as if they live in region 1
margins 1.region

// Obtain average predictions for all regions
margins region

// Obtain average predictions by sex across a range of ages
margins female, at(age=(20 40 60 80))

// Plot this interaction
marginsplot
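The same fit also supports the other postestimation tasks mentioned earlier: hypothesis tests, linear and nonlinear combinations, and contrasts. The sketch below assumes region has at least levels 1 through 3 with level 1 as the base; the particular comparisons are chosen only for illustration.

```stata
regress bmi age i.region

// Wald test that the region 2 and region 3 coefficients are equal
test 2.region = 3.region

// Linear combination: difference between regions 2 and 3, with a CI
lincom 2.region - 3.region

// Nonlinear combination: ratio of two coefficients, delta-method SE
nlcom _b[2.region] / _b[age]

// Contrasts of each region with the base region
contrast r.region
```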
What if we instead have a binary outcome variable, an indicator of whether an individual has high blood pressure? We could fit a logistic regression model. We replace regress in the commands above with logistic, and we use highbp instead of bmi as the dependent variable. Otherwise, the model specification, options, and postestimation commands are almost identical.
// Logistic regression of high blood pressure on age and region indicators
logistic highbp age i.region

// Fit the model for females only
logistic highbp age i.region if female==1

// Obtain robust standard errors
logistic highbp age i.region, vce(robust)

// Include a female indicator and its interaction with age
logistic highbp age i.region i.female c.age#i.female

// Perform a joint test of significance for the region indicators
testparm i.region

// Compute the predicted probability of high blood pressure
// for each person
predict prob_hbp

// Obtain the average predicted probability (potential outcome),
// treating all individuals as if they live in region 1
margins 1.region

// Obtain average predicted probabilities for all regions
margins region

// Obtain average predicted probabilities by sex across a range of ages
margins female, at(age=(20 40 60 80))

// Plot this interaction
marginsplot
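For a binary outcome, marginal effects on the probability scale are often easier to interpret than odds ratios. A sketch using the same margins command, assuming female is coded 0/1 so that r.female compares females with males:

```stata
// Refit the interaction model from above
logistic highbp age i.region i.female c.age#i.female

// Average marginal effect of age on the probability of high blood pressure
margins, dydx(age)

// The same average marginal effect, computed separately for each sex
margins female, dydx(age)

// Female-minus-male difference in average predicted probability at several ages
margins r.female, at(age=(20 40 60 80))
```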
If we have a count outcome such as the number of individuals in the household, we might want to fit a Poisson model. We use the poisson command and housesize as the dependent variable, but again, the rest of the command syntax is the same.
// Poisson regression of household size on age and region indicators
poisson housesize age i.region

// Fit the model for females only
poisson housesize age i.region if female==1

// Obtain robust standard errors
poisson housesize age i.region, vce(robust)

// Include a rural location indicator and its interaction with age
poisson housesize age i.region i.rural c.age#i.rural

// Perform a joint test of significance for the region indicators
testparm i.region

// Compute the predicted number of individuals in each household
predict size

// Obtain the average predicted household size (potential outcome),
// treating all individuals as if they live in region 1
margins 1.region

// Obtain average predicted household size for all regions
margins region

// Obtain average predicted household size by rural location
// across a range of ages
margins rural, at(age=(20 40 60 80))

// Plot this interaction
marginsplot
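Count models have a few model-specific reporting and diagnostic options that sit alongside the shared syntax. A sketch reusing the interaction model above:

```stata
// Refit the interaction model, reporting incidence-rate ratios
poisson housesize age i.region i.rural c.age#i.rural, irr

// Goodness-of-fit tests for the Poisson model
estat gof

// Average marginal effect of age on the expected household size
margins, dydx(age)
```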
We could fit many other models: models for ordered and unordered categorical outcomes; multilevel models; models for time-series, panel, or survival data; and models that account for endogeneity and sample selection. Regardless of the model, we can use the same command structure, and largely the same options and postestimation commands, that we used above.
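For example, an ordered outcome and a multilevel model follow the same pattern. In this sketch, hlthstat (an ordered health-status rating) and psu (a grouping variable for the random intercepts) are assumed variable names used only for illustration.

```stata
// Ordered logit for a health-status rating (hlthstat is assumed),
// with the same if qualifier and vce() option as before
ologit hlthstat age i.region if female==1, vce(robust)
testparm i.region

// Average predicted probability of the first outcome category, by region
margins region, predict(outcome(#1))

// Multilevel linear model with random intercepts for the assumed
// grouping variable psu
mixed bmi age i.region || psu:
margins region
```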