Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and weight.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age weight

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
weight          float   %9.0g                 Weight (kg)
. summarize bpsystol hlthstat diabetes age weight

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
      weight |     10,351    71.89752    15.35642      30.84     175.88
We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. hlthstat measures health status on a scale from 1 to 5. diabetes is an indicator of diabetes status coded 0 or 1. age ranges from 20 to 74 years. And weight measures body weight, ranging from 30.8 to 175.9 kilograms.
Let's fit a linear regression model using the continuous outcome variable bpsystol and the continuous predictor variables age and weight. Note that I have used factor-variable notation, the c. prefix, to tell Stata that age and weight are continuous predictors, and I have used the “##” operator to request both main effects and their interaction.
. regress bpsystol c.age##c.weight

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(3, 10347)     =   1510.88
       Model |  1716436.54         3  572145.514   Prob > F        =    0.0000
    Residual |  3918233.49    10,347  378.683047   R-squared       =    0.3046
-------------+----------------------------------   Adj R-squared   =    0.3044
       Total |  5634670.03    10,350  544.412563   Root MSE        =     19.46

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .8898576   .0536198    16.60   0.000     .7847525    .9949627
      weight |   .5733109   .0368295    15.57   0.000     .5011179    .6455039
             |
       c.age#|
    c.weight |   -.003581   .0007458    -4.80   0.000    -.0050429   -.0021191
             |
       _cons |   59.60983    2.64211    22.56   0.000     54.43079    64.78888
------------------------------------------------------------------------------
Interpreting the coefficients can be challenging with interactions. We cannot use the simple “a one-unit change in x...” interpretation for the main effects, because through the interaction term the effect of each predictor depends on the value of the other. And the interaction coefficient itself is hard to interpret because it multiplies two continuous variables. In this situation, it is often more useful to estimate the expected SBP for specific combinations of age and weight.
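To see why the main-effect coefficients cannot be read on their own, it may help to write out the fitted model and differentiate it with respect to age. This is a worked sketch using the rounded coefficients displayed by regress above:

```latex
\begin{align}
\widehat{\mathrm{SBP}} &= 59.60983 + 0.8898576\,\mathrm{age} + 0.5733109\,\mathrm{weight} - 0.003581\,(\mathrm{age}\times\mathrm{weight}) \\
\frac{\partial\,\widehat{\mathrm{SBP}}}{\partial\,\mathrm{age}} &= 0.8898576 - 0.003581\,\mathrm{weight}
\end{align}
```

For someone weighing 70 kg, the slope for age is 0.8898576 - 0.003581(70) ≈ 0.639 mmHg per year; at 100 kg it drops to about 0.532. The age coefficient of .8898576 is the slope only at weight = 0, which lies far outside the observed range of 30.84 to 175.88 kg.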
Stata's margins command will estimate the expected SBP for combinations of age and weight. For example, we can use margins with the at() option to estimate the expected SBP for a 20-year-old who weighs 70 kilograms.
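. margins, at(age=20 weight=70)

Adjusted predictions                                    Number of obs = 10,351
Model VCE: OLS

Expression: Linear prediction, predict()
At: age    = 20
    weight = 70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   112.5253   .3614373   311.33   0.000     111.8168    113.2338
------------------------------------------------------------------------------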
The value in the “Margin” column tells us that the expected SBP is 112.5253 mmHg. The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.
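As a check on what margins is doing here: because the model contains no other covariates, the adjusted prediction at age = 20 and weight = 70 is simply the fitted linear predictor evaluated at those values. Using the rounded coefficients from regress, the hand calculation is:

```latex
\begin{align}
\widehat{\mathrm{SBP}}(20, 70) &= 59.60983 + 0.8898576(20) + 0.5733109(70) - 0.003581(20)(70) \\
&= 59.60983 + 17.797152 + 40.131763 - 5.0134 \\
&\approx 112.5253
\end{align}
```

This matches the Margin column. What margins adds beyond the hand calculation is the standard error, test statistic, and confidence interval.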
This information is helpful, but this is a situation where a good visualization could help us understand the relationship between these three variables.
Let's begin our visualization by using margins to estimate predictions for all combinations of age ranging from 20 to 60 in increments of 5 and weight ranging from 30 to 150 kilograms in increments of 20. This command will produce a lot of output, so let's use the quietly prefix to hide the output and use the saving() option to save the results to a data file named predictions.dta.
. quietly margins, at(age=(20(5)60) weight=(30(20)160)) saving(predictions.dta, replace)
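The number of predictions saved follows from how Stata expands the two numlists. Note that 30(20)160 stops at 150, because the next value, 170, would exceed the upper limit:

```latex
\begin{align}
\text{age: } 20(5)60 &\rightarrow \{20, 25, \dots, 60\}: \quad \tfrac{60-20}{5} + 1 = 9 \text{ values} \\
\text{weight: } 30(20)160 &\rightarrow \{30, 50, \dots, 150\}: \quad \left\lfloor \tfrac{160-30}{20} \right\rfloor + 1 = 7 \text{ values} \\
\text{combinations} &= 9 \times 7 = 63
\end{align}
```

This is why summarize reports 63 observations for _margin.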
Let's open the data file and describe the variables _at1, _at2, and _margin.
. use predictions, clear
(Created by command margins; also see char list)

. describe _at1 _at2 _margin

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
_at1            byte    %9.0g                 Age (years)
_at2            int     %9.0g                 Weight (kg)
_margin         float   %9.0g                 Linear prediction, predict()
The variable _at1 contains the age data, _at2 contains the weight data, and _margin contains the linear predictions of SBP.
Let's summarize the variable _margin.
. summarize _margin

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     _margin |         63    133.9104    18.94039    92.4577   166.7687
The minimum value of _margin is 92.4577, and the maximum value is 166.7687. We will use this information to choose the contour cut points in our graph below.
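These extremes can also be located directly from the fitted model. For fixed weight the prediction is linear in age, and for fixed age it is linear in weight, so on a rectangular grid the minimum and maximum must fall at corners. Over this grid both slopes stay positive (the age slope is still about 0.353 at 150 kg, and the weight slope is about 0.358 at age 60), so the minimum is at the youngest, lightest corner and the maximum at the oldest, heaviest corner. Using the rounded coefficients from regress:

```latex
\begin{align}
\widehat{\mathrm{SBP}}(20, 30) &= 59.60983 + 17.797152 + 17.199327 - 2.1486 \approx 92.46 \\
\widehat{\mathrm{SBP}}(60, 150) &= 59.60983 + 53.391456 + 85.996635 - 32.229 \approx 166.77
\end{align}
```

Both agree with the values reported by summarize, up to rounding in the displayed coefficients.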
Now let's use twoway contour to create a contour plot of the predictions from our model.
The first line of the twoway command includes contour followed by the z-axis variable _margin, the y-axis variable _at2, and the x-axis variable _at1. These variables are followed by the ccuts() option, which tells twoway contour to draw cuts at _margin values of 100 through 170 in increments of 10. We chose these based on the minimum and maximum values reported by summarize _margin above.
The second line defines the x-axis labels for _at1, which contains age. I have selected the same range that I used for age in the margins command above.
The third line defines the y-axis labels for _at2, which contains weight. I have selected the same range that I used for weight in the margins command above.
Lines 4–7 include options to define titles for the axes and the graph.
. twoway (contour _margin _at2 _at1, ccuts(100(10)170)),  ///
         xlabel(20(10)60)                                 ///
         ylabel(30(20)160, angle(horizontal))             ///
         xtitle("Age (years)", margin(medium))            ///
         ytitle("Weight (kg)", margin(medium))            ///
         ztitle("Predicted SBP")                          ///
         title("Predicted SBP by Age and Weight")
The key on the right side of the graph tells us which colors correspond to the expected SBP for various combinations of age and weight. The contours illustrate the interaction: because the effect of age on expected SBP depends on weight, and vice versa, the contours are curved rather than the parallel straight lines we would see without the interaction term. Knowing age alone, or weight alone, is not enough to predict SBP; we must know both age and weight.
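The shape of the contours can be derived from the fitted model. Writing b0 through b3 for the regress coefficients (_cons, age, weight, and c.age#c.weight), setting the prediction equal to a contour level c, and solving for weight gives:

```latex
\begin{align}
c &= b_0 + b_1\,\mathrm{age} + b_2\,\mathrm{weight} + b_3\,(\mathrm{age}\times\mathrm{weight}) \\
\mathrm{weight} &= \frac{c - b_0 - b_1\,\mathrm{age}}{b_2 + b_3\,\mathrm{age}}
\end{align}
```

If b3 were 0, the denominator would be constant and every contour would be a straight line with the same slope, -b1/b2. With b3 = -.003581, the denominator shrinks as age increases, so each contour bends. For example, the 130 mmHg contour at age 40 sits at weight ≈ (130 - 59.60983 - 35.594304) / (0.5733109 - 0.14324) ≈ 80.9 kg.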
You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation.
Read more in the Stata Base Reference Manual; see [R] margins, [R] marginsplot, and [R] regress. In the Stata User’s Guide, see [U] 11.4.3 Factor variables.