Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.
. webuse nhanes2l (Second National Health and Nutrition Examination Survey) . describe bpsystol hlthstat diabetes age bmi
Variable Storage Display Value name type format label Variable label |
bpsystol int %9.0g Systolic blood pressure hlthstat byte %20.0g hlth Health status diabetes byte %12.0g diabetes Diabetes status age byte %9.0g Age (years) bmi float %9.0g Body mass index (BMI) |
Variable | Obs Mean Std. dev. Min Max | |
bpsystol | 10,351 130.8817 23.33265 65 300 | |
hlthstat | 10,335 2.586164 1.206196 1 5 | |
diabetes | 10,349 .0482172 .2142353 0 1 | |
age | 10,351 47.57965 17.21483 20 74 | |
bmi | 10,351 25.5376 4.914969 12.3856 61.1297 |
We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) with a range of 65 to 300 mmHg. hlthstat measures health status with a range from 1 to 5. diabetes measures diabetes status with a range of 0 to 1. age measures age with a range of 20 to 74 years. And bmi measures body mass index with a range of 12.4 to 61.1 kg/m2.
Let's fit a linear regression model using the continuous outcome variable bpsystol, the binary predictor variable diabetes, and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors, and I have used the “##” operator to request the main effects and interaction of both predictor variables.
. regress bpsystol i.hlthstat##i.diabetes
Source | SS df MS | Number of obs = 10,335 | F(9, 10325) = 86.92 |
Model | 396524.045 9 44058.2272 | Prob > F = 0.0000 | |
Residual | 5233449.31 10,325 506.871604 | R-squared = 0.0704 | Adj R-squared = 0.0696 |
Total | 5629973.35 10,334 544.800982 | Root MSE = 22.514 |
bpsystol | Coefficient Std. err. t P>|t| [95% conf. interval] | |
hlthstat | ||
Very good | 2.636051 .6417076 4.11 0.000 1.37818 3.893922 | |
Good | 7.648725 .6272209 12.19 0.000 6.419251 8.8782 | |
Fair | 13.50647 .7408272 18.23 0.000 12.0543 14.95863 | |
Poor | 14.77223 1.032484 14.31 0.000 12.74837 16.7961 | |
diabetes | ||
Diabetic | 5.780232 4.618696 1.25 0.211 -3.273308 14.83377 | |
hlthstat# | ||
diabetes | ||
Very good # | ||
Diabetic | 17.43339 5.726714 3.04 0.002 6.207924 28.65886 | |
Good # | ||
Diabetic | 4.023894 5.032308 0.80 0.424 -5.840404 13.88819 | |
Fair # | ||
Diabetic | 7.316062 4.97969 1.47 0.142 -2.445096 17.07722 | |
Poor # | ||
Diabetic | 3.445358 5.09316 0.68 0.499 -6.538222 13.42894 | |
_cons | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |
The output can be challenging to interpret because we have two predictors and an interaction, which produces many estimated coefficients. We could spend our time carefully interpreting each coefficient, or we could calculate the expected SBP for every combination of diabetes status and health status. But Stata's margins command will estimate the expected SBP for combinations of the two predictor variables or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the regress command but not in the margins command.
Let's estimate marginal predictions of SBP for each combination of the categories of hlthstat and diabetes.
. margins diabetes#hlthstat Adjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
diabetes# | ||
hlthstat | ||
Not diabetic # | ||
Excellent | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |
Not diabetic # | ||
Very good | 126.8975 .4461899 284.40 0.000 126.0229 127.7721 | |
Not diabetic # | ||
Good | 131.9102 .4250916 310.31 0.000 131.0769 132.7434 | |
Not diabetic # | ||
Fair | 137.7679 .5797601 237.63 0.000 136.6315 138.9043 | |
Not diabetic # | ||
Poor | 139.0337 .9237528 150.51 0.000 137.2229 140.8444 | |
Diabetic # | ||
Excellent | 130.0417 4.595612 28.30 0.000 121.0334 139.05 | |
Diabetic # | ||
Very good | 150.1111 3.356161 44.73 0.000 143.5324 156.6898 | |
Diabetic # | ||
Good | 141.7143 1.952195 72.59 0.000 137.8876 145.541 | |
Diabetic # | ||
Fair | 150.8642 1.768852 85.29 0.000 147.3969 154.3315 | |
Diabetic # | ||
Poor | 148.2593 1.93768 76.51 0.000 144.461 152.0575 | |
The numbers reported in the “Margin” column are average values of the linear prediction of SBP for each combination of the categories of hlthstat and diabetes. For example, the output tells us that the expected SBP is 124.2614 for a person in “Excellent” health without diabetes and the expected SBP is 148.2593 for a person in “Poor” health with diabetes.
The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.
We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.
. marginsplot Variables that uniquely identify margins: diabetes hlthstat
The resulting profile plot shows a separate line for each category of hlthstat. We could change the plot by changing the order of the variables in the margins command.
. margins hlthstat#diabetes Adjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
hlthstat | ||
diabetes | ||
Excellent # | ||
Not diabetic | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |
Excellent # | ||
Diabetic | 130.0417 4.595612 28.30 0.000 121.0334 139.05 | |
Very good # | ||
Not diabetic | 126.8975 .4461899 284.40 0.000 126.0229 127.7721 | |
Very good # | ||
Diabetic | 150.1111 3.356161 44.73 0.000 143.5324 156.6898 | |
Good # | ||
Not diabetic | 131.9102 .4250916 310.31 0.000 131.0769 132.7434 | |
Good # | ||
Diabetic | 141.7143 1.952195 72.59 0.000 137.8876 145.541 | |
Fair # | ||
Not diabetic | 137.7679 .5797601 237.63 0.000 136.6315 138.9043 | |
Fair # | ||
Diabetic | 150.8642 1.768852 85.29 0.000 147.3969 154.3315 | |
Poor # | ||
Not diabetic | 139.0337 .9237528 150.51 0.000 137.2229 140.8444 | |
Poor # | ||
Diabetic | 148.2593 1.93768 76.51 0.000 144.461 152.0575 | |
By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”. The option xdimension() defines the display of the x axis.
. marginsplot, recast(bar) xdimension(hlthstat diabetes) Variables that uniquely identify margins: hlthstat diabetes
The overlap of the horizontal axis labels make them impossible to read. We can make them legible by adding the horizontal option to create a horizontal bar chart.
. marginsplot, recast(bar) horizontal xdimension(hlthstat diabetes)
Let's add more options to make our graph look nicer. We can use the plotopts(barwidth)) option to add some space between the bars. And we can use the title(), subtitle(), xtitle(), and ytitle() options to add various titles to our graph.
. marginsplot, recast(bar) xdimension(hlthstat diabetes) horizontal plotopts(barwidth(0.8)) ytitle("") xtitle("Expected systolic blood pressure") title("Expected systolic blood pressure (mmHg)") subtitle("By health status and diabetes status") Variables that uniquely identify margins: hlthstat diabetes
We can also use margins to estimate marginal predictions for one variable averaged over other variables in the model. For example, we can estimate the expected SBP for categories of diabetes averaged over hlthstat.
. margins diabetes Predictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
diabetes | ||
Not diabetic | 130.3211 .2273218 573.29 0.000 129.8755 130.7667 | |
Diabetic | 143.041 1.50395 95.11 0.000 140.093 145.9891 | |
Or we could estimate the expected SBP for categories of hlthstat averaged over diabetes.
. margins hlthstat Predictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
hlthstat | ||
Excellent | 124.5405 .4918267 253.22 0.000 123.5764 125.5046 | |
Very good | 128.0183 .4545142 281.66 0.000 127.1274 128.9092 | |
Good | 132.3835 .4154021 318.69 0.000 131.5693 133.1978 | |
Fair | 138.4002 .5583383 247.88 0.000 137.3058 139.4947 | |
Poor | 139.4791 .8841156 157.76 0.000 137.7461 141.2121 | |
Let's work a simpler example without the interaction to help us understand how margins works. Let's begin by using tabulate to create indicator variables for each category of hlthstat.
. tabulate hlthstat, generate(hlthstat)
Health status | Freq. Percent Cum. | |
Excellent | 2,407 23.29 23.29 | |
Very good | 2,591 25.07 48.36 | |
Good | 2,938 28.43 76.79 | |
Fair | 1,670 16.16 92.95 | |
Poor | 729 7.05 100.00 | |
Total | 10,335 100.00 |
Let's list the first 10 observations to check our work.
. list hlthstat hlthstat1 hlthstat2 hlthstat3 hlthstat4 hlthstat5 in 1/10
hlthstat hlthst~1 hlthst~2 hlthst~3 hlthst~4 hlthst~5 | |
1. | Very good 0 1 0 0 0 |
2. | Very good 0 1 0 0 0 |
3. | Good 0 0 1 0 0 |
4. | Fair 0 0 0 1 0 |
5. | Very good 0 1 0 0 0 |
6. | Poor 0 0 0 0 1 |
7. | Very good 0 1 0 0 0 |
8. | Excellent 1 0 0 0 0 |
9. | Very good 0 1 0 0 0 |
10. | Poor 0 0 0 0 1 |
Next let's fit a linear regression model including diabetes and hlthstat without the interaction. The option coeflegend displays a legend that includes terms that refer to the coefficients in the model.
. regress bpsystol i.diabetes i.hlthstat, coeflegend
Source | SS df MS | Number of obs = 10,335 | F(5, 10329) = 153.09 |
Model | 388422.444 5 77684.4888 | Prob > F = 0.0000 | |
Residual | 5241550.91 10,329 507.459668 | R-squared = 0.0690 | Adj R-squared = 0.0685 |
Total | 5629973.35 10,334 544.800982 | Root MSE = 22.527 |
bpsystol | Coefficient Legend | |
diabetes | ||
Diabetic | 11.8325 _b[1.diabetes] | |
hlthstat | ||
Very good | 2.894063 _b[2.hlthstat] | |
Good | 7.61725 _b[3.hlthstat] | |
Fair | 13.68941 _b[4.hlthstat] | |
Poor | 14.34982 _b[5.hlthstat] | |
_cons | 124.2011 _b[_cons] | |
Let's display the contents of _b[2.hlthstat] to verify that it equals 2.894063.
. display _b[2.hlthstat] 2.894063
Now we can use coefficients and indicator variables to generate a new variable that equals the expected SBP assuming every observation in the sample does not have diabetes.
. generate sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[2.hlthstat] * hlthstat2 + _b[3.hlthstat] * hlthstat3 + _b[4.hlthstat] * hlthstat4 + _b[5.hlthstat] * hlthstat5 (16 missing values generated)
Next we can generate a new variable that equals the expected SBP assuming every observation in the sample has diabetes.
. generate sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[2.hlthstat] * hlthstat2 + _b[3.hlthstat] * hlthstat3 + _b[4.hlthstat] * hlthstat4 + _b[5.hlthstat] * hlthstat5 (16 missing values generated)
Then we can calculate the average of the two variables to estimate the expected SBP for people with and without diabetes.
. table (), statistic(mean sbp_diab0 sbp_diab1)
sbp_diab0 | 130.3163 | |
sbp_diab1 | 142.1488 | |
This matches the results reported by margins.
. margins diabetes Predictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
diabetes | ||
Not diabetic | 130.3163 .2274263 573.00 0.000 129.8705 130.7621 | |
Diabetic | 142.1488 1.0333 137.57 0.000 140.1233 144.1742 | |
In the previous example, we first calculated the response for each observation and then calculated the average of those responses. This is the default method. But we could also calculate the average covariate values first and then report the response at those average values.
Let's begin by using table to estimate the proportion of observations in each category of hlthstat.
. table (hlthstat), statistic(proportion) nformat(%12.7f proportion) nototal
Proportion | ||
Health status | ||
Excellent | 0.2328979 | |
Very good | 0.2507015 | |
Good | 0.2842767 | |
Fair | 0.1615868 | |
Poor | 0.0705370 | |
Then we can use those proportions (averages) to estimate the expected SBP assuming no one in the sample has diabetes.
. display _b[_cons] + _b[1.diabetes] * 0 + _b[2.hlthstat] * 0.2507015 + _b[3.hlthstat] * 0.2842767 + _b[4.hlthstat] * 0.1615868 + _b[5.hlthstat] * 0.0705370
We can also calculate the expected SBP assuming everyone in the sample has diabetes.
. display _b[_cons] + _b[1.diabetes] * 1 + _b[2.hlthstat] * 0.2507015 + _b[3.hlthstat] * 0.2842767 + _b[4.hlthstat] * 0.1615868 + _b[5.hlthstat] * 0.0705370
And we can check our work using margins with the atmeans option.
. margins diabetes, atmeans Adjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict() At: 0.diabetes = .9517175 (mean) 1.diabetes = .0482825 (mean) 1.hlthstat = .2328979 (mean) 2.hlthstat = .2507015 (mean) 3.hlthstat = .2842767 (mean) 4.hlthstat = .1615868 (mean) 5.hlthstat = .070537 (mean)
Delta-method | ||
Margin std. err. t P>|t| [95% conf. interval] | ||
diabetes | ||
Not diabetic | 130.3163 .2274263 573.00 0.000 129.8705 130.7621 | |
Diabetic | 142.1488 1.0333 137.57 0.000 140.1233 144.1742 | |
Estimating the average response (method 1) and the response at the average (method 2) gives us the same results for linear regression. But the results may differ for generalized linear models such as probit, logistic, or Poisson regression.
You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.
Read more in the Stata Base Reference Manual; see [R] margins, [R] marginsplot, and [R] regress. And in the Stata User’s Guide, see [U-11] factor variables.