margins and marginsplot for the interaction of two categorical predictor variables

Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi


Variable      Storage   Display    Value
    name         type    format    label      Variable label
                                                                                
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)


. summarize bpsystol hlthstat diabetes age bmi


    Variable          Obs        Mean    Std. dev.       Min        Max
   
    bpsystol       10,351    130.8817    23.33265         65        300
    hlthstat       10,335    2.586164    1.206196          1          5
    diabetes       10,349    .0482172    .2142353          0          1
         age       10,351    47.57965    17.21483         20         74
         bmi       10,351     25.5376    4.914969    12.3856    61.1297

We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) with a range of 65 to 300 mmHg. hlthstat measures health status with a range from 1 to 5. diabetes measures diabetes status with a range of 0 to 1. age measures age with a range of 20 to 74 years. And bmi measures body mass index with a range of 12.4 to 61.1 kg/m².

Let's fit a linear regression model using the continuous outcome variable bpsystol, the binary predictor variable diabetes, and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors, and I have used the “##” operator to request the main effects and interaction of both predictor variables.

. regress bpsystol i.hlthstat##i.diabetes


      Source         SS           df       MS     Number of obs   =    10,335
      F(9, 10325)     =     86.92
       Model    396524.045         9  44058.2272    Prob > F        =    0.0000
    Residual    5233449.31    10,325  506.871604    R-squared       =    0.0704
      Adj R-squared   =    0.0696
       Total    5629973.35    10,334  544.800982    Root MSE        =    22.514




    bpsystol   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
    hlthstat                                                                  
  Very good      2.636051   .6417076     4.11   0.000      1.37818    3.893922
       Good      7.648725   .6272209    12.19   0.000     6.419251      8.8782
       Fair      13.50647   .7408272    18.23   0.000      12.0543    14.95863
       Poor      14.77223   1.032484    14.31   0.000     12.74837     16.7961
                                                                              
    diabetes                                                                  
   Diabetic      5.780232   4.618696     1.25   0.211    -3.273308    14.83377
                                                                              
    hlthstat#                                                                  
    diabetes                                                                  
  Very good #                                                                  
   Diabetic      17.43339   5.726714     3.04   0.002     6.207924    28.65886
       Good #                                                                  
   Diabetic      4.023894   5.032308     0.80   0.424    -5.840404    13.88819
       Fair #                                                                  
   Diabetic      7.316062    4.97969     1.47   0.142    -2.445096    17.07722
       Poor #                                                                  
   Diabetic      3.445358    5.09316     0.68   0.499    -6.538222    13.42894
                                                                              
       _cons     124.2614   .4611975   269.43   0.000     123.3574    125.1655

The output can be challenging to interpret because we have two predictors and an interaction, which produces many estimated coefficients. We could spend our time carefully interpreting each coefficient, or we could calculate the expected SBP for every combination of diabetes status and health status. But Stata's margins command will estimate the expected SBP for combinations of the two predictor variables or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the regress command but not in the margins command.

Let's estimate marginal predictions of SBP for each combination of the categories of hlthstat and diabetes.

. margins diabetes#hlthstat

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                          Delta-method                                        
                   Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes#                                                                  
     hlthstat                                                                  
Not diabetic #                                                                  
   Excellent      124.2614   .4611975   269.43   0.000     123.3574    125.1655
Not diabetic #                                                                  
   Very good      126.8975   .4461899   284.40   0.000     126.0229    127.7721
Not diabetic #                                                                  
        Good      131.9102   .4250916   310.31   0.000     131.0769    132.7434
Not diabetic #                                                                  
        Fair      137.7679   .5797601   237.63   0.000     136.6315    138.9043
Not diabetic #                                                                  
        Poor      139.0337   .9237528   150.51   0.000     137.2229    140.8444
    Diabetic #                                                                  
   Excellent      130.0417   4.595612    28.30   0.000     121.0334      139.05
    Diabetic #                                                                  
   Very good      150.1111   3.356161    44.73   0.000     143.5324    156.6898
    Diabetic #                                                                  
        Good      141.7143   1.952195    72.59   0.000     137.8876     145.541
    Diabetic #                                                                  
        Fair      150.8642   1.768852    85.29   0.000     147.3969    154.3315
    Diabetic #                                                                  
        Poor      148.2593    1.93768    76.51   0.000      144.461    152.0575

The numbers reported in the “Margin” column are average values of the linear prediction of SBP for each combination of the categories of hlthstat and diabetes. For example, the output tells us that the expected SBP is 124.2614 for a person in “Excellent” health without diabetes and the expected SBP is 148.2593 for a person in “Poor” health with diabetes.

The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.

We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.

. marginsplot

Variables that uniquely identify margins: diabetes hlthstat

The resulting profile plot shows a separate line for each category of hlthstat. We could change the plot by changing the order of the variables in the margins command.

. margins hlthstat#diabetes

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                          Delta-method                                        
                   Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     hlthstat                                                                  
     diabetes                                                                  
   Excellent #                                                                  
Not diabetic      124.2614   .4611975   269.43   0.000     123.3574    125.1655
   Excellent #                                                                  
    Diabetic      130.0417   4.595612    28.30   0.000     121.0334      139.05
   Very good #                                                                  
Not diabetic      126.8975   .4461899   284.40   0.000     126.0229    127.7721
   Very good #                                                                  
    Diabetic      150.1111   3.356161    44.73   0.000     143.5324    156.6898
        Good #                                                                  
Not diabetic      131.9102   .4250916   310.31   0.000     131.0769    132.7434
        Good #                                                                  
    Diabetic      141.7143   1.952195    72.59   0.000     137.8876     145.541
        Fair #                                                                  
Not diabetic      137.7679   .5797601   237.63   0.000     136.6315    138.9043
        Fair #                                                                  
    Diabetic      150.8642   1.768852    85.29   0.000     147.3969    154.3315
        Poor #                                                                  
Not diabetic      139.0337   .9237528   150.51   0.000     137.2229    140.8444
        Poor #                                                                  
    Diabetic      148.2593    1.93768    76.51   0.000      144.461    152.0575



. marginsplot

Variables that uniquely identify margins: hlthstat diabetes

By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”. The option xdimension() defines the display of the x axis.

. marginsplot, recast(bar) xdimension(hlthstat diabetes)

Variables that uniquely identify margins: hlthstat diabetes

The overlap of the horizontal axis labels make them impossible to read. We can make them legible by adding the horizontal option to create a horizontal bar chart.

. marginsplot, recast(bar) horizontal xdimension(hlthstat diabetes)

Let's add more options to make our graph look nicer. We can use the plotopts(barwidth)) option to add some space between the bars. And we can use the title(), subtitle(), xtitle(), and ytitle() options to add various titles to our graph.

. marginsplot, recast(bar) xdimension(hlthstat diabetes)
               horizontal plotopts(barwidth(0.8)) 
               ytitle("") 
               xtitle("Expected systolic blood pressure") 
               title("Expected systolic blood pressure (mmHg)") 
               subtitle("By health status and diabetes status")

Variables that uniquely identify margins: hlthstat diabetes

Marginal effects

We can also use margins to estimate marginal predictions for one variable averaged over other variables in the model. For example, we can estimate the expected SBP for categories of diabetes averaged over hlthstat.

. margins diabetes

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.3211   .2273218   573.29   0.000     129.8755    130.7667
    Diabetic       143.041    1.50395    95.11   0.000      140.093    145.9891

Or we could estimate the expected SBP for categories of hlthstat averaged over diabetes.

. margins hlthstat

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
    hlthstat                                                                   
  Excellent       124.5405   .4918267   253.22   0.000     123.5764    125.5046
  Very good       128.0183   .4545142   281.66   0.000     127.1274    128.9092
       Good       132.3835   .4154021   318.69   0.000     131.5693    133.1978
       Fair       138.4002   .5583383   247.88   0.000     137.3058    139.4947
       Poor       139.4791   .8841156   157.76   0.000     137.7461    141.2121

How does it work?

Method 1: Average response

Let's work a simpler example without the interaction to help us understand how margins works. Let's begin by using tabulate to create indicator variables for each category of hlthstat.

. tabulate hlthstat, generate(hlthstat)


       Health status        Freq.     Percent        Cum.
   
           Excellent        2,407       23.29       23.29
           Very good        2,591       25.07       48.36
                Good        2,938       28.43       76.79
                Fair        1,670       16.16       92.95
                Poor          729        7.05      100.00
   
               Total       10,335      100.00

Let's list the first 10 observations to check our work.

. list hlthstat hlthstat1 hlthstat2 hlthstat3 hlthstat4 hlthstat5 in 1/10


       hlthstat   hlthst~1   hlthst~2   hlthst~3   hlthst~4   hlthst~5 
  1.  Very good          0          1          0          0          0 
  2.  Very good          0          1          0          0          0 
  3.       Good          0          0          1          0          0 
  4.       Fair          0          0          0          1          0 
  5.  Very good          0          1          0          0          0 
  6.       Poor          0          0          0          0          1 
  7.  Very good          0          1          0          0          0 
  8.  Excellent          1          0          0          0          0 
  9.  Very good          0          1          0          0          0 
 10.       Poor          0          0          0          0          1

Next let's fit a linear regression model including diabetes and hlthstat without the interaction. The option coeflegend displays a legend that includes terms that refer to the coefficients in the model.

. regress bpsystol i.diabetes i.hlthstat, coeflegend


      Source         SS           df       MS     Number of obs   =    10,335
      F(5, 10329)     =    153.09
       Model    388422.444         5  77684.4888    Prob > F        =    0.0000
    Residual    5241550.91    10,329  507.459668    R-squared       =    0.0690
      Adj R-squared   =    0.0685
       Total    5629973.35    10,334  544.800982    Root MSE        =    22.527




    bpsystol   Coefficient  Legend                                            
   
    diabetes                                                                  
   Diabetic       11.8325  _b[1.diabetes]                                     
                                                                              
    hlthstat                                                                  
  Very good      2.894063  _b[2.hlthstat]                                     
       Good       7.61725  _b[3.hlthstat]                                     
       Fair      13.68941  _b[4.hlthstat]                                     
       Poor      14.34982  _b[5.hlthstat]                                     
                                                                              
       _cons     124.2011  _b[_cons]

Let's display the contents of _b[2.hlthstat] to verify that it equals 2.894063.

. display _b[2.hlthstat]
2.894063

Now we can use coefficients and indicator variables to generate a new variable that equals the expected SBP assuming every observation in the sample does not have diabetes.

. generate sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[2.hlthstat] * hlthstat2
                                                      + _b[3.hlthstat] * hlthstat3
                                                      + _b[4.hlthstat] * hlthstat4
                                                      + _b[5.hlthstat] * hlthstat5
(16 missing values generated)

Next we can generate a new variable that equals the expected SBP assuming every observation in the sample has diabetes.

. generate sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[2.hlthstat] * hlthstat2
                                                      + _b[3.hlthstat] * hlthstat3
                                                      + _b[4.hlthstat] * hlthstat4
                                                      + _b[5.hlthstat] * hlthstat5
(16 missing values generated)

Then we can calculate the average of the two variables to estimate the expected SBP for people with and without diabetes.

. table (), statistic(mean sbp_diab0 sbp_diab1)



sbp_diab0    130.3163
sbp_diab1    142.1488

This matches the results reported by margins.

. margins diabetes

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.3163   .2274263   573.00   0.000     129.8705    130.7621
    Diabetic      142.1488     1.0333   137.57   0.000     140.1233    144.1742

Method 2: Response at average

In the previous example, we first calculated the response for each observation and then calculated the average of those responses. This is the default method. But we could also calculate the average covariate values first and then report the response at those average values.

Let's begin by using table to estimate the proportion of observations in each category of hlthstat.

. table (hlthstat), statistic(proportion) nformat(%12.7f proportion) nototal



                 Proportion
   
Health status              
  Excellent       0.2328979
  Very good       0.2507015
  Good            0.2842767
  Fair            0.1615868
  Poor            0.0705370

Then we can use those proportions (averages) to estimate the expected SBP assuming no one in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 0  + _b[2.hlthstat] * 0.2507015
                                            + _b[3.hlthstat] * 0.2842767
                                            + _b[4.hlthstat] * 0.1615868
                                            + _b[5.hlthstat] * 0.0705370

We can also calculate the expected SBP assuming everyone in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 1  + _b[2.hlthstat] * 0.2507015
                                            + _b[3.hlthstat] * 0.2842767
                                            + _b[4.hlthstat] * 0.1615868
                                            + _b[5.hlthstat] * 0.0705370

And we can check our work using margins with the atmeans option.

. margins diabetes, atmeans

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()
At: 0.diabetes = .9517175 (mean)
    1.diabetes = .0482825 (mean)
    1.hlthstat = .2328979 (mean)
    2.hlthstat = .2507015 (mean)
    3.hlthstat = .2842767 (mean)
    4.hlthstat = .1615868 (mean)
    5.hlthstat =  .070537 (mean)



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.3163   .2274263   573.00   0.000     129.8705    130.7621
    Diabetic      142.1488     1.0333   137.57   0.000     140.1233    144.1742

Estimating the average response (method 1) and the response at the average (method 2) gives us the same results for linear regression. But the results may differ for generalized linear models such as probit, logistic, or Poisson regression.

You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.

See it in action

Watch Introduction to margins in Stata, part 3: Interactions.

Watch Profile plots and interaction plots in Stata: Interactions of categorical and continuous variables.

Tell me more

Products

New in Stata 19

Why Stata

All features

Disciplines

Stata/MP

StataNow

Order Stata

Purchase

Order Stata

Bookstore

Stata Press

Stata Journal

Gift Shop

Learn

Free webinars

NetCourses

Classroom and web training

Organizational training

Video tutorials

Third-party courses

Web resources

Teaching with Stata

Support

Training

Video tutorials

FAQs

Statalist: The Stata Forum

Resources

Technical support

Customer service

Alerts

Company

News and events

Customer service

Careers

We use cookies

We use cookies to ensure that we give you the best experience on our website—to enhance site navigation, to analyze usage, and to assist in our marketing efforts. By continuing to use our site, you consent to the storing of cookies on your device and agree to delivery of content, including web fonts and JavaScript, from third party web services.

Cookie Settings

Privacy policy

Last updated: 16 November 2022

StataCorp LLC (StataCorp) strives to provide our users with exceptional products and services. To do so, we must collect personal information from you. This information is necessary to conduct business with our existing and potential customers. We collect and use this information only where we may legally do so. This policy explains what personal information we collect, how we use it, and what rights you have to that information.

Required cookies

Advertising cookies

Required cookies

These cookies are essential for our website to function and do not store any personally identifiable information. These cookies cannot be disabled.
Advertising and performance cookies

This website uses cookies to provide you with a better user experience. A cookie is a small piece of data our website stores on a site visitor's hard drive and accesses each time you visit so we can improve your access to our site, better understand how you use our site, and serve you content that may be of interest to you. For instance, we store a cookie when you log in to our shopping cart so that we can maintain your shopping cart should you not complete checkout. These cookies do not directly store your personal information, but they do support the ability to uniquely identify your internet browser and device.

Please note: Clearing your browser cookies at any time will undo preferences saved here. The option selected here will apply only to the device you are currently using.

Accept Cookies

Variable		Obs Mean Std. dev. Min Max

bpsystol		10,351 130.8817 23.33265 65 300
hlthstat		10,335 2.586164 1.206196 1 5
diabetes		10,349 .0482172 .2142353 0 1
age		10,351 47.57965 17.21483 20 74
bmi		10,351 25.5376 4.914969 12.3856 61.1297

Source	SS df MS	Number of obs = 10,335
		F(9, 10325) = 86.92
Model	396524.045 9 44058.2272	Prob > F = 0.0000
Residual	5233449.31 10,325 506.871604	R-squared = 0.0704
		Adj R-squared = 0.0696
Total	5629973.35 10,334 544.800982	Root MSE = 22.514


bpsystol		Coefficient Std. err. t P>\|t\| [95% conf. interval]

hlthstat
Very good		2.636051 .6417076 4.11 0.000 1.37818 3.893922
Good		7.648725 .6272209 12.19 0.000 6.419251 8.8782
Fair		13.50647 .7408272 18.23 0.000 12.0543 14.95863
Poor		14.77223 1.032484 14.31 0.000 12.74837 16.7961

diabetes
Diabetic		5.780232 4.618696 1.25 0.211 -3.273308 14.83377

hlthstat#
diabetes
Very good #
Diabetic		17.43339 5.726714 3.04 0.002 6.207924 28.65886
Good #
Diabetic		4.023894 5.032308 0.80 0.424 -5.840404 13.88819
Fair #
Diabetic		7.316062 4.97969 1.47 0.142 -2.445096 17.07722
Poor #
Diabetic		3.445358 5.09316 0.68 0.499 -6.538222 13.42894

_cons		124.2614 .4611975 269.43 0.000 123.3574 125.1655


		Delta-method
		Margin std. err. t P>\|t\| [95% conf. interval]

diabetes#
hlthstat
Not diabetic #
Excellent		124.2614 .4611975 269.43 0.000 123.3574 125.1655
Not diabetic #
Very good		126.8975 .4461899 284.40 0.000 126.0229 127.7721
Not diabetic #
Good		131.9102 .4250916 310.31 0.000 131.0769 132.7434
Not diabetic #
Fair		137.7679 .5797601 237.63 0.000 136.6315 138.9043
Not diabetic #
Poor		139.0337 .9237528 150.51 0.000 137.2229 140.8444
Diabetic #
Excellent		130.0417 4.595612 28.30 0.000 121.0334 139.05
Diabetic #
Very good		150.1111 3.356161 44.73 0.000 143.5324 156.6898
Diabetic #
Good		141.7143 1.952195 72.59 0.000 137.8876 145.541
Diabetic #
Fair		150.8642 1.768852 85.29 0.000 147.3969 154.3315
Diabetic #
Poor		148.2593 1.93768 76.51 0.000 144.461 152.0575

Health status		Freq. Percent Cum.

Excellent		2,407 23.29 23.29
Very good		2,591 25.07 48.36
Good		2,938 28.43 76.79
Fair		1,670 16.16 92.95
Poor		729 7.05 100.00

Total		10,335 100.00

	hlthstat hlthst~1 hlthst~2 hlthst~3 hlthst~4 hlthst~5
1.	Very good 0 1 0 0 0
2.	Very good 0 1 0 0 0
3.	Good 0 0 1 0 0
4.	Fair 0 0 0 1 0
5.	Very good 0 1 0 0 0
6.	Poor 0 0 0 0 1
7.	Very good 0 1 0 0 0
8.	Excellent 1 0 0 0 0
9.	Very good 0 1 0 0 0
10.	Poor 0 0 0 0 1


bpsystol		Coefficient Legend

diabetes
Diabetic		11.8325 _b[1.diabetes]

hlthstat
Very good		2.894063 _b[2.hlthstat]
Good		7.61725 _b[3.hlthstat]
Fair		13.68941 _b[4.hlthstat]
Poor		14.34982 _b[5.hlthstat]

_cons		124.2011 _b[_cons]


		Proportion

Health status
Excellent		0.2328979
Very good		0.2507015
Good		0.2842767
Fair		0.1615868
Poor		0.0705370