Home  /  Resources & support  /  FAQs  /  How to incorporate statistics from postestimation commands to etable?

How can I include the statistics generated by postestimation commands in an estimation table created using etable?

Title   How can I add the goodness-of-fit test statistics or classification statistics to the estimation results of logistic regression models within a table?
Author Pei-Chun Lai, StataCorp

The etable command is used to efficiently create tables that display the estimation results. It is straightforward to include model statistics such as the \(R^{2}\) or the F statistic, which are computed by the estimation command itself, in the estimation table by using the mstat() option with etable. However, if we execute a postestimation command after fitting a regression model, it is possible to include the results from the postestimation command in the table as well.

When we run a postestimation command, the statistics generated by this command are stored as scalars in r() (return results). These scalars are accessible. We also can specify these scalars in the mstat() option to include them in the generated table. Below, you will learn how to add results from one or multiple postestimation commands for one or multiple regression models to a table of estimation results.

1. Incorporating results from one postestimation command into the estimation table

1a: A single estimation model

First, let’s consider an example that uses the estat gof command to perform the goodness-of-fit test after executing a logit command. Below is the initial code:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. quietly: logit low age lwt smoke ptl ht ui i.race

. estat gof

Goodness-of-fit test after logistic model
Variable: low

      Number of observations =    189
Number of covariate patterns =    182
           Pearson chi2(173) = 179.24
                 Prob > chi2 = 0.3567

After running estat gof, the following scalars are stored in r():

  • r(p) stores the p-value for the \(\chi^2\) test.
  • r(df) stores the degrees of freedom.
  • r(chi2) stores the \(\chi^2\) statistic.
  • r(m) stores the number of covariate patterns or groups.
  • r(N) stores the number of observations.

These r() scalars can be viewed by executing return list after implementing estat gof:

. return list

scalars:
                  r(p) =  .3567000894166227
                 r(df) =  173
               r(chi2) =  179.2406804943249
                  r(m) =  182
                  r(N) =  189

Typing etable gives us a basic table of the estimation results. By default, this table displays the estimated coefficients, their standard errors in parentheses below the coefficients, and the number of observations.

. etable

low
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 (0.007) Smoked during pregnancy 0.923 (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 (0.526) Other 0.862 (0.439) Intercept 0.461 (1.205) Number of observations 189

To include the Pearson \(\chi^2\) statistic (stored as r(chi2)) and the corresponding p-values from the goodness-of-fit test (stored as r(p)) in the above estimation table, we can add the options mstat(chi2_gof=r(chi2), label(GOF \(\chi^2\))) and mstat(p_gof=r(p), label(GOF p-value)) to the etable command. The mstat() option enables us to incorporate model statistics. The syntax here is mstat(name=exp, label()). The label suboption is used to modify the label for the specified model statistic.

Note that by default, the mstat() option picks up the e() results. If we directly type mstat(chi2) instead of mstat(chi2_gof=r(chi2)), the table will display the value of e(chi2). This value represents the likelihood \(\chi^2\) test statistic for the logistic regression model generated by the logit command. Therefore, we need to type mstat(chi2_gof=r(chi2)) to display the Pearson \(\chi^2\) statistic generated by the estat gof command. The mstat() options are specified twice here to allow for the inclusion of multiple model statistics.

Let's run the following etable command:

. etable, mstat(chi2_gof=(r(chi2)), label(GOF χ²))
     mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f))
     showstars showstarsnote

The table appears as follows:

low
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 * (0.007) Smoked during pregnancy 0.923 * (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 ** (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 (1.205) GOF χ² 179.24 GOF p-value 0.3567
** p<.01, * p<.05

We use the nformat(%6.4f) option to specify the format for the numeric value of the p-value, displaying four digits after the decimal points. The showstars option shows the stars for indicating the significant results, and the showstarsnote option provides an explanatory note for these stars.

1b: Multiple estimation models

To compare the goodness-of-fit statistics across multiple logistic regression models, what steps should we take?

Suppose we plan to create a single table that contains all of these statistics for three different logistic regression models. We can first use the code above to fit the first logistic regression model and generate the corresponding table with the statistics we want. Then we fit the second model, run the postestimation command estat gof, and call etable with the append option to add these new results to the existing table. We can repeat this process again for the third model. Below is the example code:

. clear all
. webuse lbw
(Hosmer & Lemeshow data)

. quietly: logit low age lwt smoke ptl ht ui i.race
. estat gof
. etable, column(index) 
     mstat(chi2_gof=(r(chi2)), label(GOF χ²)) 
     mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f))
     showstars showstarsnote

. quietly: logit low age lwt smoke ptl ht ui
. estat gof
. etable, append

. quietly: logit low age lwt smoke ht ui
. estat gof
. etable, append

The table appears as follows:

1 2 3
Age of mother -0.027 -0.042 -0.034 (0.036) (0.035) (0.034) Weight at last menstrual period -0.015 * -0.014 * -0.015 * (0.007) (0.007) (0.007) Smoked during pregnancy 0.923 * 0.551 0.647 (0.401) (0.344) (0.337) Premature labor history (count) 0.542 0.593 (0.346) (0.348) Has history of hypertension 1.833 ** 1.862 ** 1.892 ** (0.692) (0.686) (0.683) Presence, uterine irritability 0.759 0.737 0.885 * (0.459) (0.456) (0.444) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 1.379 1.397 (1.205) (1.089) (1.080) GOF χ² 179.24 180.08 178.61 GOF p-value 0.3567 0.3021 0.3294
** p<.01, * p<.05

By default, etable uses the dependent variable name as the column headers. Because these three logistic regression models share a common response variable, low, we specify the column(index) option to use the result set index as the column headers instead. This will help distinguish between each model.

2. Adding results from multiple postestimation commands for one model to an estimation table

Next let’s consider executing two postestimation commands, estat classification and estat gof, after running the logit command. We would like to add the percentage of correct classification r(P_corr) obtained from running estat classification, along with the Pearson \(\chi^2\) statistic (r(chi2)) and the corresponding p-values (r(p)) obtained from estat gof to an estimation table.

2a: A single estimation model

We can use local macro names to store the values of the scalars: r(P_corr), r(chi2), and r(p). The macros can be treated as scalar expressions and incorporated into the mstat() option to generate the table. Below is the sample code:

. clear all
. webuse lbw
(Hosmer & Lemeshow data)

. quietly: logit low age lwt smoke ptl ht ui i.race

. estat classification
. local prob_corr=r(P_corr)

. estat gof
. local chi2_gof=r(chi2)
. local p_gof=r(p)

. etable,
     mstat(P_correct=`prob_corr', label(Prob. of correct classification) sformat("%s%%")) 
     mstat(chi2_gof=`chi2_gof' , label(GOF χ²)) 
     mstat(p_gof=`p_gof', label(GOF p-value) nformat(%6.4f)) 
     showstars showstarsnote

The generated table looks as follows:

low
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 * (0.007) Smoked during pregnancy 0.923 * (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 ** (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 (1.205) Prob. of correct classification 73.54% GOF χ² 179.24 GOF p-value 0.3567
** p<.01, * p<.05

In the above example, we suggest using the macros to store the values of these r() scalars and specifying these macros in the mstat() option.

This is because the return scalar r(P_corr) created by the first postestimation command, estat classification, will be erased after executing the second postestimation command, estat gof. Therefore, we use local macros to preserve these values and cannot simply incorporate those scalars directly into the mstat() option.

2b: Multiple estimation models

Further, if we want to compare those statistics across multiple logistic regression models, we can repeat all the steps for each estimation model. This includes using the long etable command with the append option. However, this approach could result in lengthy code, so we write a loop to simplify the code. For example,

. clear all
. webuse lbw
(Hosmer & Lemeshow data)

. * Store models to fit in local macros 
      local m1 "logit low age lwt smoke ptl ht ui i.race"
      local m2 "logit low age lwt smoke ptl ht ui"
      local m3 "logit low age lwt smoke ht ui"

. * Use a -forvalues- loop to implement the logit commands and create the table repeatedly.
      forvalues i=1(1)3{
        quietly: `m`i''

        * Get classification accuracy
        estat classification
        local prob_corr=r(P_corr)

        * Get goodness-of-fit statistics
        estat gof
        local chi2_gof=r(chi2)
        local p_gof=r(p)
  
        * Add results to etable
        etable, column(index)  
        mstat(P_correct=`prob_corr', label(Probability of correct classification) sformat("%s%%")) 
        mstat(chi2_gof=`chi2_gof', label(GOF χ²)) 
        mstat(p_gof=`p_gof', label(GOF p-value) nformat(%6.4f))
        showstars showstarsnote append
}

Alternatively, we can define a short program that stores the results from estat classification and estat gof in the r() list. Let’s use the code below to define the program my_estat:

. program my_estat, rclass
     estat classification
     return add
     estat gof
     return add
end

The return add command adds the current values of r(P_corr), r(chi2), and r(p) to the r() list at the end of an r-class program.

We use this program along with the etable command, as shown in the following code. After implementing my_estat, the new values of r(P_corr), r(chi2), and r(p) are stored in the r() list. The mstat() option can use these new values for each model. This will allow us to create a table to compare r(P_corr), r(chi2), and r(p) across multiple logistic regression models. Here is the code:

. clear all
. program my_estat, rclass
     estat classification
     return add
     estat gof
     return add
end

.webuse lbw
(Hosmer & Lemeshow data)

.quietly: logit low age lwt smoke ptl ht ui i.race
.my_estat
.etable, column(index) 
     mstat(P_correct=r(P_corr), label(Prob. of correct classification) sformat("%s%%")) 
     mstat(chi2_gof=r(chi2), label(GOF χ²)) 
     mstat(p_gof=r(p), label(GOF p-value) nformat(%6.4f))
     showstars showstarsnote

.quietly: logit low age lwt smoke ptl ht ui
.my_estat
.etable, append 

.quietly: logit low age lwt smoke ht ui
.my_estat
.etable, append 

The created table looks as follows:

1 2 3
Age of mother -0.027 -0.042 -0.034 (0.036) (0.035) (0.034) Weight at last menstrual period -0.015 * -0.014 * -0.015 * (0.007) (0.007) (0.007) Smoked during pregnancy 0.923 * 0.551 0.647 (0.401) (0.344) (0.337) Premature labor history (count) 0.542 0.593 (0.346) (0.348) Has history of hypertension 1.833 ** 1.862 ** 1.892 ** (0.692) (0.686) (0.683) Presence, uterine irritability 0.759 0.737 0.885 * (0.459) (0.456) (0.444) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 1.379 1.397 (1.205) (1.089) (1.080) Prob. of correct classification 73.54% 70.37% 72.49% GOF χ² 179.24 180.08 178.61 GOF p-value 0.3567 0.3021 0.3294
** p<.01, * p<.05

For more information about etable, please see the following:

stata.com/stata-news/news38-1/etable/

stata.com/features/overview/tables-of-estimation-results/