
Title | How can I add the goodness-of-fit test statistics or classification statistics to the estimation results of logistic regression models within a table? | |
Author | Pei-Chun Lai, StataCorp |
The etable command is used to efficiently create tables that display the estimation results. It is straightforward to include model statistics such as the \(R^{2}\) or the F statistic, which are computed by the estimation command itself, in the estimation table by using the mstat() option with etable. However, if we execute a postestimation command after fitting a regression model, it is possible to include the results from the postestimation command in the table as well.
When we run a postestimation command, the statistics generated by this command are stored as scalars in r() (return results). These scalars are accessible. We also can specify these scalars in the mstat() option to include them in the generated table. Below, you will learn how to add results from one or multiple postestimation commands for one or multiple regression models to a table of estimation results.
First, let’s consider an example that uses the estat gof command to perform the goodness-of-fit test after executing a logit command. Below is the initial code:
. clear all . webuse lbw (Hosmer & Lemeshow data) . quietly: logit low age lwt smoke ptl ht ui i.race . estat gof Goodness-of-fit test after logistic model Variable: low Number of observations = 189 Number of covariate patterns = 182 Pearson chi2(173) = 179.24 Prob > chi2 = 0.3567
After running estat gof, the following scalars are stored in r():
These r() scalars can be viewed by executing return list after implementing estat gof:
. return list scalars: r(p) = .3567000894166227 r(df) = 173 r(chi2) = 179.2406804943249 r(m) = 182 r(N) = 189
Typing etable gives us a basic table of the estimation results. By default, this table displays the estimated coefficients, their standard errors in parentheses below the coefficients, and the number of observations.
. etable
low |
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 (0.007) Smoked during pregnancy 0.923 (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 (0.526) Other 0.862 (0.439) Intercept 0.461 (1.205) Number of observations 189 |
To include the Pearson \(\chi^2\) statistic (stored as r(chi2)) and the corresponding p-values from the goodness-of-fit test (stored as r(p)) in the above estimation table, we can add the options mstat(chi2_gof=r(chi2), label(GOF \(\chi^2\))) and mstat(p_gof=r(p), label(GOF p-value)) to the etable command. The mstat() option enables us to incorporate model statistics. The syntax here is mstat(name=exp, label()). The label suboption is used to modify the label for the specified model statistic.
Note that by default, the mstat() option picks up the e() results. If we directly type mstat(chi2) instead of mstat(chi2_gof=r(chi2)), the table will display the value of e(chi2). This value represents the likelihood \(\chi^2\) test statistic for the logistic regression model generated by the logit command. Therefore, we need to type mstat(chi2_gof=r(chi2)) to display the Pearson \(\chi^2\) statistic generated by the estat gof command. The mstat() options are specified twice here to allow for the inclusion of multiple model statistics.
Let's run the following etable command:
. etable, mstat(chi2_gof=(r(chi2)), label(GOF χ²)) mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f)) showstars showstarsnote
The table appears as follows:
low |
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 * (0.007) Smoked during pregnancy 0.923 * (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 ** (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 (1.205) GOF χ² 179.24 GOF p-value 0.3567 |
We use the nformat(%6.4f) option to specify the format for the numeric value of the p-value, displaying four digits after the decimal points. The showstars option shows the stars for indicating the significant results, and the showstarsnote option provides an explanatory note for these stars.
To compare the goodness-of-fit statistics across multiple logistic regression models, what steps should we take?
Suppose we plan to create a single table that contains all of these statistics for three different logistic regression models. We can first use the code above to fit the first logistic regression model and generate the corresponding table with the statistics we want. Then we fit the second model, run the postestimation command estat gof, and call etable with the append option to add these new results to the existing table. We can repeat this process again for the third model. Below is the example code:
. clear all . webuse lbw (Hosmer & Lemeshow data) . quietly: logit low age lwt smoke ptl ht ui i.race . estat gof . etable, column(index) mstat(chi2_gof=(r(chi2)), label(GOF χ²)) mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f)) showstars showstarsnote . quietly: logit low age lwt smoke ptl ht ui . estat gof . etable, append . quietly: logit low age lwt smoke ht ui . estat gof . etable, append
The table appears as follows:
1 2 3 |
Age of mother -0.027 -0.042 -0.034 (0.036) (0.035) (0.034) Weight at last menstrual period -0.015 * -0.014 * -0.015 * (0.007) (0.007) (0.007) Smoked during pregnancy 0.923 * 0.551 0.647 (0.401) (0.344) (0.337) Premature labor history (count) 0.542 0.593 (0.346) (0.348) Has history of hypertension 1.833 ** 1.862 ** 1.892 ** (0.692) (0.686) (0.683) Presence, uterine irritability 0.759 0.737 0.885 * (0.459) (0.456) (0.444) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 1.379 1.397 (1.205) (1.089) (1.080) GOF χ² 179.24 180.08 178.61 GOF p-value 0.3567 0.3021 0.3294 |
By default, etable uses the dependent variable name as the column headers. Because these three logistic regression models share a common response variable, low, we specify the column(index) option to use the result set index as the column headers instead. This will help distinguish between each model.
Next let’s consider executing two postestimation commands, estat classification and estat gof, after running the logit command. We would like to add the percentage of correct classification r(P_corr) obtained from running estat classification, along with the Pearson \(\chi^2\) statistic (r(chi2)) and the corresponding p-values (r(p)) obtained from estat gof to an estimation table.
We can use local macro names to store the values of the scalars: r(P_corr), r(chi2), and r(p). The macros can be treated as scalar expressions and incorporated into the mstat() option to generate the table. Below is the sample code:
. clear all . webuse lbw (Hosmer & Lemeshow data) . quietly: logit low age lwt smoke ptl ht ui i.race . estat classification . local prob_corr=r(P_corr) . estat gof . local chi2_gof=r(chi2) . local p_gof=r(p) . etable, mstat(P_correct=`prob_corr', label(Prob. of correct classification) sformat("%s%%")) mstat(chi2_gof=`chi2_gof' , label(GOF χ²)) mstat(p_gof=`p_gof', label(GOF p-value) nformat(%6.4f)) showstars showstarsnote
The generated table looks as follows:
low |
Age of mother -0.027 (0.036) Weight at last menstrual period -0.015 * (0.007) Smoked during pregnancy 0.923 * (0.401) Premature labor history (count) 0.542 (0.346) Has history of hypertension 1.833 ** (0.692) Presence, uterine irritability 0.759 (0.459) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 (1.205) Prob. of correct classification 73.54% GOF χ² 179.24 GOF p-value 0.3567 |
In the above example, we suggest using the macros to store the values of these r() scalars and specifying these macros in the mstat() option.
This is because the return scalar r(P_corr) created by the first postestimation command, estat classification, will be erased after executing the second postestimation command, estat gof. Therefore, we use local macros to preserve these values and cannot simply incorporate those scalars directly into the mstat() option.
Further, if we want to compare those statistics across multiple logistic regression models, we can repeat all the steps for each estimation model. This includes using the long etable command with the append option. However, this approach could result in lengthy code, so we write a loop to simplify the code. For example,
. clear all . webuse lbw (Hosmer & Lemeshow data) . * Store models to fit in local macros local m1 "logit low age lwt smoke ptl ht ui i.race" local m2 "logit low age lwt smoke ptl ht ui" local m3 "logit low age lwt smoke ht ui" . * Use a -forvalues- loop to implement the logit commands and create the table repeatedly. forvalues i=1(1)3{ quietly: `m`i'' * Get classification accuracy estat classification local prob_corr=r(P_corr) * Get goodness-of-fit statistics estat gof local chi2_gof=r(chi2) local p_gof=r(p) * Add results to etable etable, column(index) mstat(P_correct=`prob_corr', label(Probability of correct classification) sformat("%s%%")) mstat(chi2_gof=`chi2_gof', label(GOF χ²)) mstat(p_gof=`p_gof', label(GOF p-value) nformat(%6.4f)) showstars showstarsnote append }
Alternatively, we can define a short program that stores the results from estat classification and estat gof in the r() list. Let’s use the code below to define the program my_estat:
. program my_estat, rclass estat classification return add estat gof return add end
The return add command adds the current values of r(P_corr), r(chi2), and r(p) to the r() list at the end of an r-class program.
We use this program along with the etable command, as shown in the following code. After implementing my_estat, the new values of r(P_corr), r(chi2), and r(p) are stored in the r() list. The mstat() option can use these new values for each model. This will allow us to create a table to compare r(P_corr), r(chi2), and r(p) across multiple logistic regression models. Here is the code:
. clear all . program my_estat, rclass estat classification return add estat gof return add end .webuse lbw (Hosmer & Lemeshow data) .quietly: logit low age lwt smoke ptl ht ui i.race .my_estat .etable, column(index) mstat(P_correct=r(P_corr), label(Prob. of correct classification) sformat("%s%%")) mstat(chi2_gof=r(chi2), label(GOF χ²)) mstat(p_gof=r(p), label(GOF p-value) nformat(%6.4f)) showstars showstarsnote .quietly: logit low age lwt smoke ptl ht ui .my_estat .etable, append .quietly: logit low age lwt smoke ht ui .my_estat .etable, append
The created table looks as follows:
1 2 3 |
Age of mother -0.027 -0.042 -0.034 (0.036) (0.035) (0.034) Weight at last menstrual period -0.015 * -0.014 * -0.015 * (0.007) (0.007) (0.007) Smoked during pregnancy 0.923 * 0.551 0.647 (0.401) (0.344) (0.337) Premature labor history (count) 0.542 0.593 (0.346) (0.348) Has history of hypertension 1.833 ** 1.862 ** 1.892 ** (0.692) (0.686) (0.683) Presence, uterine irritability 0.759 0.737 0.885 * (0.459) (0.456) (0.444) Race Black 1.263 * (0.526) Other 0.862 * (0.439) Intercept 0.461 1.379 1.397 (1.205) (1.089) (1.080) Prob. of correct classification 73.54% 70.37% 72.49% GOF χ² 179.24 180.08 178.61 GOF p-value 0.3567 0.3021 0.3294 |
For more information about etable, please see the following:
stata.com/stata-news/news38-1/etable/
stata.com/features/overview/tables-of-estimation-results/