Title: Fitting ordered logistic and probit models with constraints
Authors: Mark Inlow, StataCorp; Ronna Cong, StataCorp
Consider a parameterization in which a constant is present, e.g., Greene’s formulation (Greene 2018, Chapter 18):
$$
\begin{aligned}
\Pr(Y = 0) &= F(-Xb) \\
\Pr(Y = 1) &= F(u_1 - Xb) - F(-Xb) \\
\Pr(Y = 2) &= F(u_2 - Xb) - F(u_1 - Xb) \\
&\;\;\vdots
\end{aligned}
$$
In the preceding, F is the cumulative distribution function (CDF): the standard normal CDF for ordered probit regression or the logistic CDF for ordered logistic regression. Because Greene’s Xb includes a constant, we write that constant explicitly as con so that his notation and the notation of Stata’s ordered probit/logistic commands are comparable:
$$
\begin{aligned}
\Pr(Y = 0) &= F(-Xb - \text{con}) \\
\Pr(Y = 1) &= F(u_1 - Xb - \text{con}) - F(-Xb - \text{con}) \\
\Pr(Y = 2) &= F(u_2 - Xb - \text{con}) - F(u_1 - Xb - \text{con}) \\
&\;\;\vdots
\end{aligned}
$$
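As a concrete reading of these formulas, the following minimal Python sketch computes Greene’s category probabilities with the constant written out explicitly. The function names and the numeric inputs are hypothetical illustrations. The only assumptions are the two CDFs named above, the standard normal and the logistic.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF (ordered probit), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z: float) -> float:
    """Standard logistic CDF (ordered logit)."""
    return 1.0 / (1.0 + math.exp(-z))

def greene_probs(xb, con, u, cdf=normal_cdf):
    """Pr(Y = 0), Pr(Y = 1), ... in Greene's parameterization.

    xb  : linear predictor *excluding* the constant
    con : Greene's constant
    u   : increasing thresholds [u1, u2, ...]; the first cut point is fixed at 0
    """
    thresholds = [0.0] + list(u)
    # Cumulative probabilities Pr(Y <= j) = F(u_j - xb - con), closed off with 1.
    cum = [cdf(t - xb - con) for t in thresholds] + [1.0]
    # Each category's probability is the difference of adjacent cumulatives.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical inputs, for illustration only.
for link in (normal_cdf, logistic_cdf):
    probs = greene_probs(xb=-1.5, con=2.5, u=[1.2, 2.1], cdf=link)
    print(link.__name__, [round(p, 3) for p in probs], "sum =", round(sum(probs), 12))
```

Passing m free thresholds returns m + 2 probabilities, because the zero threshold is added internally. The probabilities always sum to 1 because the differences telescope.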
Now, compare this with Stata’s no-constant model:
$$
\begin{aligned}
\Pr(Y = 0) &= F(\text{/cut1} - Xb) \\
\Pr(Y = 1) &= F(\text{/cut2} - Xb) - F(\text{/cut1} - Xb) \\
\Pr(Y = 2) &= F(\text{/cut3} - Xb) - F(\text{/cut2} - Xb) \\
&\;\;\vdots
\end{aligned}
$$
Examining the expressions for Pr(Y = 0), we see that
$$
-Xb - \text{con} = \text{/cut1} - Xb
$$
so Greene’s constant equals −/cut1. In other words, Greene fixes the first cut point at zero, whereas Stata fixes the constant at zero.
Combining this observation with the expressions for Pr(Y = 1), where u1 − Xb − con = /cut2 − Xb, we see that Greene’s u1 = /cut2 + con = /cut2 − /cut1. Doing the same for Pr(Y = 2), we see that u2 = /cut3 − /cut1. Thus, to recover Greene’s parameters from the coefficient estimates reported by Stata’s ordered probit/logistic regression commands, we can use the following:
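$$
\begin{aligned}
\text{Greene's intercept} &= -\text{/cut1} \\
\text{Greene's } u_1 &= \text{/cut2} - \text{/cut1} \\
\text{Greene's } u_2 &= \text{/cut3} - \text{/cut1} \\
&\;\;\vdots
\end{aligned}
$$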
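The mapping can be checked numerically. The minimal Python sketch below converts Stata’s cut points into Greene’s constant and thresholds. It then confirms that both parameterizations assign the same probabilities. The function names are hypothetical, and only the probit link is shown. The cut-point values are those reported in the auto example later in this FAQ, and the xb values are arbitrary.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def stata_to_greene(cuts):
    """Map Stata cut points [/cut1, /cut2, ...] to Greene's (con, [u1, u2, ...])."""
    con = -cuts[0]                          # Greene's constant = -/cut1
    u = [c - cuts[0] for c in cuts[1:]]     # u_j = /cut(j+1) - /cut1
    return con, u

def stata_probs(xb, cuts, cdf=normal_cdf):
    """Category probabilities in Stata's no-constant parameterization."""
    cum = [cdf(c - xb) for c in cuts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def greene_probs(xb, con, u, cdf=normal_cdf):
    """Category probabilities in Greene's parameterization (first threshold fixed at 0)."""
    cum = [cdf(t - xb - con) for t in [0.0] + list(u)] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Cut-point estimates from the auto example in this FAQ.
cuts = [-2.468357, -1.276601, -0.3720451]
con, u = stata_to_greene(cuts)
print(con, u)   # approximately 2.468357 and [1.191756, 2.096312]

# Both parameterizations give the same probabilities for any xb (arbitrary values).
for xb in (-2.0, -1.5, 0.0):
    s, g = stata_probs(xb, cuts), greene_probs(xb, con, u)
    assert all(abs(a - b) < 1e-12 for a, b in zip(s, g))
```

Working from the displayed (rounded) cut points gives u1 = 1.191756, whereas lincom reports 1.191755. The difference in the last digit is rounding, because lincom works with the full-precision estimates.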
After fitting your model in Stata, you can convert to Greene’s parameterization with lincom, which reports both the point estimate and its standard error:
ologit/oprobit ...
lincom _b[/cut2] - _b[/cut1]
lincom _b[/cut3] - _b[/cut1]
...
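For a difference of two coefficients, lincom’s standard error is the square root of c′Vc with c = (1, −1), where V is the estimated covariance matrix e(V). That is, Var(a) + Var(b) − 2 Cov(a, b). After ologit or oprobit, lincom reports a z statistic and a normal-based interval. The minimal Python sketch below reproduces that arithmetic. The function name and the covariance numbers are hypothetical and illustrate only the formula.

```python
import math
from statistics import NormalDist

def lincom_difference(b_a, b_b, var_a, var_b, cov_ab, level=0.95):
    """Estimate of b_a - b_b with its standard error, z, two-sided p-value, and CI.

    Var(b_a - b_b) = Var(a) + Var(b) - 2 Cov(a, b), i.e. c'Vc with c = (1, -1).
    """
    est = b_a - b_b
    se = math.sqrt(var_a + var_b - 2.0 * cov_ab)
    z = est / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # 1.96 for a 95% interval
    return est, se, z, p, (est - crit * se, est + crit * se)

# Hypothetical values for illustration only (not taken from any fitted model).
est, se, z, p, ci = lincom_difference(b_a=-1.2, b_b=-2.4, var_a=0.25, var_b=0.30, cov_ab=0.26)
print(f"estimate={est:.4f} se={se:.4f} z={z:.2f} p={p:.3g} ci=({ci[0]:.4f}, {ci[1]:.4f})")
```

A strong positive covariance between the two cut points shrinks the standard error of their difference well below either individual standard error.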
To make things concrete, consider the following example using the auto dataset, which is shipped with Stata.
. sysuse auto, clear
(1978 Automobile Data)

. replace rep78 = 2 if rep78 == 1 | missing(rep78)
(7 real changes made)

. tabulate rep78

     Repair |
     Record |
       1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |         15       20.27       20.27
          3 |         30       40.54       60.81
          4 |         18       24.32       85.14
          5 |         11       14.86      100.00
------------+-----------------------------------
      Total |         74      100.00
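------------------------------------------------------------------------------
       rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   .0000966   .0000515     1.88   0.061    -4.36e-06    .0001976
      weight |  -.0007095   .0002013    -3.52   0.000    -.0011041    -.000315
-------------+----------------------------------------------------------------
       /cut1 |  -2.468357   .5580629                      -3.56214   -1.374573
       /cut2 |  -1.276601   .5310947                     -2.317528   -.2356748
       /cut3 |  -.3720451   .5046055                     -1.361054    .6169635
------------------------------------------------------------------------------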
Thus Greene’s intercept (constant) is −/cut1 = 2.47, and its standard error equals that of /cut1, 0.56. Next we compute the point estimate and standard error of u1:
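. lincom _b[/cut2] - _b[/cut1]

 ( 1)  - [/]cut1 + [/]cut2 = 0

------------------------------------------------------------------------------
       rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   1.191755    .183964     6.48   0.000     .8311925    1.552318
------------------------------------------------------------------------------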
Our estimate of u1 is 1.19 with a standard error of 0.18. Finally, we estimate u2:
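. lincom _b[/cut3] - _b[/cut1]

 ( 1)  - [/]cut1 + [/]cut3 = 0

------------------------------------------------------------------------------
       rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   2.096311   .2457135     8.53   0.000     1.614722    2.577901
------------------------------------------------------------------------------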
Thus our estimate of u2 is 2.10 with a standard error of 0.25.
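The standard errors of u1 and u2 (0.18 and 0.25) are much smaller than those of the individual cut points (about 0.5), which points to strongly positively correlated cut-point estimates. As a rough check, the Python sketch below backs out the correlations implied by the printed output. It does this by inverting Var(a − b) = Var(a) + Var(b) − 2 Cov(a, b), and it also recomputes lincom’s z statistics and 95% intervals. The inputs are rounded, printed values rather than e(V), so treat the results as approximations. In Stata, estat vce, correlation displays the correlations directly.

```python
from statistics import NormalDist

# Values copied from the output above; they are rounded, so results are approximate.
se_cut1, se_cut2, se_cut3 = 0.5580629, 0.5310947, 0.5046055
u1, se_u1 = 1.191755, 0.183964
u2, se_u2 = 2.096311, 0.2457135

def implied_corr(se_a, se_b, se_diff):
    """Correlation of a and b implied by Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)."""
    cov = (se_a**2 + se_b**2 - se_diff**2) / 2.0
    return cov / (se_a * se_b)

crit = NormalDist().inv_cdf(0.975)   # normal critical value for a 95% interval
for name, est, se, se_other in (("u1", u1, se_u1, se_cut2), ("u2", u2, se_u2, se_cut3)):
    r = implied_corr(se_other, se_cut1, se)
    lo, hi = est - crit * se, est + crit * se
    print(f"{name}: z={est / se:.2f}  95% CI=({lo:.6f}, {hi:.6f})  implied corr with /cut1={r:.3f}")
```

The recomputed z statistics and intervals agree with the lincom output up to rounding. The implied correlations of /cut2 and /cut3 with /cut1 come out at roughly 0.94 and 0.90.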