Ordinal outcome
Zero inflation: zero observations generated by two distinct processes
Robust, cluster–robust, and bootstrap standard errors
Support for complex survey designs
Predict marginal, joint, and conditional probabilities of levels
Predict probability of participation and nonparticipation
Stata's zioprobit command fits zero-inflated ordered probit (ZIOP) models.
ZIOP models are used for ordered response variables, such as (1) fully ambulatory, (2) ambulatory with restrictions, and (3) partially ambulatory, when the data exhibit a high fraction of observations at the lowest end of the ordering. The concept of zero inflation originated in Poisson models of count data with an overabundance of zeros. zioprobit applies this idea to ordinal data, where the numeric value of the lowest category need not be zero. Given the category values we just used, zioprobit would fit a 1-inflated model. Had we numbered the categories 0, 1, and 2 instead, it would fit a 0-inflated model. The results would be the same either way.
Standard ordered probit models cannot account for the preponderance of zero observations when the zeros relate to an extra, distinct source. Consider a study of tobacco use in which the outcome of interest, smoking, is an ordered discrete response with four levels coded as 0, 1, 2, and 3, with 0 meaning "Nonsmoker" and 3 meaning "Daily, 20+ cigarettes/day".
Many of the individuals in the first category will be nonsmokers who have never smoked and will never smoke. The rest will be ex-smokers. Think of the standard ordered probit model as fitting the behavior of smokers, including ex-smokers. The zero inflation arises because the first category also includes those who have never smoked.
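The two-process idea can be written out as probabilities. The following is a sketch of the usual ZIOP formulation for the four-level tobacco example, assuming the errors of the two processes are independent. The symbols are our own notation and do not appear in the output: z and γ are the covariates and coefficients of the participation (inflation) equation, x and β those of the ordered probit equation, κ1 < κ2 < κ3 are the cutpoints, and Φ is the standard normal distribution function.

```latex
\begin{aligned}
\Pr(y = 0 \mid x, z) &= \bigl[1 - \Phi(z\gamma)\bigr] + \Phi(z\gamma)\,\Phi(\kappa_1 - x\beta) \\
\Pr(y = 1 \mid x, z) &= \Phi(z\gamma)\,\bigl[\Phi(\kappa_2 - x\beta) - \Phi(\kappa_1 - x\beta)\bigr] \\
\Pr(y = 2 \mid x, z) &= \Phi(z\gamma)\,\bigl[\Phi(\kappa_3 - x\beta) - \Phi(\kappa_2 - x\beta)\bigr] \\
\Pr(y = 3 \mid x, z) &= \Phi(z\gamma)\,\bigl[1 - \Phi(\kappa_3 - x\beta)\bigr]
\end{aligned}
```

The first term of Pr(y = 0) is the probability of nonparticipation (never-smokers). The second term is the probability of participating but still landing in the lowest category (ex-smokers).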
We have fictional data on the smoking study just described. The outcome variable is called tobacco and contains
            tobacco usage |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
                Nonsmoker |     11,642       78.14       78.14
           Weekly or less |        532        3.57       81.71
Daily, <20 cigarettes/day |      1,933       12.97       94.68
Daily, 20+ cigarettes/day |        792        5.32      100.00
--------------------------+-----------------------------------
                    Total |     14,899      100.00
We believe that the 0 is inflated.
We want to fit a model in which smoking by those who have ever smoked is given by
income
gender
age
And membership in the never-smoked group is determined by
income
gender
age
whether parents smoked
religion
To fit the model, we type
. zioprobit tobacco income i.female age, inflate(income i.female age i.parent i.religion)

Iteration 0:  log likelihood = -11427.864
Iteration 1:  log likelihood = -10365.839  (not concave)
Iteration 2:  log likelihood =  -10362.27
Iteration 3:  log likelihood = -10301.882
Iteration 4:  log likelihood = -10299.872
Iteration 5:  log likelihood = -10299.787
Iteration 6:  log likelihood = -10299.787

Zero-inflated ordered probit regression                  Number of obs = 14,899
                                                         Wald chi2(3)  = 751.43
Log likelihood = -10299.787                              Prob > chi2   = 0.0000
--------------------------------------------------------------------------------
      tobacco | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
--------------+-----------------------------------------------------------------
tobacco       |
       income |    .1503256   .0057582    26.11   0.000     .1390398    .1616113
       female |
      female  |   -.2726466    .047975    -5.68   0.000    -.3666759   -.1786173
          age |   -.1394573    .011523   -12.10   0.000    -.1620419   -.1168727
--------------+-----------------------------------------------------------------
inflate       |
       income |   -.0654874   .0087703    -7.47   0.000     -.082677   -.0482979
       female |
      female  |   -.2166707   .0509783    -4.25   0.000    -.3165863   -.1167552
          age |    .1205886   .0165181     7.30   0.000     .0882136    .1529636
       parent |
     smoking  |    .7219495   .0436831    16.53   0.000     .6363321    .8075669
     religion |
 discourages  |   -.2095319   .0586036    -3.58   0.000    -.3243927    -.094671
        _cons |   -.5335904   .0873953    -6.11   0.000    -.7048821   -.3622987
--------------+-----------------------------------------------------------------
        /cut1 |    .0683114   .0881964                     -.1045504    .2411731
        /cut2 |    .2977055   .0804097                      .1401054    .4553055
        /cut3 |    1.402649    .067253                      1.270836    1.534463
--------------------------------------------------------------------------------
The standard ordered probit parameters, coefficients and cutpoints, are displayed in the first and last parts of the output, respectively.
The middle part of the output reports the probit coefficients for the inflation.
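Coefficients can be difficult to interpret. For instance, what does a parent smoking coefficient of 0.72 mean? The positive sign indicates a higher probability of participation, that is, a lower probability of being a never-smoker. In terms of magnitude, it means that, on average in the data, those whose parents are smokers are about 27 percentage points less likely to be never-smokers than those whose parents did not use tobacco. We obtained this figure by using Stata's margins command: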
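. margins, predict(pnpar) dydx(parent)

Average marginal effects                                 Number of obs = 14,899
Model VCE: OIM

Expression: Pr(nonparticipation), predict(pnpar)
dy/dx wrt:  1.parent

--------------------------------------------------------------------------------
              |            Delta-method
              |       dy/dx  Std. err.        z   P>|z|     [95% conf. interval]
--------------+-----------------------------------------------------------------
       parent |
     smoking  |    -.266089    .015175   -17.53   0.000    -.2958314   -.2363467
--------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.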
The predict(pnpar) option is unique to margins when used after zioprobit or ziologit. We asked margins to calculate predictions of the probability of nonparticipation, which in this example means the probability of being a never-smoker.
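The same probabilities can also be saved as variables with predict. The lines below are a minimal sketch, assuming the zioprobit model above has just been fit. The new variable names (p_never, p_ever, p_low) are hypothetical, and no output is shown because we have not run these commands on the data.

```stata
* Sketch only: assumes the zioprobit model above is the active estimation result.
* p_never, p_ever, and p_low are hypothetical variable names.

* Probability of nonparticipation (being a never-smoker)
predict p_never, pnpar

* Probability of participation (having ever smoked)
predict p_ever, ppar

* Marginal probability of the lowest outcome level (Nonsmoker)
predict p_low, pmargin outcome(#1)

summarize p_never p_ever p_low
```

Because participation and nonparticipation are complementary, p_never and p_ever should sum to 1 for every observation, which makes a quick sanity check.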
You can also fit Bayesian zero-inflated ordered probit models using the bayes prefix.
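As a rough illustration, the model from this example could be refit with the bayes prefix as sketched below. This assumes the default priors and simulation settings are acceptable, which may not hold in practice, and the rseed() value is arbitrary and chosen only for reproducibility.

```stata
* Sketch only: Bayesian version of the model fit above, using default priors.
* The seed value 17 is arbitrary.
bayes, rseed(17): zioprobit tobacco income i.female age, ///
    inflate(income i.female age i.parent i.religion)
```

The /// line continuation works in a do-file. When typing interactively, enter the command on a single line.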
Read more about zero-inflated ordered probit in the Stata Base Reference Manual.