Zero-inflated ordered probit

Highlights

Ordinal outcome
Zero inflation: zero observations generated by two distinct processes
Robust, cluster–robust, and bootstrap standard errors
Support for complex survey designs
Predict marginal, joint, and conditional probabilities of levels
Predict probability of participation and nonparticipation
Support for Bayesian estimation

Stata's zioprobit command fits zero-inflated ordered probit (ZIOP) models.

ZIOP models are used for ordered response variables, such as (1) fully ambulatory, (2) ambulatory with restrictions, and (3) partially ambulatory, when the data exhibit a high fraction of observations at the lowest end of the ordering. The concept of zero inflation has its origin in Poisson models of count data with an overabundance of zeros. zioprobit applies this idea to ordinal data, where the numeric value of the lowest category need not be zero. Given the category values we just used, zioprobit would fit a 1-inflated model. Had we numbered the categories 0, 1, and 2 instead, it would fit a 0-inflated model. The results would be the same either way, because only the ordering of the categories matters.

Standard ordered probit models cannot account for the preponderance of zero observations when the zeros relate to an extra, distinct source. Consider a study of tobacco use in which the outcome of interest, smoking, is an ordered discrete response with four levels coded as 0, 1, 2, and 3, with 0 meaning "Nonsmoker" and 3 meaning "Daily, 20+ cigarettes/day".

Many of the individuals in the first category will be nonsmokers who have never smoked and will never smoke. The rest will be ex-smokers. Think of the standard ordered probit model as fitting the behavior of smokers, including ex-smokers. The zero inflation arises because the first category also contains those who have never smoked, a group generated by a separate process.
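The two processes can be written out for the four-level smoking outcome. What follows is a sketch of the usual ZIOP formulation, assuming the error terms of the two processes are independent. The notation is ours rather than the page's: s = 1 marks membership in the ever-smoked group, z and γ are the covariates and coefficients of that membership equation, x and β are those of the ordered probit, κ1 to κ3 are the cutpoints, and Φ is the standard normal CDF.

```latex
\begin{aligned}
\Pr(s = 1 \mid z) &= \Phi(z\gamma) \\
\Pr(y = 0 \mid x, z) &= \bigl[1 - \Phi(z\gamma)\bigr] + \Phi(z\gamma)\,\Phi(\kappa_1 - x\beta) \\
\Pr(y = h \mid x, z) &= \Phi(z\gamma)\,\bigl[\Phi(\kappa_{h+1} - x\beta) - \Phi(\kappa_h - x\beta)\bigr], \quad h = 1, 2 \\
\Pr(y = 3 \mid x, z) &= \Phi(z\gamma)\,\bigl[1 - \Phi(\kappa_3 - x\beta)\bigr]
\end{aligned}
```

The first term in the expression for y = 0 is the inflation: never-smokers land in the lowest category regardless of x.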

Let's see it work

We have fictional data on the smoking study just described. The outcome variable is called tobacco and contains


            tobacco usage        Freq.     Percent        Cum.

                Nonsmoker       11,642       78.14       78.14
           Weekly or less          532        3.57       81.71
Daily, <20 cigarettes/day        1,933       12.97       94.68
Daily, 20+ cigarettes/day          792        5.32      100.00 

                    Total       14,899      100.00

We believe that the 0 category, Nonsmoker, is inflated: it holds 78% of the observations.

We want to fit a model in which smoking by those who have ever smoked is given by

income
gender
age

And membership in the never-smoked group is determined by

income
gender
age
whether parents smoked
religion

To fit the model, we type

. zioprobit tobacco income i.female age, inflate(income i.female age i.parent i.religion)

Iteration 0:   log likelihood = -11427.864  
Iteration 1:   log likelihood = -10365.839  (not concave)
Iteration 2:   log likelihood =  -10362.27  
Iteration 3:   log likelihood = -10301.882  
Iteration 4:   log likelihood = -10299.872  
Iteration 5:   log likelihood = -10299.787  
Iteration 6:   log likelihood = -10299.787  

Zero-inflated ordered probit regression                 Number of obs = 14,899
                                                        Wald chi2(3)  = 751.43
Log likelihood = -10299.787                             Prob > chi2   = 0.0000



     tobacco   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

tobacco       
      income     .1503256   .0057582    26.11   0.000     .1390398    .1616113
              
      female  
     female     -.2726466    .047975    -5.68   0.000    -.3666759   -.1786173
         age    -.1394573    .011523   -12.10   0.000    -.1620419   -.1168727

inflate       
      income    -.0654874   .0087703    -7.47   0.000     -.082677   -.0482979
              
      female  
     female     -.2166707   .0509783    -4.25   0.000    -.3165863   -.1167552
         age     .1205886   .0165181     7.30   0.000     .0882136    .1529636
              
      parent  
    smoking      .7219495   .0436831    16.53   0.000     .6363321    .8075669
              
    religion  
discourages     -.2095319   .0586036    -3.58   0.000    -.3243927    -.094671
       _cons    -.5335904   .0873953    -6.11   0.000    -.7048821   -.3622987

       /cut1     .0683114   .0881964                     -.1045504    .2411731
       /cut2     .2977055   .0804097                      .1401054    .4553055
       /cut3     1.402649    .067253                      1.270836    1.534463

The standard ordered probit parameters, coefficients and cutpoints, are displayed in the first and last parts of the output, respectively.

The middle part of the output reports the probit coefficients for the inflation equation, the binary model that separates the never-smoked group from everyone else.
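To see how the three parts of the output combine into predicted probabilities, here is a minimal Python sketch that uses only the standard library. It plugs the point estimates from the output into the usual ZIOP arithmetic: a probit gives the probability of participation (having ever smoked), an ordered probit gives the level probabilities among participants, and level 0 additionally absorbs all nonparticipants. Two things are assumptions on our part: that the inflate equation models participation through the standard normal CDF (consistent with the signs reported by margins), and the covariate values for the example person, which are hypothetical because the units of income and age are not documented here.

```python
from math import erf, sqrt

def norm_cdf(v):
    """Standard normal CDF, Phi(v)."""
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

# Point estimates copied from the zioprobit output.
BETA = {"income": 0.1503256, "female": -0.2726466, "age": -0.1394573}
GAMMA = {"income": -0.0654874, "female": -0.2166707, "age": 0.1205886,
         "parent": 0.7219495, "religion": -0.2095319, "_cons": -0.5335904}
CUTS = [0.0683114, 0.2977055, 1.402649]

def ziop_probs(person):
    """Return (Pr(participation), [Pr(tobacco = 0), ..., Pr(tobacco = 3)])."""
    xb = sum(BETA[k] * person[k] for k in BETA)
    zg = GAMMA["_cons"] + sum(GAMMA[k] * person[k] for k in GAMMA if k != "_cons")
    p_par = norm_cdf(zg)  # assumed: inflate equation models participation
    # Ordered probit probabilities among participants.
    cdf = [0.0] + [norm_cdf(c - xb) for c in CUTS] + [1.0]
    cond = [cdf[h + 1] - cdf[h] for h in range(len(CUTS) + 1)]
    # Scale by participation; level 0 also absorbs every nonparticipant.
    marginal = [p_par * c for c in cond]
    marginal[0] += 1.0 - p_par
    return p_par, marginal

# Hypothetical person: covariate values chosen only for illustration.
person = {"income": 5.0, "female": 1, "age": 3.0, "parent": 0, "religion": 0}
p_par, probs = ziop_probs(person)
print(f"Pr(participation) = {p_par:.3f}")
for level, p in enumerate(probs):
    print(f"Pr(tobacco = {level}) = {p:.3f}")
```

The four level probabilities sum to 1, and Pr(tobacco = 0) is never smaller than the probability of nonparticipation.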

Coefficients can be difficult to interpret. For instance, what does a parent smoking coefficient of 0.72 mean? The coefficient is on the probit scale, so its size says little by itself. On the probability scale, it means that, on average in the data, the probability of being a never-smoker is about 27 percentage points lower for those whose parents are smokers than for those whose parents did not use tobacco. We obtained the 27 points by using Stata's margins command:

. margins, predict(pnpar) dydx(parent)

Average marginal effects                                Number of obs = 14,899
Model VCE: OIM

Expression: Pr(nonparticipation), predict(pnpar)
dy/dx wrt:  1.parent



                          Delta-method
                    dy/dx   Std. err.      z    P>|z|     [95% conf. interval]

      parent  
    smoking      -.266089    .015175   -17.53   0.000    -.2958314   -.2363467

Note: dy/dx for factor levels is the discrete change from the base level.

The predict(pnpar) option is unique to margins when used after zioprobit or ziologit. We asked margins to calculate predictions of the probability of nonparticipation, which in this example means the probability of being a never-smoker.
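The note in the output says that dy/dx for a factor level is a discrete change from the base level. A minimal Python sketch of that calculation follows, under stated assumptions: the probability of nonparticipation is taken to be one minus the standard normal CDF of the inflate-equation index, and the three observations are hypothetical stand-ins for the real dataset, so the average printed here will not reproduce the -0.266 that margins reports. The helper names are ours.

```python
from math import erf, sqrt

def norm_cdf(v):
    """Standard normal CDF, Phi(v)."""
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

# Inflation-equation estimates copied from the zioprobit output.
GAMMA = {"income": -0.0654874, "female": -0.2166707, "age": 0.1205886,
         "parent": 0.7219495, "religion": -0.2095319, "_cons": -0.5335904}

def pr_nonparticipation(row):
    """Pr(never-smoker) for one observation (assumed: 1 - Phi(z*gamma))."""
    zg = GAMMA["_cons"] + sum(GAMMA[k] * row[k] for k in row)
    return 1.0 - norm_cdf(zg)

def parent_effect(row):
    """Discrete change in Pr(nonparticipation) when parent goes from 0 to 1,
    holding the observation's other covariates at their own values."""
    return (pr_nonparticipation(dict(row, parent=1))
            - pr_nonparticipation(dict(row, parent=0)))

# Hypothetical observations; the real average runs over all 14,899 rows.
sample = [
    {"income": 2.0, "female": 0, "age": 2.5, "parent": 0, "religion": 1},
    {"income": 5.0, "female": 1, "age": 4.0, "parent": 1, "religion": 0},
    {"income": 8.0, "female": 1, "age": 6.0, "parent": 0, "religion": 0},
]
effects = [parent_effect(r) for r in sample]
ame = sum(effects) / len(effects)
print(f"average discrete change over the toy sample: {ame:.3f}")
```

Each observation's effect is negative because the parent coefficient is positive in the participation equation. Averaging the per-observation changes, rather than evaluating at mean covariates, is what makes the result an average marginal effect.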

Tell me more

You can also fit Bayesian zero-inflated ordered probit models using the bayes prefix.

Read more about zero-inflated ordered probit in the Stata Base Reference Manual.

