Left-censoring, right-censoring, or both
Censoring that varies by observation
Random effects
Random intercepts
Random coefficients (slopes)
Multilevel: two, three, or more levels
Make inferences about either the uncensored or the censored outcome
Support for complex survey data
Graph marginal means and marginal effects
Intraclass correlation
The metobit command fits multilevel and panel-data models for which the outcome is censored. Censored means that rather than the outcome \(y\) being observed precisely in all observations, it is known only that \(y \leq y_l\) (left-censoring) or \(y \geq y_u\) (right-censoring) in some of the observations. For instance, the amount of a pollutant may be left-censored because the measurement instrument has a lower limit of detection. The number of attendees at an event may be right-censored because the stadium has a limited number of seats.
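The observation rule just described can be sketched in symbols. Here \(y\) is the underlying outcome and \(y^{o}\) is the value actually recorded; the name \(y^{o}\) is introduced only for this illustration and is not notation from the manual.

```latex
\[
y^{o} =
\begin{cases}
y_l & \text{if } y \le y_l \quad \text{(left-censored)} \\
y   & \text{if } y_l < y < y_u \quad \text{(observed exactly)} \\
y_u & \text{if } y \ge y_u \quad \text{(right-censored)}
\end{cases}
\]
```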
Multilevel means that the fitted model accounts for lack of independence within groups of observations, such as people who live near each other or students who attend the same school or students who are tested repeatedly. metobit can also fit models with multiple levels of nesting. You can fit models with data on students within school districts within cities and even have random effects for each level!
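As a rough sketch of what a model with more than two levels might look like, the command below nests districts within cities. The variable names score, hours, city, and district and the censoring limit of 100 are hypothetical and are not part of the stadium example that follows.

```stata
* Hypothetical three-level sketch: students (observations) within districts within cities.
* score is assumed to be a test score that is right-censored at 100.
* Levels are listed from the highest (city) to the lowest (district),
* and each "|| level:" with nothing after the colon requests a random intercept.
metobit score hours || city: || district:, ul(100)
```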
Tobit models, whether multilevel or one-level, support two types of inference: inference about the entire population as if it were not censored, and inference about the censored population.
We have been hired to analyze data on attendance at 500 soccer stadiums. The data are censored when the stadium is sold out. In such cases, it is likely that attendance would have been greater had there been more seats.
Clients who run stadiums, who could increase the number of seats at a cost, would be interested in analysis of the uncensored population. Clients who rent and cannot increase the number of seats are interested in analysis of the censored population.
We can use metobit to answer questions for both types of clients. In fact, we will fit the model once and use different predictions to answer different questions.
The data we have include attend, stadium attendance in thousands for each game played during the season. We will model attendance as a function of
winp, the winning percentage of the local team;
inter, the probability that the local team makes it to an international postseason;
cost, the cost of a ticket for the game being played; and
weather, whether a storm—rain or snow—was forecast on game day.
We could fit this model with a linear multilevel estimator but for the fact that each stadium has a seating limit. That limit is recorded in the variable max.
Using metobit, we will type
. metobit attend winp inter cost i.weather || stadium: winp, ul(max)
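Option ul(max) specifies the upper-censoring point. Because max is a variable rather than a number, the censoring point can differ from stadium to stadium.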
To the right of the || is the level-2 ID variable, stadium. We are specifying that we want random intercepts for each stadium and random coefficients for winning percentage.
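Written out, the model requested by this command might be sketched as follows, for game \(i\) at stadium \(j\). The symbols \(\beta\), \(u\), \(\epsilon\), and the starred latent attendance are notation chosen for this illustration, and the sketch assumes the random intercept and random coefficient are independent of each other.

```latex
\[
\begin{aligned}
\text{attend}^{*}_{ij} &= \beta_0 + \beta_1\,\text{winp}_{ij} + \beta_2\,\text{inter}_{ij}
  + \beta_3\,\text{cost}_{ij} + \beta_4\,\text{weather}_{ij}
  + u_{0j} + u_{1j}\,\text{winp}_{ij} + \epsilon_{ij} \\
\text{attend}_{ij} &= \min\!\left(\text{attend}^{*}_{ij},\ \text{max}_{ij}\right) \\
u_{0j} &\sim N(0, \sigma^2_{u0}), \qquad
u_{1j} \sim N(0, \sigma^2_{u1}), \qquad
\epsilon_{ij} \sim N(0, \sigma^2_{e})
\end{aligned}
\]
```

Here \(u_{0j}\) is the stadium-specific random intercept and \(u_{1j}\) is the stadium-specific random coefficient on winp.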
We fit our desired model:
. metobit attend winp inter cost i.weather || stadium: winp, ul(max)

Fitting fixed-effects model:

Iteration 0:  Log likelihood = -21793.676
Iteration 1:  Log likelihood = -21321.165
Iteration 2:  Log likelihood = -21239.918
Iteration 3:  Log likelihood =  -21239.16
Iteration 4:  Log likelihood = -21239.159

Refining starting values:

Grid node 0:  Log likelihood = -19826.409

Fitting full model:

Iteration 0:  Log likelihood = -19826.409  (not concave)
Iteration 1:  Log likelihood = -18956.642  (not concave)
Iteration 2:  Log likelihood = -18440.049
Iteration 3:  Log likelihood = -17938.155
Iteration 4:  Log likelihood = -17860.781
Iteration 5:  Log likelihood =  -17822.36
Iteration 6:  Log likelihood = -17820.965
Iteration 7:  Log likelihood = -17820.958
Iteration 8:  Log likelihood = -17820.958

Mixed-effects tobit regression                  Number of obs     =      8,131
                                                       Uncensored =      5,451
Limits: Lower = -inf                                Left-censored =          0
        Upper =  max                               Right-censored =      2,680
Group variable: stadium                         Number of groups  =        500

                                                Obs per group:
                                                              min =          9
                                                              avg =       16.3
                                                              max =         20

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(4)      =   11728.37
Log likelihood = -17820.958                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
      attend | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        winp |   .4287463   .0708727     6.05   0.000     .2898384    .5676542
       inter |   .5575962   .0051627   108.00   0.000     .5474774    .5677149
        cost |  -.0053072   .0005233   -10.14   0.000    -.0063329   -.0042815
   1.weather |  -.2126963   .2977593    -0.71   0.475    -.7962937    .3709011
       _cons |   9.213013    .348091    26.47   0.000     8.530767    9.895259
-------------+----------------------------------------------------------------
stadium      |
   var(winp) |   1.335236   .1471157                      1.075903    1.657078
  var(_cons) |   35.74543   2.838831                      30.59284    41.76584
-------------+----------------------------------------------------------------
var(e.attend)|   22.48149   .4547381                      21.60765    23.39066
------------------------------------------------------------------------------
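The variance components in this output can be used to gauge how strongly games at the same stadium are correlated. The sketch below computes an intraclass correlation by hand from the reported estimates, conditional on winp being 0 so that only the random intercept contributes. It also shows the estat icc postestimation command, on the assumption that the metobit results are still the active estimation results.

```stata
* Hand computation from the estimates reported above:
* var(_cons) / (var(_cons) + var(e.attend)), conditional on winp = 0.
* The result is roughly 0.61.
display 35.74543 / (35.74543 + 22.48149)

* Postestimation command for intraclass correlations; with a random
* coefficient in the model, read its result as conditional on winp = 0.
estat icc
```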
From the model, we can obtain estimates of average attendance, and there are several ways to define that average. What would the uncensored average attendance be if max had not been in effect? What is the predicted average attendance given max? What would average attendance be if seating were increased by 1,000 in all stadiums whose average attendance exceeds 90% of capacity? By 2,000? By 3,000?
In a world where max was not relevant, average attendance would have been about 23,510:
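. margins

Predictive margins                                       Number of obs = 8,131
Model VCE: OIM

Expression: Marginal linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   23.50981   .3189898    73.70   0.000      22.8846    24.13502
------------------------------------------------------------------------------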
In the real world where the current value of max is binding, it would be about 18,712:
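. margins, predict(ystar(.,max))

Predictive margins                                       Number of obs = 8,131
Model VCE: OIM

Expression: E(attend*|attend<max), predict(ystar(.,max))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |    18.7123   .1790272   104.52   0.000     18.36141    19.06319
------------------------------------------------------------------------------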
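To see why the censored average falls below the uncensored one, it may help to write out the expected value of a right-censored normal outcome for a single observation. This is a sketch under simplifying assumptions: \(\mu\) and \(\sigma\) stand for the mean and standard deviation of the uncensored outcome, \(u\) is the upper limit, and these symbols are introduced here for illustration rather than taken from the output.

```latex
\[
E\{\min(y, u)\}
  = \mu\,\Phi(z) \;-\; \sigma\,\phi(z) \;+\; u\,\{1 - \Phi(z)\},
\qquad z = \frac{u - \mu}{\sigma},
\]
```

where \(\Phi\) and \(\phi\) are the standard normal distribution and density functions. The expression can never exceed \(\mu\), which is consistent with the censored average of about 18.7 being below the uncensored average of about 23.5.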
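We could also use margins to estimate what attendance would be if max were increased by 1,000 seats in stadiums with over 90% attendance. Because attend is recorded in thousands and max is its censoring limit, 1,000 seats corresponds to an increase of 1 in max.

. quietly generate new_max = max + 1

. margins if attend_rate>.90, predict(ystar(.,new_max))

Predictive margins                                       Number of obs = 5,284
Model VCE: OIM

Expression: E(attend*|attend<new_max), predict(ystar(.,new_max))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   26.81172   .3243089    82.67   0.000     26.17608    27.44735
------------------------------------------------------------------------------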
Average attendance would be 26,812 in stadiums with attendance rate greater than 90%. This seems like a large number, but the stadiums in our sample with more than 90% attendance are the larger stadiums with teams with the highest winning percentage.
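The same approach extends to the 2,000- and 3,000-seat scenarios raised earlier. The loop below is a minimal sketch, assuming that the metobit results are still the active estimation results, that attend_rate exists as in the example, and that max is recorded in thousands like attend. The variable names new_max1 through new_max3 are introduced here for illustration.

```stata
* Sketch: expected attendance under capacity increases of 1,000, 2,000, and 3,000 seats.
* Assumes max is in thousands, so adding k means adding k thousand seats.
forvalues k = 1/3 {
    quietly generate new_max`k' = max + `k'
    margins if attend_rate > .90, predict(ystar(.,new_max`k'))
}
```

Each pass reports the predicted average attendance for the high-attendance stadiums under the corresponding hypothetical seating limit.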
You can also fit Bayesian multilevel tobit models using the bayes prefix.
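As a sketch of what that might look like for the stadium model, the command below adds the bayes prefix to the same specification. It assumes default priors are acceptable and that the ul(max) option carries over unchanged; the seed value is arbitrary and is included only so that results can be reproduced.

```stata
* Sketch: Bayesian version of the same multilevel tobit model with default priors.
* rseed() sets the random-number seed; 12345 is an arbitrary choice.
bayes, rseed(12345): metobit attend winp inter cost i.weather || stadium: winp, ul(max)
```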
Learn more about Stata's multilevel mixed-effects models features.
Read more about multilevel tobit models in the Multilevel Mixed-Effects Reference Manual; see [ME] metobit.