Mathew Stalker replied to Christer Thrane:
My initial model is:
Y = a + b1x1 + controls + e
where Y is expenditures on a commodity and x1 is income.
Since there are a lot of zeroes, I use the Tobit approach. However, since
the log-linear model performed better than the linear one, I use the former.
(Before the log transformation of Y, I follow convention and set zeroes
to 1.)
Accordingly, the estimated Tobit model is:
logY = a + b1x1 + controls + e
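A minimal sketch of the latent-variable form this model implies. It assumes normal errors, writes $\mathbf{z}_i$ for the vector of controls (hypothetical notation), and assumes every positive expenditure is at least 1, so that the recoded zeros sit exactly at the censoring point $\log 1 = 0$:

```latex
\begin{aligned}
\log Y_i^{*} &= a + b_1 x_{1i} + \mathbf{z}_i'\boldsymbol{\gamma} + e_i,
  \qquad e_i \sim N(0, \sigma^2), \\
\log Y_i &= \max\left(0,\ \log Y_i^{*}\right).
\end{aligned}
```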
The problem:
I want to predict the value of Y (not logY) for certain values of income
(and put it in a graph); that is, both the conditional Y (i.e., Y given
that the threshold value of 0 was passed) and the unconditional (latent)
value of Y.
Does anyone know how to do this?
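For reference, the standard Tobit expectations on the log scale, assuming normal errors. Here $\mu_i = a + b_1 x_{1i} + \mathbf{z}_i'\boldsymbol{\gamma}$ is the linear index ($\mathbf{z}_i$ is hypothetical notation for the controls), $\sigma$ is the error standard deviation, and $\phi$, $\Phi$ are the standard normal density and distribution function. Note that these are expectations of logY, not of Y:

```latex
\begin{aligned}
\text{latent:}\quad & E[\log Y_i^{*} \mid x] = \mu_i, \\
\text{conditional on } \log Y_i > 0:\quad
  & E[\log Y_i \mid \log Y_i > 0, x]
    = \mu_i + \sigma\,\frac{\phi(\mu_i/\sigma)}{\Phi(\mu_i/\sigma)}, \\
\text{unconditional (censored):}\quad
  & E[\log Y_i \mid x]
    = \Phi(\mu_i/\sigma)\,\mu_i + \sigma\,\phi(\mu_i/\sigma).
\end{aligned}
```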
The prediction of Y from your model would simply be the exponential of the
predicted logY.
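Under a normal-error Tobit model for logY, exponentiating the linear index gives the median of the latent expenditure, not its mean (Jensen's inequality). A sketch of the original-scale expectations, with $\mu_i$ the linear index of the logY model, $\sigma$ the error standard deviation, $\Phi$ the standard normal distribution function and $Y_i^{*} = \exp(\log Y_i^{*})$, assuming that observations censored at logY = 0 truly have Y = 0:

```latex
\begin{aligned}
E[Y_i^{*} \mid x] &= \exp\!\left(\mu_i + \tfrac{\sigma^2}{2}\right)
  \;>\; \exp(\mu_i), \\
E[Y_i \mid \log Y_i^{*} > 0, x] &= \exp\!\left(\mu_i + \tfrac{\sigma^2}{2}\right)
  \frac{\Phi(\mu_i/\sigma + \sigma)}{\Phi(\mu_i/\sigma)}, \\
E[Y_i \mid x] &= \exp\!\left(\mu_i + \tfrac{\sigma^2}{2}\right)
  \Phi(\mu_i/\sigma + \sigma).
\end{aligned}
```

All three depend on $\sigma$, which simple exponentiation of the predicted logY ignores.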
However, you should note that the log of zero is undefined (it tends to minus
infinity), so in your log model no observations where Y is zero will be
included. Is this really what you want?
Replacing 0s by 1s is clearly not very satisfactory. Using generalised
linear models with a log link makes _that_ unnecessary.
glm Y <predictors>, link(log)
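The key difference from Tobit on logY is which expectation gets logged. A GLM with a log link models the log of the mean of Y, so zeros in Y need no recoding and the exponentiated linear index is already a prediction of E[Y|x]. A sketch, with $\mathbf{z}_i$ as hypothetical notation for the controls; note that Stata's glm uses family(gaussian) unless another family is specified:

```latex
\begin{aligned}
\text{Tobit on } \log Y:\quad & E[\log Y_i^{*} \mid x] = a + b_1 x_{1i} + \mathbf{z}_i'\boldsymbol{\gamma}, \\
\text{GLM, log link:}\quad & \log E[Y_i \mid x] = a + b_1 x_{1i} + \mathbf{z}_i'\boldsymbol{\gamma}
  \;\Longrightarrow\; E[Y_i \mid x] = \exp\!\left(a + b_1 x_{1i} + \mathbf{z}_i'\boldsymbol{\gamma}\right).
\end{aligned}
```

Since $\log E[Y] \ge E[\log Y]$ by Jensen's inequality, the two linear indexes are not interchangeable. After glm, predict returns the fitted mean (the mu statistic) by default.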
This approach has two extra advantages. First, it automatically
yields predictions on the scale of the response, here Y.
Second, the back-transformation approach mentioned by
Mathew raises bias issues, well documented in some
literatures (for some reason, there is a mass of work on
this within health economics), which don't arise in the GLM
case.
Nick