Hello Statalist,
I am having some issues analyzing data with glm. I have tried several
methods to analyze my zero-inflated data set (zinb, hurdle and glm).
The best model fit I get is when I log-transform the response
variable prior to analysis with a glm model using a negative binomial
distribution. The negative binomial uses a log link function, so I
think that this analysis is essentially double log-transforming the
data, once initially, and then when the response is linked to the
predictors it is log-transformed again. I have not been able to find
any literature regarding this, so I was wondering if anyone knows if
this is an appropriate way to analyze these data? Does it violate
assumptions of the glm? Thanks for your time.
Chuck
======================
Chuck:
Think about it this way. Poisson and negative binomial (NB) are count
response models. One can use them with decimals, e.g. 15.4, etc, but essentially
the assumption upon which the models are based is that they model counts, or
integers. By log transforming the counts you have seriously compromised the
assumptions. Look at the range of your response.
You are also correct in thinking that you have logged a response that has
already been logged internally from within the algorithm. It's a bit more
complicated than that, but you should not do it.
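
To make the two points above concrete, here is a minimal sketch in plain
Python (not Stata). The counts are made-up illustrative values, not data
from this thread, and the variable names are hypothetical. It shows that a
logged count response is no longer a set of integers and has a much
narrower range, and that the log link works on the mean of the response
rather than on each observation, so the two "logs" are not the same
operation.

```python
import math

# Hypothetical count response with many zeros (illustrative values only).
counts = [0, 0, 0, 1, 2, 4, 9, 15]

# Logging the response: log(0) is undefined, so a shift such as log(y + 1)
# is needed, and the transformed values are no longer counts.
logged = [math.log(y + 1) for y in counts]
all_integers = all(v.is_integer() for v in logged)
print("logged response:", [round(v, 3) for v in logged])
print("still all integers?", all_integers)
print("range before:", (min(counts), max(counts)))
print("range after: ", (round(min(logged), 3), round(max(logged), 3)))

# The log link is applied to the mean, log(E[y]), not to each y.
# Averaging log(y) gives a smaller number than taking the log of the
# average (Jensen's inequality), so the two are different quantities.
positive = [y for y in counts if y > 0]
log_of_positive_mean = math.log(sum(positive) / len(positive))
mean_of_log = sum(math.log(y) for y in positive) / len(positive)
print("log of mean:", round(log_of_positive_mean, 3))
print("mean of log:", round(mean_of_log, 3))
```

The comparison at the end is restricted to the positive counts only because
log(0) does not exist; that restriction is an assumption of this sketch.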
It appears from your comments that there are excessive zeros in the
response. Either a hurdle or ZINB is probably the best approach -- if you are still
intending to model counts. It just may be that neither of these models fits the
data well. Do you know the reason why there are excessive 0's? Try a
2-parameter log-gamma or 2-parameter log-inverse Gaussian model. Compare the AIC
statistics. You can also try severing the data by excluding 0's and modeling with
a 0-truncated program -- but only if you know that the 0's have been
generated by an entirely different process than the positive count data. This is
not an ideal solution, but a possible one in certain circumstances.
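
As a rough illustration of what a 0-truncated model does, here is a short
sketch in plain Python (not Stata). It uses the Poisson rather than the
negative binomial purely to keep the arithmetic simple; that substitution,
the function names, and the value of mu are assumptions of the sketch. The
idea is that the ordinary probabilities are rescaled by 1 - P(Y = 0), so
that the positive counts alone carry total probability 1.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability P(Y = y) with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zero_truncated_poisson_pmf(y, mu):
    """P(Y = y | Y > 0): the Poisson pmf rescaled to exclude zero."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - math.exp(-mu))

mu = 2.0  # hypothetical mean of the untruncated distribution

# The truncated probabilities over y = 1, 2, ... should sum to 1.
total = sum(zero_truncated_poisson_pmf(y, mu) for y in range(1, 60))
print("sum of truncated probabilities:", total)
print("P(Y = 1), untruncated:", poisson_pmf(1, mu))
print("P(Y = 1), truncated:  ", zero_truncated_poisson_pmf(1, mu))
```

Every positive count gets a larger probability than under the untruncated
distribution, which is why fitting an ordinary count model to data with the
0's simply dropped would be misspecified.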
Joe Hilbe