Hi,
I'm trying to run a regression model to identify independent predictors of a specific continuous outcome (the dependent variable).
(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a linear regression model.
(2) The outcome value is zero for a number of patients (approximately 30% of the cohort). So I can't directly use a log-linear model, because patients in whom the outcome is zero have a non-calculable log(outcome) and are automatically dropped from the analysis. One option would be to assign a nominal value to those with zero, e.g. add 0.5 to every patient's outcome so that no value is zero.
(3) Even treating the outcome as a count variable (incidence), the variance is much greater than the mean, and the Poisson goodness-of-fit test has a p-value of 0.000.
(4) A negative binomial model has a better fit, but does the high number of zeros raise any concern?
(5) I also tried zero-inflated negative binomial regression, but all the examples I've seen are ones where one of the independent variables has a high number of zeros. Is it appropriate to use the zinb command when the dependent variable has a high number of zeros?
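
To make points (2) to (5) concrete, here is a minimal sketch of the descriptive checks involved. It is written in Python rather than Stata, uses made-up counts (not the real cohort) with 30% zeros, and ignores covariates entirely, so it only illustrates the idea: compare the variance with the mean, compare the observed share of zeros with what a Poisson with the same mean would predict, and count how many observations a log transform would drop. The variable names are hypothetical.

```python
import math
from statistics import mean, variance

# Made-up illustrative counts (NOT the real data): 6 of 20 values are zero.
outcome = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 14, 19, 27]

n = len(outcome)
m = mean(outcome)
v = variance(outcome)            # sample variance (n - 1 denominator)
dispersion_ratio = v / m         # close to 1 if a Poisson model were adequate

# Share of zeros actually observed vs. share a Poisson with mean m predicts.
observed_zero_share = outcome.count(0) / n
poisson_zero_share = math.exp(-m)    # P(Y = 0) for Poisson(m)

# log(0) is undefined, so a log-transformed outcome loses every zero.
loggable = [y for y in outcome if y > 0]
dropped_by_log = n - len(loggable)

# Adding a constant (0.5 here, as in point 2) keeps every observation.
shifted_logs = [math.log(y + 0.5) for y in outcome]

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {dispersion_ratio:.2f}")
print(f"zeros: observed = {observed_zero_share:.2f}, Poisson = {poisson_zero_share:.4f}")
print(f"dropped by log(y): {dropped_by_log}, kept by log(y + 0.5): {len(shifted_logs)}")
```

This is only a crude marginal check. A negative binomial distribution can itself put a lot of probability on zero, so an excess of zeros relative to Poisson does not by itself show that a zero-inflated model is needed.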
Thanks,
Ashwin