<(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model.>
But the normality assumption in linear regression refers to the residuals
rather than to the dependent variable itself. If your dependent variable is
per-patient health care costs, for instance, there is only a negligible
chance that it follows a normal distribution.
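To illustrate the point about residuals, here is a minimal sketch in Python
(standard library only) on simulated data, not on your data. The helper names
`skewness` and `ols_fit`, the exponential predictor and all coefficients are
assumptions made for the example. The outcome comes out clearly skewed because
the predictor is skewed, yet the residuals from the linear fit are close to
symmetric, and it is the residuals that the assumption concerns.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


def ols_fit(x, y):
    """Intercept and slope of a simple least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(2009)  # fixed seed so the sketch is repeatable
x = [rng.expovariate(1.0) for _ in range(5000)]          # skewed predictor (assumed)
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]   # normal errors by construction

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of outcome:  ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

In practice you would run the normality check on the residuals saved after
fitting the model, not on the raw outcome.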
<(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?>
Observed zeros can cause problems when their frequency is higher than that
expected under the probability distribution you selected.
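A rough way to check this is to compare the observed share of zeros with the
share implied by the fitted distribution. The sketch below (Python, standard
library only) does this for a single mean, assuming the usual NB2
parameterization with variance mu + alpha*mu^2. The function names and the
values mu = 3, alpha = 1.5 and 30% observed zeros are purely illustrative, not
estimates from your data. With covariates in the model, one would average the
zero probabilities over the fitted means.

```python
import math


def poisson_zero_prob(mu):
    """P(Y = 0) for a Poisson variable with mean mu."""
    return math.exp(-mu)


def negbin_zero_prob(mu, alpha):
    """P(Y = 0) for a negative binomial (NB2) variable with mean mu and
    dispersion alpha, i.e. variance mu + alpha * mu**2."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


observed_zero_share = 0.30   # illustrative, echoing the "about 30%" in the question
mu, alpha = 3.0, 1.5         # hypothetical mean and dispersion

print("observed share of zeros:", observed_zero_share)
print("expected under Poisson: ", round(poisson_zero_prob(mu), 3))
print("expected under NB2:     ", round(negbin_zero_prob(mu, alpha), 3))
```

If the observed share is well above what the model implies, a zero-inflated
model is worth considering. If the negative binomial already reproduces it,
the zeros alone are probably not a concern.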
<(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?>
For more on this topic, please see:
J. Scott Long, Jeremy Freese. Regression Models for Categorical Dependent
Variables Using Stata. Second edition. College Station: Stata Press, 2006.
I do not know whether a more recent edition is currently available (please
see www.stata.com, bookstore section).
HTH and Kind Regards,
Carlo
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Ashwin
Ananthakrishnan
Sent: Saturday, 3 October 2009 3:31
To: [email protected]
Subject: st: Which regression model to use for zero-inflated, non-normal
outcome?
Hi,
I'm trying to run a regression model to identify independent predictors of a
specific continuous outcome (dependent variable).
(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model.
(2) There are a number of patients where the outcome value is zero
(approximately 30% of the cohort). So I can't directly use a log-linear
model, because patients in whom the outcome is zero have a non-calculable
log(outcome) and are automatically dropped from the analysis. One option
would be to assign a nominal value to those with zero, i.e. add 0.5 to all
patients so that the outcome is never zero.
(3) Even if the outcome is treated as a count variable (incidence), the
variance is much greater than the mean, and the Poisson goodness-of-fit test
has a p-value of 0.000.
(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?
(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?
Thanks,
Ashwin
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/