Re: st: Hurdle models vs. zero inflated models
From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: st: Hurdle models vs. zero inflated models
Date: Mon, 18 Nov 2013 21:15:50 -0500
Dear Pooja,

The nature of your data suggests that two processes are involved:
whether a patient was hospitalized and, given that the patient was
hospitalized, the number of days in hospital (I assume that the length
of stay is a positive integer number of days). Thus, a hurdle model
seems a reasonable starting point, especially since 90% of patients
are not hospitalized.

Among the patients who were hospitalized (taking covariates into
account), do the data suggest that the distribution of length of stay
is a mixture? If so, you should try to identify the components and
model them. In what follows, I assume a single component.
The logistic model will give you a predicted probability of being
hospitalized (for each patient). The model for length of stay (among
hospitalized patients) will give you each patient's expected length of
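stay (if hospitalized). Then the unconditional predicted length of
stay for the patient is

  [1 - Prob(hospitalized)] x 0
    + Prob(hospitalized) x (predicted length of stay, given hospitalized),

which reduces to Prob(hospitalized) x (predicted length of stay, given
hospitalized), because patients who are not hospitalized contribute 0 days.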
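
As a rough illustration of this formula, the following Python sketch combines the two parts for a single patient. It assumes the count part is a zero-truncated negative binomial in the NB2 parameterization, with mean mu and dispersion alpha referring to the untruncated distribution. The function names and the numeric inputs are hypothetical and do not come from any fitted model.

```python
import math


def truncated_nb_mean(mu, alpha):
    """Conditional mean E[Y | Y > 0] for an NB2 count.

    Assumes the untruncated distribution has mean mu and variance
    mu + alpha * mu**2, so that P(Y = 0) = (1 + alpha * mu) ** (-1 / alpha).
    """
    if mu <= 0 or alpha <= 0:
        raise ValueError("mu and alpha must be positive")
    p_zero = (1.0 + alpha * mu) ** (-1.0 / alpha)
    return mu / (1.0 - p_zero)


def unconditional_los(p_hosp, mu, alpha):
    """Unconditional expected length of stay under the hurdle model.

    p_hosp is the predicted probability of hospitalization from the
    logistic part; mu and alpha describe the count part.
    """
    if not 0.0 <= p_hosp <= 1.0:
        raise ValueError("p_hosp must be a probability")
    cond_mean = truncated_nb_mean(mu, alpha)
    # Non-hospitalized patients contribute exactly 0 days.
    return (1.0 - p_hosp) * 0.0 + p_hosp * cond_mean


# Hypothetical patient: 10% chance of hospitalization, mu = 4, alpha = 0.5.
example = unconditional_los(0.10, 4.0, 0.5)
print(round(example, 4))
```

With these made-up inputs the conditional mean is 4.5 days, so the unconditional prediction is about 0.45 days.
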
If appropriate, you can average this predicted length of stay over the
patients in your sample. Or you could average over a hypothetical
population of interest. Or you could calculate predicted values for
individual patients with "interesting" combinations of
characteristics.
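
The averaging step can be sketched in the same spirit. The Python below assumes you already have, for each patient, the predicted probability of hospitalization and the predicted length of stay given hospitalization. The two lists hold invented numbers, and the function names are hypothetical.

```python
def unconditional_predictions(p_hosp, cond_los):
    """Per-patient unconditional predicted length of stay.

    p_hosp: predicted probabilities of hospitalization (logistic part).
    cond_los: predicted lengths of stay given hospitalization (count part).
    """
    if len(p_hosp) != len(cond_los):
        raise ValueError("inputs must have the same length")
    return [p * m for p, m in zip(p_hosp, cond_los)]


def sample_average(values):
    """Plain average of the per-patient predictions."""
    if not values:
        raise ValueError("no predictions to average")
    return sum(values) / len(values)


# Invented predictions for three patients (illustration only).
p_hosp = [0.05, 0.10, 0.30]
cond_los = [3.5, 4.5, 6.0]

per_patient = unconditional_predictions(p_hosp, cond_los)
average_los = sample_average(per_patient)
print([round(v, 3) for v in per_patient], round(average_los, 3))
```

Averaging over a hypothetical population, or predicting for a patient with an "interesting" combination of characteristics, uses the same calculation; only the inputs change (predictions at the chosen covariate values rather than at the sample values).
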
I hope this helps.

Regards,
David Hoaglin

On Mon, Nov 18, 2013 at 4:20 AM, Pooja Desai <[email protected]> wrote:
> Hello All,
>
> I have a variable which represents the length of stay in the hospital
> for patients. 90% of the patients were not hospitalized so they have a
> 0 length of stay. I was considering using either a zero-inflated
> negative binomial regression or a hurdle model (logit and
> zero-truncated negative binomial) for this variable. Since all the
> zeros on the length of stay come from one source (the patients not
> having a hospitalization), I thought a hurdle model would be a better
> option (there are no sampling zeros). Is that an appropriate choice?
>
> If so, I need to find the predicted length of stay from the model. Is
> there a way to get the predicted value after accounting for both the
> logistic and zero-truncated negative binomial models?
>
> Thanks in advance.
>
> Pooja Desai