Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steven Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: How to model a positive continuous dependent variable with many zeros? |
Date | Thu, 2 Jun 2011 12:47:39 -0400 |
Ah, you are asking about the combination. The expected duration Y for a person with covariates X is:

E(Y|X) = P(Y>0|X)*E(Y|Y>0,X)

where P is from the logistic (or other) binary model and the expected value is from the survival model. However, you have multiple episodes per person, so a single two-part model will not suffice. As you are really interested in the proportion of total time spent in seclusion, consider analyzing just that proportion directly. See Kit Baum's Stata Journal tip (Stata tip 63: Modeling proportions) at
http://www.scribd.com/doc/55505304/61/Stata-tip-63-Modeling-proportions

Steve
sjsamuels@gmail.com

On Wed, Jun 1, 2011 at 2:38 AM, Steve Samuels wrote:

These are known as "two-part" or "hurdle" models, and a Google search will find hundreds of references.

On Wed, Jun 1, 2011 at 2:38 AM, Adriaan Hoogendoorn <aw.hoogendoorn@gmail.com> wrote:

Thank you, Hithesh (and Maarten in a previous post), for your help. It is highly appreciated.

The situation Maarten described appears to be exactly the case: clinic staff members try to reduce total seclusion durations (at the clinic level) by ending seclusions as soon as possible, at the risk of introducing more seclusion episodes. Total seclusion duration (rated against the total time spent in the clinic) seems the appropriate quantity for evaluating seclusion policies.

We find that total seclusion durations differ substantially across clinics. Clinics with higher total seclusion durations explain the difference by claiming to have “harder” patients, as Maarten suggested. Explaining these differences from patient characteristics (and some clinic characteristics) is exactly what this study is about.

Your suggestion of combining the modeled zeros (from a logistic regression, or from the Poisson as Maarten suggested) with a model for non-zero duration (from GLM or survival analysis) seems very attractive. However, I have no experience with how to do this.
Do you mean: after modeling the zeros, model the non-zeros by deleting the zeros from the data set and using the same predictors? This would provide me with two sets of parameters. Do you think I can use these two sets of model parameters to estimate the total seclusion duration for a given ward with a given set of patients?

I’ve never seen such a combined model in the scientific literature, which may well be my mistake. Do you have any references on how such a combination was applied and discussed?
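
For readers following the thread, the combination E(Y|X) = P(Y>0|X)*E(Y|Y>0,X) from the reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the thread: it assumes the zero part is a logistic model, the positive part is a model with a log link (so the conditional mean is the exponential of the linear predictor), and Y is one total duration per person. The function names and all coefficient values are hypothetical. A survival model for the positive part would supply E(Y|Y>0,X) differently, and the caveat about multiple episodes per person still applies.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_duration(x, beta_zero, beta_pos):
    """E(Y|X) = P(Y>0|X) * E(Y|Y>0,X) for one person.

    x         : covariate values, with x[0] = 1.0 for the intercept
    beta_zero : coefficients of the logistic model for Y > 0 (hypothetical)
    beta_pos  : coefficients of the log-link model fitted to the
                non-zero durations only (hypothetical)
    """
    eta_zero = sum(b * v for b, v in zip(beta_zero, x))
    eta_pos = sum(b * v for b, v in zip(beta_pos, x))
    p_positive = logistic(eta_zero)        # P(Y>0|X)
    mean_if_positive = math.exp(eta_pos)   # E(Y|Y>0,X) under a log link
    return p_positive * mean_if_positive


def expected_ward_total(patients, beta_zero, beta_pos):
    """Expected total duration for a ward: the sum of per-patient expectations."""
    return sum(expected_duration(x, beta_zero, beta_pos) for x in patients)


if __name__ == "__main__":
    # Made-up coefficients: intercept plus one binary patient characteristic.
    beta_zero = [-1.0, 0.8]
    beta_pos = [2.0, 0.5]
    ward = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    print(expected_ward_total(ward, beta_zero, beta_pos))
```

Both parts use the same predictors here, as the question proposes, but nothing in the formula requires that. The sketch gives point estimates only; standard errors for the combined quantity would need the delta method or a bootstrap over both fitted models.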