Steven--I like this approach in general, but from the original post,
it's not clear that the timing of first visit, or even time at risk,
is in the data--perhaps the poster can clarify? Also, would you
propose using the predicted hazard in the period of first visit as
some kind of selection correction? In your setup the outcome is
visits divided by time at risk for subsequent visits, so in theory it
is a fractional outcome (constrained to lie between zero and one),
though only the zero limit is likely to bind. That makes it tricky to
implement, I would guess--if you are worried about the
nonnormal error distribution and the selection b
Ignoring the possibility of detailed data on times of utilization, why
can't you just run a standard count model on the number of visits and
use that to predict the probability of at least one visit? One visit
in 10 years is not that different from no visits in 10 years, yeah? It
makes no sense to me to predict utilization only for those who have
positive utilization and worry about selection etc. instead of just
using the whole sample, including the zeros. I.e., run a -poisson- to
start with. A lot of zeros can arise simply because many people have a
predicted number of visits in the .01 range and the number of visits
has to be an integer. Zero inflation or overdispersion can also often
arise from not having the right specification for the explanatory
variables... but you can also move to another model in the -glm- or
-nbreg- family.
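
A minimal sketch of what I mean, run on simulated data so it is self-contained. The variable names (visits, age, female) and the data-generating process are made up for illustration and are not taken from the poster's dataset, and the step from predicted counts to Pr(at least one visit) assumes the Poisson distribution actually holds:

```stata
* Simulated example; all variable names and coefficients are hypothetical
clear
set seed 12345
set obs 2000
generate age = 20 + int(60*runiform())
generate female = runiform() < 0.5
generate double xb = -3 + 0.03*age + 0.4*female
generate visits = rpoisson(exp(xb))

* Count model on the whole sample, zeros included
poisson visits age female, vce(robust)

* Predicted number of visits for each person
predict double mu, n

* Under a Poisson, Pr(visits >= 1) = 1 - Pr(visits = 0) = 1 - exp(-mu)
generate double p_any = 1 - exp(-mu)

* Compare the observed share of users with the model-implied share
generate byte any = visits > 0
summarize any p_any

* If overdispersion remains, one alternative would be:
* nbreg visits age female
```

If the means of any and p_any are close, the zeros are about what the Poisson mean implies; a large gap would point toward -nbreg- or a richer specification of the explanatory variables.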
On Tue, Jun 2, 2009 at 1:21 PM, <[email protected]> wrote:
> A potential problem with Jon's original approach is that the use of
> services is an event with a time dimension--time to first use of
> services. People might not use services until they need them.
> Instead of a logit model (my preference also), a survival model for
> the first part might be appropriate.
>
> With later first-use, the time available for later visits is reduced,
> and number of visits might be associated with the time from first use
> to the end of observation. Moreover, people with later first-visits
> (or none) might differ in their degree of need for subsequent visits.
>
> To account for unequal follow-up times, I suggest a supplementary
> analysis in which the outcome for the second part of the hurdle model
> is not the number of visits, but the rate of visits (per unit time at
> risk).
>
> -Steve.
>
> On Tue, Jun 2, 2009 at 12:22 PM, Lachenbruch, Peter
> <[email protected]> wrote:
>> This could also be handled by a two-part or hurdle model. The 0 vs. non-zero model is given by a probit or logit (my preference) model. The non-zeros are modeled by the count data or OLS or what have you. The results can be combined since the likelihood separates (the zero values are identifiable - no visits vs number of visits).
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Martin Weiss
>> Sent: Tuesday, June 02, 2009 7:02 AM
>> To: [email protected]
>> Subject: st: AW: Sample selection models under zero-truncated negative binomial models
>>
>> *************
>> ssc d cmp
>> *************
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of John Ataguba
>> Sent: Tuesday, June 2, 2009 16:00
>> To: Statalist statalist mailing
>> Subject: st: Sample selection models under zero-truncated negative binomial
>> models
>>
>> Dear colleagues,
>>
>> I want to enquire if it is possible to perform a ztnb (zero-truncated
>> negative binomial) model on a dataset that has the zeros observed in a
>> fashion similar to the heckman sample selection model.
>>
>> Specifically, I have a binary variable on use/non-use of outpatient health
>> services and I fitted a standard probit/logit model to observe the factors
>> that predict the probability of use. Subsequently, I want to explain the
>> factors that influence the number of visits to the health facilities. Since
>> these are count data, I cannot fit the standard Heckman model using the
>> standard two-part procedure in the Stata command -heckman-.
>>
>> My fear now is that my sample of users will be biased if I fit a ztnb model
>> on only the users given that I have information on the non-users which I
>> used to run the initial probit/logit estimation.
>>
>> Is it possible to generate the inverse Mills ratio from the probit model
>> and include this in the ztnb model? Will this be consistent? etc...
>>
>> Are there any smarter suggestions? Any reference that has used a similar
>> form of sample selection will be appreciated.
>>
>> Regards
>>
>> Jon