Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Count models and fractional variables
Date: Mon, 19 Mar 2012 10:28:55 -0400
J.M.C. Santos Silva <[email protected]>:
Usage varies elsewhere, but the Stata convention is that truncation of y
means the values are not observed at all if y<a (as in -truncreg-), whereas
censoring means that y is recorded as a if y<a (as in -tobit-). In the first
case we do not observe X either, so there are no censored obs with which
to estimate a tobit.
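To make the distinction concrete, here is a minimal sketch in Python
(standard library only) rather than Stata. The data generating process
y = 1 + x + e, the cutoff a = 0, and the function name simulate are
all illustrative assumptions, not anything from the thread.

```python
import random

def simulate(n=1000, a=0.0, seed=1):
    """Return (full, censored, truncated) samples of (x, y) pairs.

    Hypothetical DGP for illustration: y = 1 + x + e, with x and e standard normal.
    """
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1 + x + rng.gauss(0, 1)
        full.append((x, y))
    # Censoring (the -tobit- case): every obs is kept and x is observed,
    # but y is recorded as a whenever y < a.
    censored = [(x, max(y, a)) for x, y in full]
    # Truncation (the -truncreg- case): obs with y < a are dropped entirely,
    # so neither y nor x is available for them.
    truncated = [(x, y) for x, y in full if y >= a]
    return full, censored, truncated

full, censored, truncated = simulate()
n_at_limit = sum(1 for _, y in censored if y == 0.0)
print(len(full), len(censored), len(truncated), n_at_limit)
```

The censored sample has as many rows as the full sample, with a pile of
obs sitting exactly at the limit; the truncated sample is simply shorter.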
I agree with your advice not to use the tobit or a zero-inflated
model, but only on the grounds that they (typically) require untenable
distributional assumptions for consistent estimates, whereas a -glm-
needs only a correctly specified functional form for the conditional mean.
However, it is possible that there is a form of censoring in such
data. It is true that we would usually expect the conditional mean of a
count outcome such as patents (or an arrival rate of patents) to be nonzero
given any covariates X, even if arbitrarily small, but it need not be;
it can be exactly zero if, for example, some group of inventors is
ineligible to file for patents.
If categorical ineligibility derives from a binary (latent)
characteristic, perhaps as in a logit or probit model, then a zero
inflation process is at work; if the ineligibility derives from a
cutoff score on some continuous (latent) characteristic, then perhaps
a censoring model is in order (but probably still not a -tobit-).
Just to reiterate: I agree -glm- or an equivalent count model is
preferable in general for the application described, or perhaps a
discrete-time hazard model, but for some applications a zero-inflated
model may be preferable; without knowing much more about the data and
the data generating process (i.e. institutional setting) it is hard to
know for sure.
Fabiana and Stefano H. Baruffaldi <[email protected]>:
Are there many inventors who never get a patent?
Do you believe there are inventors with a zero conditional mean in
patents per year?
If so, is there a known reason for that property?
Do you measure the reasons for the zero/nonzero conditional mean?
If you can estimate a logit that perfectly predicts zero/nonzero total
patents, perhaps you should just use -glm- on the group with nonzero
total patents.
On Sat, Mar 17, 2012 at 12:02 PM, Santos Silva, J.M.C.
<[email protected]> wrote:
> Dear Fabiana,
>
> Sorry for not seeing your post earlier. Let me see if I can clarify this.
>
> First, your friend should not use the Tobit as it is meant for
> truncated data and there is no truncation in this dataset.
>
> Second, the ZI models can be estimated even if the dependent
> variable is continuous. So, there is no need to round the data
> and of course you get different results if you do.
>
> The fact that you can estimate zero inflated models with
> continuous data does not mean that it is a good idea to do it!
> In particular, the results of zero inflated models are not invariant
> to the scale of the dependent variable, and that explains why
> different results are obtained if it is multiplied by 10000.
>
> The reason for this is that by rescaling the variable you change
> the amount of overdispersion (the mean is multiplied by k
> and the variance by k^2). Therefore, studying the number of
> patents per year or by quarter will give different results!
>
> The advice is now obvious: go back to modeling the counts
> using an appropriate count data model (is zero inflation really
> needed?). In general, my advice is that one should model the
> variable we care about and not some transformation of it; as this
> example illustrates, messing with the dependent variable may
> have very undesirable consequences.
>
> All the best,
>
> Joao
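The rescaling point in the quoted message (mean multiplied by k, variance
by k^2) can be checked numerically. The following is a rough sketch in
Python (standard library only), not Stata; the Poisson mean of 2, the seed,
and the helper names dispersion and poisson_draw are assumptions made
purely for illustration.

```python
import math
import random
import statistics

def dispersion(values):
    """Variance-to-mean ratio; about 1 for Poisson data."""
    return statistics.pvariance(values) / statistics.fmean(values)

def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

rng = random.Random(2012)
y = [poisson_draw(rng, 2.0) for _ in range(5000)]  # hypothetical counts

k = 10000
scaled = [k * v for v in y]  # the same variable multiplied by k

# Mean scales by k and variance by k^2, so the variance-to-mean ratio scales by k.
ratio = dispersion(scaled) / dispersion(y)
print(dispersion(y), dispersion(scaled), ratio)
```

Counts that look roughly equidispersed on the original scale appear hugely
overdispersed after multiplication by 10000, which is consistent with
results from models that depend on the dispersion changing with the scale
of the dependent variable.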