Re: st: suggested references about the variables to include in zero-inflated portion of zinb?
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: suggested references about the variables to include in zero-inflated portion of zinb?
Date: Sun, 26 Oct 2008 11:05:43 -0400
Tim--the Subject of your last post was completely uninformative (st:
Re: statalist-digest V4 #3224). If you receive the Digest, do not use
the "Reply" button to respond.
I have a few thoughts:
1. The reviewer's original opinion is not correct. If your target
parameter is the mean score, then OLS may give a consistent estimate,
even if the data are skewed and non-normal. The proviso is that you
have a good prediction model for the mean. However, with OLS the
usual standard errors will be incorrect. The fix is easy: -reg- with
the -robust- option will give standard errors that are model-free.
2. Did you compare observed and expected values, both by eye and with
a chi-square test? If the -zinb- fit is not good, there is little
justification for using it.
3. If, by chance, -zinb- happens to give a good fit, standard errors
based on the ZINB model will be wrong. You should use the -robust-
option or a bootstrap, as Carlo suggested.
4. Published analyses of CESD with the zero-inflated negative
binomial are not, in themselves, justification for using -zinb- in
your problem. Did the published distributions fit the data? I've
done analyses with full and reduced versions of the CESD. In one data
set and in national data the distribution was quite symmetric. In
another data set the distribution was bimodal. (I think this was an
interviewer problem.) In none of these was there a lump at the minimum
(or maximum) value. In fact, the extreme responses were the rarest
ones.
5. If you do see lumps at the extremes, consider that they are
dishonest responses. Why? With count data, a separate model for
responding at all is plausible. With questionnaire scales, a minimum
or maximum score is the result of a respondent checking the same value
for every item. (I use the word "lumps", but in the statistical
literature, isolated higher density regions are usually called "bumps".)
6. If you want to fit the distribution of scores, as opposed to
predicting means, the beta distribution may provide a good
approximation. Divide the scores by the maximum possible, so that the
results are proportions. Then download -betafit- from SSC. You will
need to add a small constant to the zeros and subtract it from the
ones before you do your regressions.
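A minimal sketch of points 1, 3, and 6 in Stata follows. The variable names (cesd, age, female, nethours), the choice of inflation variables, the number of bootstrap replications, and the constant 0.001 are all hypothetical and chosen only for illustration. The -betafit- option shown is as I recall it from the help file, so check -help betafit- after installing.

```stata
* Hypothetical variables: cesd (score, 0-24), age, female, nethours.

* Point 1: OLS for the mean score, with robust (sandwich) standard errors
regress cesd age female nethours, vce(robust)

* Point 3: -zinb- with robust standard errors, or with a bootstrap
* (the inflate() variables here are placeholders, not a recommendation)
zinb cesd age female nethours, inflate(age female) vce(robust)
zinb cesd age female nethours, inflate(age female) vce(bootstrap, reps(500))

* Point 6: rescale scores to proportions, then move exact 0s and 1s
* slightly inside (0,1); 0.001 is an arbitrary illustrative constant
generate double p = cesd/24
replace p = 0.001 if p == 0
replace p = 0.999 if p == 1

* Install -betafit- from SSC and model the mean of the beta distribution
ssc install betafit
betafit p, muvar(age female nethours)
```

The results in point 6 can be sensitive to the size of the constant, so it may be worth rerunning with one or two other values.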
-Steve
I am using zinb to estimate level of psychological distress (scores
range from 0-24) using various demographic variables and measures
of use of the Internet. I've used -countfit- to compare various
count models and the results support zinb as the best fitting model.
I am uncertain, however, about how to justify the variables that I
include in the zero-inflated part of the model. I've read journal
articles that have used zinb, read the book by Freese and Long, and
searched the Internet and Statalist but I have not been able to
find any detailed recommendations or procedures. Can anyone suggest
any other sources (books or journals) that provide an explanation
or a good example of this process?
Ideally I would like to find a good source that I can cite in the
paper -- but I appreciate any suggestions about this you might have.
Thanks for your help,
Tim
-----------------------------------------------------
Timothy M. Hale, MA
Graduate Assistant
University of Alabama at Birmingham
Department of Sociology
email: [email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/