Thanks to Carlo Lazzaro, Steven Samuels and Austin Nichols for comments.
There don't seem to be inflated zeros. The distribution of the
response variable follows a Poisson distribution very closely.
The counts were obtained from 1-meter square seed traps in different
locations over the same time period. I gather from the student that
the zero counts on the response variable were expected (there were
other zeros as well) but that the zeros on the exposure were not
anticipated.
I like the idea of using the total seed count as a predictor. I will
try out some of the other suggestions as well.
--
Phil Ender
Statistical Consulting Group
UCLA Academic Technology Services
------- Carlo Lazzaro--------------
Isn't this a mission for -help zip-?
-------Steven Samuels-------------
Some thoughts:
I'd want to know what the 'observations' are: different times, areas?
Although posited as a Poisson problem, this is a problem in predicting
proportions between 0 & 1, since the student is willing to condition
on 'exposure' equal to the number of seeds of all plants. I would
suggest a random effects binomial regression model like -xtmelogit- or
-glogit-. In either case, the cases with no seeds cannot be used.
I'd recommend a preliminary analysis to predict the total number of
seeds with one of Stata's count procedures, including -xtmepoisson- or
-xtnbreg-. This analysis could separate out influences on total
numbers of seeds from influences on the proportion belonging to the
species of interest. This preliminary analysis could predict the zero
counts of seeds.
A more advanced model could predict the relative and absolute numbers
of more than two species, distinguishing between separate and common
influences. To me another question is: why expect a Poisson
distribution at all? If seeds are generated 'locally', then there will
be an unmeasured source of variation within areas, namely the number
of plants of each species.
--------Austin Nichols------------------
I think this is fine to model using a Poisson regression, though a
fractional logit (Papke and Wooldridge 1996) and other models are also
possibilities. But I think the cases with zero exposure supply no
information, in the context of the model specified, and are rightly
dropped. These are like cases with no observations, and can therefore
supply no information to form estimates about the rate at which events
happen.
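
To make the zero-exposure point concrete, here is a minimal sketch in
plain Python (not Stata, and not the student's actual model). It assumes
the usual exposure setup, mean = exposure * exp(eta), where eta stands
for the linear predictor; the function name and the trap counts are
hypothetical, chosen only for illustration.

```python
import math

def poisson_loglik(y, exposure, eta):
    """Log-likelihood of one count y with mean exposure * exp(eta)."""
    mu = exposure * math.exp(eta)
    if mu == 0:
        # Degenerate case: a Poisson with mean 0 puts all mass on y == 0.
        return 0.0 if y == 0 else float("-inf")
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Candidate values of the linear predictor (hypothetical).
etas = (-2.0, 0.0, 1.5)

# Hypothetical trap: 0 seeds of the species, 0 seeds in total.
zero_trap = [poisson_loglik(0, 0, eta) for eta in etas]
# Hypothetical trap: 3 seeds of the species out of 20 in total.
full_trap = [poisson_loglik(3, 20, eta) for eta in etas]

print(zero_trap)  # same value for every eta: no information about eta
print(full_trap)  # changes with eta: this trap is informative
```

The zero-exposure trap contributes the same constant to the
log-likelihood whatever the coefficients are, so dropping it leaves the
estimates unchanged under this specification.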
Leslie E. Papke and Jeffrey M. Wooldridge. 1996. "Econometric Methods
for Fractional Response Variables with an Application to 401(k) Plan
Participation Rates." Journal of Applied Econometrics, 11(6): 619-632.
[see also http://www.nber.org/papers/t0147.pdf]
--------posted 3/27/08----------------------
A student comes in with a Poisson model. The response variable is the
number of seeds of a certain species. There is an exposure variable
which is the total number of seeds of all species. The problem is that there
are six exposure values of zero. There are three other predictor
variables and 72 total observations. Is there any way of dealing with
this problem other than dropping those six values? Any suggestions?