st: Poisson and Negbin models
From: Simon Falck <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: Poisson and Negbin models
Date: Sun, 28 Oct 2012 21:22:28 +0000
Hello,
I have a few questions related to Poisson and Negbin models. Using a cross-section, I am estimating the number of new firms (Y) across 72 countries (N) as a function of a range of country attributes (X1, X2…Xn). There are no time or dummy variables included. All regressors take continuous values.
Given that Y takes non-negative integer values and has a mean <10, a count data approach is appropriate, which is why I chose to apply standard Poisson and Negbin models. The first indication is that Y is overdispersed, as the mean and the variance are not equal, nor close to being equal (mean 4.347222 < var 542.6806). A formal goodness-of-fit test of Y alone, using -estat gof- after -poisson $y-, indicates that Y is significantly different from a Poisson distribution (chi2 = 1726.882, Prob > chi2(71) = 0.0000). Similarly, the LR test of alpha=0 in the output from -nbreg $y- indicates that the negbin model is preferred over the Poisson (chibar2(01) = 1570.16, Prob>=chibar2 = 0.000).
When I run the model Y = X1 X2…Xn using the -nbreg- command, I run into some problems. The output indicates a problem with alpha, and the LR test now indicates that the Poisson is preferred over the negbin model:
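The dispersion figures quoted above describe Y on its own, before any regressors enter. A minimal sketch of how they can be reproduced, assuming the global $y holds the single outcome variable as in the commands above:

```stata
* Unconditional mean and variance of the outcome
quietly summarize $y
display "mean = " r(mean) "   variance = " r(Var)
display "variance/mean ratio = " r(Var)/r(mean)

* The same ratio computed directly from the figures quoted in the text
display 542.6806/4.347222
```

With the quoted figures the ratio is roughly 125, where a Poisson variable would give a ratio close to 1.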
. nbreg $y $xlist, nolog irr

Negative binomial regression                      Number of obs   =         72
                                                  LR chi2(7)      =     112.28
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -59.651782                       Pseudo R2       =     0.4848

------------------------------------------------------------------------------
          DV |        IRR  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805806    1.096409
          X2 |    .994819   .0136695    -0.38   0.705     .9683849    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019     .2950711    .8983481
          X4 |   .9809783   .0158745    -1.19   0.235     .9503532     1.01259
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |    .138362   .0542472    -5.04   0.000     .0641636    .2983629
          X7 |   1.059356   .0270835     2.26   0.024     1.007581    1.113791
-------------+----------------------------------------------------------------
    /lnalpha |  -18.90698   558.4214                     -1113.393    1075.579
-------------+----------------------------------------------------------------
       alpha |   6.15e-09   3.43e-06                             0           .
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 1.2e-05 Prob>=chibar2 = 0.499

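The quantities at the bottom of the -nbreg- table can be checked by hand from the reported numbers. A small sketch, using only figures that appear in this post (the Poisson log likelihood, -59.651788, is taken from the -poisson- output further down) and the built-in chi2tail() function:

```stata
* alpha is estimated on the log scale as /lnalpha; back-transform it
display exp(-18.90698)

* LR statistic for alpha=0: twice the gap between the nbreg and poisson log likelihoods
display 2*(-59.651782 - (-59.651788))

* chibar2(01) p-value: half the upper tail of a chi2(1), because alpha=0 lies on the boundary
display 0.5*chi2tail(1, 1.2e-05)
```

These reproduce the reported alpha of about 6.15e-09, the statistic of 1.2e-05, and a p-value of about 0.499, so the two fitted log likelihoods are practically identical.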
The alpha seems to imply some problem, which is confirmed when I try to compute the predicted rates and probabilities using -prcounts-, which ends with the following error:
. prcounts nb, plot
problem with alpha prevents estimation of predicted probabilities.
r(198);
end of do-file
r(198);
It could be mentioned that Long and Freese's -countfit- indicates that a negbin model is preferred over the Poisson and zero-inflated models. Furthermore, if I run a Poisson model using -poisson- and compare the outcome, I end up with very similar results (coef, LogL, AIC, BIC), yet the pseudo R-squared is quite different (PO=0.934, NB=0.485):
. poisson $y $xlist, irr nolog

Poisson regression                                Number of obs   =         72
                                                  LR chi2(7)      =    1682.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -59.651788                       Pseudo R2       =     0.9338

------------------------------------------------------------------------------
          DV |        IRR  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805804    1.096409
          X2 |   .9948191   .0136695    -0.38   0.705      .968385    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019      .295071    .8983482
          X4 |   .9809785   .0158745    -1.19   0.235     .9503533    1.012591
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |   .1383621   .0542473    -5.04   0.000     .0641636    .2983633
          X7 |   1.059355   .0270835     2.26   0.024     1.007581    1.113791
------------------------------------------------------------------------------

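The two pseudo R2 values can be reproduced from the reported log likelihoods and LR chi2 statistics. The sketch below assumes Stata's McFadden definition, 1 - ll/ll_0, where ll_0 is the log likelihood of the constant-only model of the same family (constant-only Poisson for -poisson-, constant-only negbin for -nbreg-):

```stata
* Constant-only log likelihood implied by the output: ll_0 = ll - LRchi2/2
* nbreg
display -59.651782 - 112.28/2
* poisson
display -59.651788 - 1682.44/2

* McFadden pseudo R2 = 1 - ll/ll_0
display 1 - (-59.651782)/(-59.651782 - 112.28/2)
display 1 - (-59.651788)/(-59.651788 - 1682.44/2)

* The same quantities are stored in e() after estimation
quietly poisson $y $xlist
display e(ll) "   " e(ll_0) "   " e(r2_p)
```

The implied baselines are about -115.79 (negbin) and -900.87 (Poisson), which gives 0.4848 and 0.9338. The fitted log likelihoods are almost the same, so the gap in pseudo R2 comes from the two different constant-only baselines.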
When I compare the observed and predicted values of Y, using -prcounts-, the Poisson model seems to do quite a good job, e.g.
. list $y nbrate in 1/10

        DV   nbrate
     ----------------
  1.   192      194
  2.    41       34
  3.    35       36
  4.     6        5
  5.     4        3
     ----------------
  6.     3        3
  7.     3        2
  8.     3        0
  9.     2        2
 10.     2        2

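One way to look at dispersion conditional on the regressors, rather than in Y alone, is the Pearson statistic of the fitted Poisson model divided by its residual degrees of freedom. A minimal sketch, assuming the globals $y (a single variable) and $xlist from above; mu_hat and pearson_i are hypothetical variable names introduced only for illustration:

```stata
* Fit the Poisson model and obtain the predicted counts
quietly poisson $y $xlist
predict double mu_hat, n

* Pearson contributions: squared residual scaled by the Poisson variance (= mean)
generate double pearson_i = ($y - mu_hat)^2 / mu_hat
quietly summarize pearson_i

* Pearson statistic over residual degrees of freedom
display "Pearson dispersion = " r(sum)/(e(N) - e(df_m) - 1)

* Built-in deviance and Pearson goodness-of-fit tests for the fitted model
estat gof
```

A ratio near 1 is what equidispersion conditional on the regressors would look like; a ratio well above 1 would point to overdispersion that the regressors do not absorb.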
I would appreciate it if someone could explain what the problem(s) are here, and give some indication of what is going on with alpha in the negbin model. One could argue that, since the equidispersion assumption of the Poisson model appears not to hold, the PO-model outcome is quite “flattering”, perhaps too flattering (?). I am aware that N is relatively small for a maximum likelihood model, but I am not sure whether, and if so how, this affects the model outcome in this particular situation. It could be mentioned that there is some collinearity between the regressors, but this should not cause too many problems.
Thanks in advance,
/Simon