Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Subject: st: simulating random numbers from zero inflated negative binomial estimates
From
[email protected]
To
[email protected]
Subject
Subject: st: simulating random numbers from zero inflated negative binomial estimates
Date
Tue, 15 Nov 2011 13:29:30 -0500 (EST)
Back in June I posted code on the List for creating synthetic
zero-inflated Poisson and negative binomal models. They were based on
the same logic as the synthetic models i had
developed for other models such as synthetic hurdle models, synthetic
finite mixture models, ordered and unordered logit and probit
categorical response models (proportional odds; multinomial) and other
discrete response regression models that are in my "Logistic Regression
Models" [LRM] and "Negative Binomial Regression" (2nd edition) [NBR2]
books. Many of these were discussed in my Stata Journal article on the
subject in early 2010 (SJ, Vol 10, Nu 1).
James Hardin and I have added a new chapter on creating synthetic data
including correlated data in the third edition of Stata Press book,
"Generalized Linear Models and Extensions" [GLME3], which is due to be
out next month. In addition, for many -- perhaps the majority -- of the
models discussed in GLME3 we walk through the setting up of synthetic
data and synthetic models for the model being discussed For example,
in the section on zero-inflated Poisson and negative binomial models we
discuss the logic creating synthetic zero-inflated Poisson data. Other
considerably more complicated syntheic models are developed and
explained for models as well such as generalized negative binomial
(NB-P) which has a third parameter, and a three component finite
mixture model. We also developed Stata code for estimating models such
as Poisson Inverse Gaussian (PIG), zero-inflated PIG, generalized
Poisson, zero-inflated generalized Poisson, NB-P, and a number of other
models that can prove to be useful in research. These models are
discussed in my NBR2 book as well, but Stata code for models like PIG,
ZIPIG, ZIGP, and NB-P was not yet developed. In my NBR2 and LRM books,
R, Limdep, and SAS are used for models which had no Stata support --
official or unofficial. . for the examples in the book.
My point in telling you this is to let you know that there is a lot of
material already available on developing synthetic models like the ZINB
you were interested in. Much is already published- as as ZINB. Aside
from my stuff and what Hardin and I have developed .. Cameron and
Trivedi's Stata Press book on "Microeconometrics Using Stata" has
quite a bit of excellent code for developing synthetic data. The code
below for creating user specified ZINB data and models has the
following default values, You can easily amend the number of predictors
you want for each component as well as the values for the coefficients.
The default value of alpha, the negative binomial heterogeneity
parameter is .5. The default values are given in the comment lines at
the beginning of the code. I ran the do file once with the defaults and
no seed value, so another run witl produce slightly different results.
You can see that the coefficient values of the resultant synthetic
model are quite close to what we specified.
I hope this helps a bit. Joe Hilbe
SYNTHETIC ZINB WITH LOGIT BINARY COMPONENT
===================================================
* Synthetic zero-inflated negative binomial with logit as binary
component
* Joseph Hilbe 15Aug2005; 10Oct2009; 5Jun2011 zinb_syn.do
* LOGIT: x1= .9, x2= .1, _c= .2
* NB2 : x1=.75, n2=-1.25, _c=2, alpha=.5
clear
set obs 50000
* set seed 1000
gen x1 = invnorm(runiform())
gen x2 = invnorm(runiform())
* NEGATIVE BINOMIAL- NB2
gen xb = 2 + 0.75*x1 - 1.25*x2
gen a = .5
gen ia = 1/a
gen exb = exp(xb)
gen xg = rgamma(ia, a)
gen xbg = exb * xg
gen nby = rpoisson(xbg)
* BERNOULLI
gen pi =1/(1+exp(-(.9*x1 + .1*x2+.2)))
gen bernoulli = runiform()>pi
gen zy = bernoulli*nby
rename zy y
* ZINB
zinb y x1 x2, inf(x1 x2) nolog
==============================================
Zero-inflated negative binomial regression Number of obs =
50000
Nonzero obs =
19433
Zero obs =
30567
Inflation model = logit LR chi2(2) =
25898.50
Log likelihood = -89533.47 Prob > chi2 =
0.0000
-------------------------------------------------------------------------
-----
y | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+-----------------------------------------------------------
-----
y |
x1 | .7506944 .0066242 113.33 0.000 .7377112
.7636775
x2 | -1.250086 .0068687 -182.00 0.000 -1.263549
-1.236624
_cons | 1.99581 .006915 288.62 0.000 1.982257
2.009364
-------------+-----------------------------------------------------------
-----
inflate |
x1 | .9047498 .0141011 64.16 0.000 .8771121
.9323875
x2 | .1003384 .0125745 7.98 0.000 .0756928
.1249839
_cons | .199909 .0123115 16.24 0.000 .175779
.2240391
-------------+-----------------------------------------------------------
-----
/lnalpha | -.6870374 .0153996 -44.61 0.000 -.7172202
-.6568547
-------------+-----------------------------------------------------------
-----
alpha | .5030642 .007747 .4881072
.5184796
-------------------------------------------------------------------------
-----
Date: Mon, 14 Nov 2011 16:04:17 +0000
From: Daniel Hill-McManus <[email protected]>
Subject: st: simulating random numbers from zero inflated negative
binomial estimates
Hi,
I came across this code recently that Paul wrote in response to a query
(see below).
I've also found it useful, thank you. But working through it I cannot
understand why the linear predictor p2 is not exponentiated before
being
used in rgamma(1/alph, alph*p2).
I'd be grateful if someone could point out to me why this is.
Dan
On 3/06/2011 2:56 a.m., E. Paul Wileyto wrote:
I'm not sure whether anyone has answered this yet.
First, read the help on zinb post-estimation commands. There are
many flavors of "predict" listed there. You will need three of them
before you start generating random numbers. The first one you will need
is:
predict p1, pr
That will generate a new variable, p1, which will be the predicted
probability of an inflated zero. All the work is done for you.
The second predicted quantity you will need is:
predict lp, xb
That will generate the linear combination of predictor variables
weighted by coefficients for the negative binomial part of the model.
Finally, you will need:
predict alpha, xb eq(#3)
which will generate a variable containing the overdispersion
parameter for the negative binomial. With those three bits, you can get
on to simulating.
Here's my script:
zinb cignums drug week, inf(drug week)
predict p1 , pr
predict p2 , xb
predict lnalpha , xb eq(#3)
gen alph=exp(lnalpha)
gen xg=rgamma(1/alph, alph*p2)
gen pg=rpoisson(xg)
gen zi=runiform()>p1
gen newcigs=zi*pg
zinb newcigs drug week, inf(drug week)
Paul
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/