Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Subject: st: simulating random numbers from zero inflated negative binomial estimates

From	[email protected]
To	[email protected]
Subject	Subject: st: simulating random numbers from zero inflated negative binomial estimates
Date	Tue, 15 Nov 2011 13:29:30 -0500 (EST)

Back in June I posted code on the List for creating syntheticzero-inflated Poisson and negative binomal models. They were based onthe same logic as the synthetic models i haddeveloped for other models such as synthetic hurdle models, syntheticfinite mixture models, ordered and unordered logit and probitcategorical response models (proportional odds; multinomial) and otherdiscrete response regression models that are in my "Logistic RegressionModels" [LRM] and "Negative Binomial Regression" (2nd edition) [NBR2]books. Many of these were discussed in my Stata Journal article on thesubject in early 2010 (SJ, Vol 10, Nu 1).

James Hardin and I have added a new chapter on creating synthetic dataincluding correlated data in the third edition of Stata Press book,"Generalized Linear Models and Extensions" [GLME3], which is due to beout next month. In addition, for many -- perhaps the majority -- of themodels discussed in GLME3 we walk through the setting up of syntheticdata and synthetic models for the model being discussed For example,in the section on zero-inflated Poisson and negative binomial models wediscuss the logic creating synthetic zero-inflated Poisson data. Otherconsiderably more complicated syntheic models are developed andexplained for models as well such as generalized negative binomial(NB-P) which has a third parameter, and a three component finitemixture model. We also developed Stata code for estimating models suchas Poisson Inverse Gaussian (PIG), zero-inflated PIG, generalizedPoisson, zero-inflated generalized Poisson, NB-P, and a number of othermodels that can prove to be useful in research. These models arediscussed in my NBR2 book as well, but Stata code for models like PIG,ZIPIG, ZIGP, and NB-P was not yet developed. In my NBR2 and LRM books,R, Limdep, and SAS are used for models which had no Stata support --official or unofficial. . for the examples in the book.

My point in telling you this is to let you know that there is a lot ofmaterial already available on developing synthetic models like the ZINByou were interested in. Much is already published- as as ZINB. Asidefrom my stuff and what Hardin and I have developed .. Cameron andTrivedi's Stata Press book on "Microeconometrics Using Stata" hasquite a bit of excellent code for developing synthetic data. The codebelow for creating user specified ZINB data and models has thefollowing default values, You can easily amend the number of predictorsyou want for each component as well as the values for the coefficients.The default value of alpha, the negative binomial heterogeneityparameter is .5. The default values are given in the comment lines atthe beginning of the code. I ran the do file once with the defaults andno seed value, so another run witl produce slightly different results.You can see that the coefficient values of the resultant syntheticmodel are quite close to what we specified.


I hope this helps a bit.   Joe Hilbe

SYNTHETIC ZINB WITH LOGIT BINARY COMPONENT
===================================================

* Synthetic zero-inflated negative binomial with logit as binarycomponent

* Joseph Hilbe  15Aug2005; 10Oct2009; 5Jun2011   zinb_syn.do
* LOGIT: x1= .9, x2= .1, _c= .2
* NB2  : x1=.75, n2=-1.25, _c=2, alpha=.5
clear
set obs 50000
* set seed 1000
gen x1 = invnorm(runiform())
gen x2 = invnorm(runiform())
* NEGATIVE BINOMIAL- NB2
gen xb = 2 + 0.75*x1 - 1.25*x2
gen a = .5
gen ia = 1/a
gen exb = exp(xb)
gen xg = rgamma(ia, a)
gen xbg = exb * xg
gen nby = rpoisson(xbg)
* BERNOULLI
gen pi =1/(1+exp(-(.9*x1 + .1*x2+.2)))
gen bernoulli = runiform()>pi
gen zy = bernoulli*nby
rename zy y
* ZINB
zinb y x1 x2, inf(x1 x2) nolog
==============================================

Zero-inflated negative binomial regression Number of obs =50000Nonzero obs =19433Zero obs =30567

Inflation model = logit LR chi2(2) =25898.50Log likelihood = -89533.47 Prob > chi2 =0.0000


-------------------------------------------------------------------------
-----

y | Coef. Std. Err. z P>|z| [95% Conf.Interval]

-------------+-----------------------------------------------------------
-----
y            |

x1 | .7506944 .0066242 113.33 0.000 .7377112.7636775x2 | -1.250086 .0068687 -182.00 0.000 -1.263549-1.236624_cons | 1.99581 .006915 288.62 0.000 1.9822572.009364

-------------+-----------------------------------------------------------
-----
inflate      |

x1 | .9047498 .0141011 64.16 0.000 .8771121.9323875x2 | .1003384 .0125745 7.98 0.000 .0756928.1249839_cons | .199909 .0123115 16.24 0.000 .175779.2240391

-------------+-----------------------------------------------------------
-----

/lnalpha | -.6870374 .0153996 -44.61 0.000 -.7172202-.6568547

-------------+-----------------------------------------------------------
-----

alpha | .5030642 .007747 .4881072.5184796

-------------------------------------------------------------------------
-----








Date: Mon, 14 Nov 2011 16:04:17 +0000
From: Daniel Hill-McManus <[email protected]>

Subject: st: simulating random numbers from zero inflated negativebinomial estimates


Hi,
I came across this code recently that Paul wrote in response to a query
(see below).

I've also found it useful, thank you. But working through it I cannot

understand why the linear predictor p2 is not exponentiated beforebeing

used in rgamma(1/alph, alph*p2).

I'd be grateful if someone could point out to me why this is.

Dan

On 3/06/2011 2:56 a.m., E. Paul Wileyto wrote:

    I'm not sure whether anyone has answered this yet.


    First, read the help on zinb post-estimation commands. There are
many flavors of "predict" listed there. You will need three of them

before you start generating random numbers. The first one you will needis:


    predict p1, pr


    That will generate a new variable, p1, which will be the predicted
probability of an inflated zero. All the work is done for you.

    The second predicted quantity you will need is:

    predict lp, xb


    That will generate the linear combination of predictor variables
weighted by coefficients for the negative binomial part of the model.
Finally, you will need:

    predict alpha, xb eq(#3)


    which will generate a variable containing the overdispersion

parameter for the negative binomial. With those three bits, you can geton to simulating.


    Here's my script:

    zinb cignums drug  week, inf(drug  week)
    predict p1 , pr
    predict p2 , xb
    predict lnalpha , xb eq(#3)
    gen alph=exp(lnalpha)
    gen xg=rgamma(1/alph, alph*p2)
    gen pg=rpoisson(xg)
    gen zi=runiform()>p1
    gen newcigs=zi*pg

    zinb newcigs drug  week, inf(drug  week)


    Paul




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: RE: st: RE: Test of differences between geometric means
Next by Date: RE: st: RE: Test of differences between geometric means
Previous by thread: st: ML for logit/ologit
Next by thread: st: xtpcse - "no time periods are common to all panels"
Index(es):
- Date
- Thread