

RE: zero inflated beta [was: st: Information request]

From   "Lachenbruch, Peter" <[email protected]>
To   <[email protected]>
Subject   RE: zero inflated beta [was: st: Information request]
Date   Thu, 13 Aug 2009 08:32:54 -0700

The situation seems to call for a hurdle model or two-part model.  It is
related to the zero-inflated Poisson or negative binomial models, but in
this case the zeros are identifiable.  So the problem reduces to checking
for common proportions of zeros and equal slopes among the non-zero
observations.  Here are some references to get you started.

Lachenbruch, P. A. (2001) "Comparison of competitors to the two part
model" Statistics in Medicine 20(8) 1215-1234

Lachenbruch, P. A. (2001) "Power and Sample Size Requirements for
Two-part models" Statistics in Medicine 20(8) 1235-1239

Lachenbruch, P. A. (2002) "Analysis of Data with Excess Zeros"
Statistical Methods in Medical Research 11(4) 297-302

The last reference is an introduction to a special issue of SMMR devoted
to the issues of excess zeros.  Some papers are on mixture models (like
zip or zinb) and some are on models with identifiable zeros.
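
To make the distinction concrete, here is a sketch of how the log likelihood of a two-part model with identifiable zeros factors. The notation (d_i, \pi_i, g, \theta) is introduced here for illustration and is not taken from the references above.

```latex
% d_i = 1 if y_i = 0 and d_i = 0 otherwise; \pi_i = \Pr(y_i = 0);
% g is the density of the non-zero values with parameters \theta
\ln L =
  \underbrace{\sum_{i} \bigl[ d_i \ln \pi_i + (1 - d_i) \ln (1 - \pi_i) \bigr]}_{\text{zero vs. non-zero}}
  \; + \;
  \underbrace{\sum_{i:\, y_i > 0} \ln g(y_i ; \theta)}_{\text{non-zero values}}
```

If the two sums share no parameters, each part can be estimated and tested separately. In a mixture model such as zip, the count distribution can itself produce zeros, so an observed zero cannot be attributed to one component and the likelihood does not separate in this way.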

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten buis
Sent: Thursday, August 13, 2009 1:34 AM
To: [email protected]
Subject: zero inflated beta [was: st: Information request]

--- On Wed, 12/8/09, Fabio Zona wrote:
> I am in the unfortunate situation of running a regression
> analysis, whereby:
> - my dependent variable is a proportion (percentage of
> bonus on total compensation of top managers of 178
> corporations),
> - the majority (more than 50%) of my managers do not have
> any bonus, so the proportion is exactly ZERO; that is, my
> dependent variable has many exact zeros.
> How can I estimate this model? Do you know which command I
> should use in Stata?
> I know that I cannot use the fractional logit because I
> have many zeros. I have not found any zero-inflated logistic
> regression for situations where y is a proportion.

A zero-inflated fractional logit model is hard to identify. A
zero-inflated beta is probably better, but there is obviously
a price (there is no such thing as a free lunch...), and that
price is more restrictive assumptions.
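
As a sketch of what those assumptions are, a zero-inflated beta model can be written with the mean/precision parametrization of the beta distribution. The symbols \pi_i, \mu_i, \phi, \beta and \gamma are notation chosen here for illustration; the link functions follow the ones used in the program in this message (logit for the mean and for the zero probability, log for the precision).

```latex
\Pr(y_i = 0) = \pi_i, \qquad
f(y_i \mid y_i > 0) =
  \frac{\Gamma(\phi)}{\Gamma(\mu_i \phi)\, \Gamma\bigl((1 - \mu_i)\phi\bigr)}\,
  y_i^{\mu_i \phi - 1} (1 - y_i)^{(1 - \mu_i)\phi - 1},
  \quad 0 < y_i < 1

\operatorname{logit}(\mu_i) = x_i'\beta, \qquad
\ln \phi = \text{constant}, \qquad
\operatorname{logit}(\pi_i) = z_i'\gamma

\ln L_i =
\begin{cases}
  \ln \pi_i & \text{if } y_i = 0 \\[4pt]
  \ln(1 - \pi_i) + \ln\Gamma(\phi) - \ln\Gamma(\mu_i\phi) - \ln\Gamma\bigl((1 - \mu_i)\phi\bigr) \\
  \quad {} + (\mu_i\phi - 1)\ln y_i + \bigl((1 - \mu_i)\phi - 1\bigr)\ln(1 - y_i) & \text{if } 0 < y_i < 1
\end{cases}
```

The extra restriction relative to a fractional logit is that the non-zero proportions are assumed to follow a beta distribution, whereas the fractional logit only requires the conditional mean to be correctly specified.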

Below is a quick stab at implementing such a model. I haven't
done any checking or certification on it, so it is up to you
to determine whether this program actually does what it is
supposed to do. As a first step I would build a simulated
dataset where you know what the parameters should be and
check whether this program actually finds those.

Hope this helps,

*----------- begin example ---------------
program drop _all
set more off
input      prop str1 site variety
 0.0005    A       1
 0.0000    A       2
 0.0000    A       3
 0.0010    A       4
 0.0025    A       5
 0.0005    A       6
 0.0050    A       7
 0.0130    A       8
 0.0150    A       9
 0.0150    A       10
 0.0000    B       1
 0.0005    B       2
 0.0005    B       3
 0.0030    B       4
 0.0075    B       5
 0.0030    B       6
 0.0300    B       7
 0.0750    B       8
 0.0100    B       9
 0.1270    B       10
 0.0125    C       1
 0.0125    C       2
 0.0250    C       3
 0.1660    C       4
 0.0250    C       5
 0.0250    C       6
 0.0000    C       7
 0.2000    C       8
 0.3750    C       9
 0.2625    C       10
 0.0250    D       1
 0.0050    D       2
 0.0001    D       3
 0.0300    D       4
 0.0250    D       5
 0.0001    D       6
 0.2500    D       7
 0.5500    D       8
 0.0500    D       9
 0.4000    D       10
 0.0550    E       1
 0.0100    E       2
 0.0600    E       3
 0.0110    E       4
 0.0250    E       5
 0.0800    E       6
 0.1650    E       7
 0.2950    E       8
 0.2000    E       9
 0.4350    E       10
 0.0100    F       1
 0.0500    F       2
 0.0500    F       3
 0.0500    F       4
 0.0500    F       5
 0.0500    F       6
 0.1000    F       7
 0.0500    F       8
 0.5000    F       9
 0.7500    F       10
 0.0500    G       1
 0.0010    G       2
 0.0500    G       3
 0.0500    G       4
 0.5000    G       5
 0.1000    G       6
 0.5000    G       7
 0.2500    G       8
 0.5000    G       9
 0.7500    G       10
 0.0500    H       1
 0.1000    H       2
 0.0500    H       3
 0.0500    H       4
 0.2500    H       5
 0.7500    H       6
 0.5000    H       7
 0.7500    H       8
 0.7500    H       9
 0.7500    H       10
 0.1750    I       1
 0.2500    I       2
 0.4250    I       3
 0.5000    I       4
 0.3750    I       5
 0.9500    I       6
 0.6250    I       7
 0.9500    I       8
 0.9500    I       9
 0.9500    I       10
end

encode site, gen(sitenum)
gen byte left = sitenum <= 4

program define zibeta_lf
	*! MLB 0.0.1 13 Aug 2009
	version 8.2
	args lnf logitmu lnphi zb
	tempvar zero nonzero mu phi

	quietly gen double `zero' = invlogit(`zb')
	quietly gen double `nonzero' = invlogit(-`zb')
	quietly gen double `mu' = invlogit(`logitmu')
	quietly gen double `phi' = exp(`lnphi')

	* non-zero observations: ln(Pr(non-zero)) + ln(beta density)
	quietly replace `lnf' =  ln(`nonzero') +                 ///
	                         lngamma(`phi') -                ///
	                         lngamma(`mu'*`phi') -           ///
	                         lngamma((1-`mu')*`phi') +       ///
	                         (`mu'*`phi'-1)*ln($ML_y) +      ///
	                         ((1-`mu')*`phi'-1)*ln(1-$ML_y)  ///
	                         if $ML_y > 0

	* zero observations: ln(Pr(zero))
	quietly replace `lnf' =  ln(`zero') if $ML_y == 0
end

xi i.variety
ml model lf zibeta_lf (logitmu: prop = _I*) /lnphi (zg:left), robust
ml check
ml search
ml maximize

*--------------- end example ----------------------
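
The simulation check suggested before the example could be sketched along the following lines. This is only an illustration under stated assumptions: it presumes Stata 10.1 or later (for runiform(), rnormal() and rbeta()), that zibeta_lf has already been defined by running the example, and the variable names (x, z, mu, phi, pz, y), the sample size and the "true" parameter values are all hypothetical choices made here.

```stata
* simulation sketch: generate data with known parameters, then refit
* (assumes zibeta_lf is already defined; all names and values are illustrative)
clear
set seed 12345
set obs 5000

gen double x = rnormal()           // hypothetical covariate for the mean
gen byte   z = runiform() < .5     // hypothetical covariate for the zero part

* "true" parameters, chosen arbitrarily
gen double mu  = invlogit(-0.5 + 0.8*x)   // logitmu: x = 0.8, _cons = -0.5
gen double phi = exp(2)                    // lnphi:   _cons = 2
gen double pz  = invlogit(-1 + 1*z)        // zg:      z = 1,  _cons = -1

* draw the proportion from a beta, then replace it by an exact zero
* with probability pz
gen double y = rbeta(mu*phi, (1-mu)*phi)
replace y = 0 if runiform() < pz

ml model lf zibeta_lf (logitmu: y = x) /lnphi (zg: z)
ml maximize
```

If the program works as intended, the estimates should be close to the values used to generate the data, up to sampling error. Repeating this over many seeds gives a better picture than a single run.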

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

