Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Binary model with many zeros and few ones

From	Cameron McIntosh <[email protected]>
To	STATA LIST <[email protected]>
Subject	RE: st: Binary model with many zeros and few ones
Date	Fri, 6 Jan 2012 08:49:17 -0500

Absolutely right... I would recommend:

Maalouf, M., & Trafalis, T.B. (2011). Robust weighted kernel logistic regression in imbalanced and rare events data. Computational Statistics & Data Analysis, 55(1), 168-183.

Newman, T.B. (1995). If Almost Nothing Goes Wrong, Is Almost Everything All Right? Interpreting Small Numerators. JAMA, 274(13), 1013. 

King, G., & Zeng, L. (2001a). Explaining Rare Events in International Relations. International Organization, 55(3), 693-715.

King, G., & Zeng, L. (2001b). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137-163. http://gking.harvard.edu/files/0s.pdf

Tomz, M., King, G., & Zeng, L. (2003). ReLogit: Rare Events Logistic Regression. Journal of Statistical Software, 8(2). http://www.jstatsoft.org/v08/i02

Quigley, J., & Revie, M. (2011). Estimating the Probability of Rare Events: Addressing Zero Failure Data. Risk Analysis, 31(7), 1120-1132.

Quigley, J., Hardman, G., Bedford, T., & Walls, L. (2011). Merging expert and empirical data for rare event frequency estimation: Pool homogenisation for empirical Bayes models. Reliability Engineering & System Safety, 96(6), 687-695.

Zelig (for R) does this too, for those interested:

Imai, K., King, G., & Lau, O. (January 2, 2012). Everyone’s Statistical Software, Package ‘Zelig’, Version 3.5-1. http://gking.harvard.edu/zelig/docs/zelig.pdf ; http://cran.r-project.org/web/packages/Zelig/index.html

Imai, K., King, G., & Lau, O. (2007). Zelig: Everyone’s Statistical Software. http://GKing.harvard.edu/zelig

Imai, K., King, G., & Lau, O. (2008). Toward A Common Framework for Statistical Analysis and Development. Journal of Computational and Graphical Statistics, 17(4), 892-913. http://gking.harvard.edu/gking/files/z.pdf ; http://gking.harvard.edu/zelig/ ; http://www.r-project.org/user-2006/Slides/ImaiEtAl.pdf ; http://imai.princeton.edu/talk/files/kansas10.pdf

Cam
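
For readers following up the King and Zeng references: one ingredient of their approach, the "prior correction" described in King & Zeng's Political Analysis paper, adjusts the logit intercept when the sample was selected on the outcome. In that setting the share of ones in the estimation sample (ybar) differs from the known population share (tau). The corrected intercept is beta0_hat - ln[((1 - tau) / tau) * (ybar / (1 - ybar))]. The Python function below is a minimal sketch of that one formula under the assumption that tau is known. It is not ReLogit or Zelig, it does not implement their separate small-sample bias correction, and the function and argument names are hypothetical.

```python
import math


def prior_corrected_intercept(beta0_hat: float, tau: float, ybar: float) -> float:
    """Apply the prior correction to a logit intercept (hypothetical helper).

    beta0_hat: intercept estimated on the (possibly outcome-selected) sample
    tau:       population fraction of ones, assumed known
    ybar:      fraction of ones in the estimation sample
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Log of the ratio between the population odds of a zero and the
    # sample odds of a zero; it is zero when the sample mirrors the population.
    offset = math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
    return beta0_hat - offset
```

With a full random sample, as in the original question, ybar equals tau, the log term is ln(1) = 0 and the intercept is unchanged. The correction matters only when ones have been oversampled: for example, a 50/50 case-control sample drawn from a population with 1.5% ones shifts the intercept down by ln(0.985 / 0.015), roughly 4.18.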

> Date: Fri, 6 Jan 2012 11:33:36 +0000
> Subject: Re: st: Binary model with many zeros and few ones
> From: [email protected]
> To: [email protected]
> 
> Zero inflation as I understand it applies to situations in which there
> is some kind of mixture of individuals who are zero for one reason and
> individuals who are zero or one for another reason. For example, many
> people never visit football matches and some may visit football
> matches but just didn't do so during some survey period.  I don't
> think your description here justifies that term. Some people might
> want to describe your situation as one of rare events and you might
> want to Google "Gary King rare events logit". But that said, I would
> certainly try -logit- or -probit- first.
> 
> Nick
> 
> On Fri, Jan 6, 2012 at 11:15 AM, Nikolaos Kanellopoulos
> <[email protected]> wrote:
> 
> > I have a dataset of around 880 thousand observations and I want to measure as accurately as possible the relationship between certain variables and an event described by a binary variable. My dependent variable has very few ones (around 1.5% of the observations).
> >
> > My question (and I apologize in advance if this has been asked on Statalist before) is: what is the best way to analyse this “zero inflated” binary variable? Is it OK to use a simple probit or logit model? Any suggestions or references are more than welcome.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
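
Nick's description of zero inflation can be written as a two-component mixture: with probability pi a unit is a structural zero (someone who never goes to football matches), and otherwise its outcome is Bernoulli(p). Then P(Y = 1) = (1 - pi) * p and P(Y = 0) = pi + (1 - pi) * (1 - p). A minimal Python sketch of that probability function, with hypothetical names:

```python
def zero_inflated_bernoulli_pmf(y: int, pi_always_zero: float, p_event: float) -> float:
    """P(Y = y) under a two-component mixture (hypothetical helper).

    With probability pi_always_zero a unit is a structural zero and can
    never be a one; otherwise Y follows a Bernoulli(p_event) distribution.
    """
    for name, val in (("pi_always_zero", pi_always_zero), ("p_event", p_event)):
        if not 0.0 <= val <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if y == 1:
        # A one requires being in the "at risk" group AND having the event.
        return (1.0 - pi_always_zero) * p_event
    if y == 0:
        # A zero is either structural or an at-risk unit without the event.
        return pi_always_zero + (1.0 - pi_always_zero) * (1.0 - p_event)
    raise ValueError("y must be 0 or 1")
```

For example, pi = 0.7 and p = 0.05 give P(Y = 1) = 0.3 * 0.05 = 0.015, the same 1.5% share of ones as a plain Bernoulli(0.015). The share of ones on its own therefore cannot distinguish a mixture from an ordinary binary outcome, which is consistent with Nick's point that a small proportion of ones does not by itself justify the term.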
 		 	   		  

References:
- st: Binary model with many zeros and few ones
  - From: Nikolaos Kanellopoulos <[email protected]>
- Re: st: Binary model with many zeros and few ones
  - From: Nick Cox <[email protected]>
