Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Binary model with many zeros and few ones
From
Cameron McIntosh <[email protected]>
To
STATA LIST <[email protected]>
Subject
RE: st: Binary model with many zeros and few ones
Date
Fri, 6 Jan 2012 08:49:17 -0500
Absolutely right... I would recommend:
Maalouf, M., & Trafalis, T.B. (2011). Robust weighted kernel logistic regression in imbalanced and rare events data. Computational Statistics & Data Analysis, 55(1), 168-183.
Newman, T.B. (1995). If Almost Nothing Goes Wrong, Is Almost Everything All Right? Interpreting Small Numerators. JAMA, 274(13), 1013.
King, G., & Zeng, L. (2001a). Explaining Rare Events in International Relations. International Organization, 55(3), 693-715.
King, G., & Zeng, L. (2001b). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137-163. http://gking.harvard.edu/files/0s.pdf
Tomz, M., King, G., & Zeng, L. (2003). ReLogit: Rare Events Logistic Regression. Journal of Statistical Software, 8(2). http://www.jstatsoft.org/v08/i02
Quigley, J., & Revie, M. (2011). Estimating the Probability of Rare Events: Addressing Zero Failure Data. Risk Analysis, 31(7), 1120–1132.
Quigley, J., Hardman, G., Bedford, T., & Walls, L. (2011). Merging expert and empirical data for rare event frequency estimation: Pool homogenisation for empirical Bayes models. Reliability Engineering & System Safety, 96(6), 687-695.
Zelig (R) does this too, for those interested:
Imai, K., King, G., & Lau, O. (January 2, 2012). Everyone’s Statistical Software, Package ‘Zelig’, Version 3.5-1.
http://gking.harvard.edu/zelig/docs/zelig.pdf
http://cran.r-project.org/web/packages/Zelig/index.html
Imai, K., King, G., & Lau, O. (2007). Zelig: Everyone’s Statistical Software. http://GKing.harvard.edu/zelig
Imai, K., King, G., & Lau, O. (2008). Toward A Common Framework for Statistical Analysis and Development. Journal of Computational and Graphical Statistics, 17(4), 892-913.
http://gking.harvard.edu/gking/files/z.pdf
http://gking.harvard.edu/zelig/
http://www.r-project.org/user-2006/Slides/ImaiEtAl.pdf
http://imai.princeton.edu/talk/files/kansas10.pdf
Cam
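
A rough back-of-the-envelope check on the figures in the original question (about 880 thousand observations, around 1.5% ones) may help in judging whether a plain logit is adequate. The sketch below is plain arithmetic in Python, not Stata output. It assumes an intercept-only logit, for which the maximum likelihood estimate is the log-odds of the sample mean and the asymptotic standard error is 1/sqrt(n*p*(1-p)). The values of n and p are the approximate figures from the question, not the actual data.

```python
import math

# Approximate figures quoted in the original question (assumptions, not real data)
n = 880_000   # number of observations
p = 0.015     # share of ones in the dependent variable

# Expected number of events (ones)
events = n * p

# Intercept-only logit: the MLE of the constant is the log-odds of the sample mean
intercept = math.log(p / (1 - p))

# Asymptotic standard error of that constant: 1 / sqrt(n * p * (1 - p))
se = 1 / math.sqrt(n * p * (1 - p))

print(f"events: {events:.0f}")
print(f"intercept: {intercept:.4f}")
print(f"standard error: {se:.5f}")
```

On these figures there are roughly 13,200 events, so although ones are rare as a proportion, they are not scarce in absolute number. That is consistent with the advice to try -logit- or -probit- first and to treat the rare-events corrections in the references above as a check.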
> Date: Fri, 6 Jan 2012 11:33:36 +0000
> Subject: Re: st: Binary model with many zeros and few ones
> From: [email protected]
> To: [email protected]
>
> Zero inflation as I understand it applies to situations in which there
> is some kind of mixture of individuals who are zero for one reason and
> individuals who are zero or one for another reason. For example, many
> people never visit football matches and some may visit football
> matches but just didn't do so during some survey period. I don't
> think your description here justifies that term. Some people might
> want to describe your situation as one of rare events and you might
> want to Google "Gary King rare events logit". But that said, I would
> certainly try -logit- or -probit- first.
>
> Nick
>
> On Fri, Jan 6, 2012 at 11:15 AM, Nikolaos Kanellopoulos
> <[email protected]> wrote:
>
> > I have a dataset of around 880 thousand observations and I want to measure as accurately as possible the relationship between certain variables and an event described by a binary variable. My dependent variable has very few ones (around 1.5% of the observations).
> >
> > My question (and I apologize in advance if this has been asked on Statalist before) is: what is the best way to analyse this “zero inflated” binary variable? Is it OK to use a simple probit or logit model? Any suggestions/references are more than welcome.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/