Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Binary model with many zeros and few ones
Date: Fri, 6 Jan 2012 11:33:36 +0000
Zero inflation, as I understand it, applies to situations in which there is some kind of mixture of individuals who are zero for one reason and individuals who are zero or one for another reason. For example, many people never visit football matches and some may visit football matches but just didn't do so during some survey period. I don't think your description here justifies that term.

Some people might want to describe your situation as one of rare events and you might want to Google "Gary King rare events logit".

But that said, I would certainly try -logit- or -probit- first.

Nick

On Fri, Jan 6, 2012 at 11:15 AM, Nikolaos Kanellopoulos <nkkanel@yahoo.gr> wrote:

> I have a dataset of around 880 thousand observations and I want to measure as accurately as possible the relationship between certain variables and an event described by a binary variable. My dependent variable has very few ones (around 1.5% of the observations).
>
> My question, and I apologize in advance if this has been asked in the Statalist before, which is the best way to analyse this “zero inflated” binary variable? Is it OK to use a simple probit or logit model? Any suggestions/references are more than welcome.
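
As a rough illustration of trying -logit- and -probit- first on data of this shape, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names x and y, the single predictor, the seed, and the coefficients, which were chosen only so that about 1.5% of the outcomes are ones. Note that 1.5% of 880 thousand observations is still roughly 13,200 ones, so the event is rare as a proportion but not in absolute count.

```stata
* Simulated data only: x, y, the seed and the coefficients are
* hypothetical and do not come from the original poster's dataset
clear
set seed 12345
set obs 880000

* One standard normal predictor
generate x = rnormal()

* Binary outcome; the intercept of -4.3 gives roughly 1.5% ones
generate byte y = runiform() < invlogit(-4.3 + 0.5*x)

* Check the share of ones, then fit the two standard binary models
tabulate y
logit y x
probit y x
```

With real data, y and x would be replaced by the actual outcome and covariates. Comparing predicted probabilities from the two fits is one simple way to see whether the choice between -logit- and -probit- matters much in practice.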