Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Tom <tommedema@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: xtlogit: panel data transformation's recast to double makes model incomputable
Date: Wed, 3 Apr 2013 00:51:50 +0200
Dear Jay,

At first I thought that the problem might indeed be related to near-perfect prediction, but that would only be possible with a very small sample size (it is not possible to perfectly predict prices with large samples). I therefore checked whether any of the groups had a small number of observations, and a few did, owing to missing values. I removed these so that every group now has at least 250 observations, but the issue remains and appears to be exactly the same. I therefore do not think this is a near-perfect prediction issue.

For completeness, here is a log of the (still running) regression showing that it cannot compute: http://pastebin.com/Pe2PYTXw

I am still getting messages such as:

log likelihood = -1.#INF (initial step bad)

What else can I try, or what other information can I provide? The methods I know for finding leverage points require the actual regression/estimation results. What approach do you suggest, given that I cannot obtain them?

Kind regards,
Tom

On Tue, Apr 2, 2013 at 9:30 PM, JVerkuilen (Gmail) <jvverkuilen@gmail.com> wrote:
> On Tue, Apr 2, 2013 at 3:14 PM, Tom <tommedema@gmail.com> wrote:
>> Hi Jay,
>>
>> Per request these are the results of the "offending IVs" alone:
>>
>
> <snip>
>
> So it seems like it works on its own.
>
>> This is real price data; I also verified it several times. There
>> appear to be no mistakes in the data.
>
> Yeah, I'm sure they're real; that's not the issue. The issue is that
> the variable is so wildly skewed. That's what price data are like, of
> course.
>
>> Do you have an explanation why close_g100 would fail whereas close_g30
>> does not? If you look at the summary statistics you'll see that the
>> close_g30 and close_g5 variables, etc., are actually much more skewed
>> and have higher variances.
>
> The best guess I have is that that variable creates a near-perfect
> prediction.
> Collinearity in and of itself is a problem, but the big issue with
> logistic regression is usually perfect prediction. If you have that,
> the whole thing blows up, and that is something Stata detects
> automatically. What it has a much harder time with is near-perfect
> prediction, where you are *close* to but not quite at perfect
> prediction. The numerics start to break down, but this can happen in
> an unpredictable way that depends on small decisions about how the
> routine was programmed, decisions that ordinarily make no difference
> and are thus unlikely to be caught in testing.
>
> You might want to look for really high-leverage points in the
> relevant design matrix. That might help identify whether you have a
> problem. Also, start with a model that contains just the offending
> variable and then add variables one at a time.
>
> Another option would be to experiment with different top-coding
> schemes for the prices.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
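[Editorial note: the blow-up Jay describes, where a separating predictor makes the logistic likelihood improve without bound until the numerics fail, can be reproduced outside Stata. Below is a minimal pure-Python sketch with made-up data (not the list's price data); the clamp on the fitted probability stands in for the guards an estimator would need, and without it the log-likelihood evaluation raises a domain error, which is the kind of breakdown Stata surfaces as `log likelihood = -1.#INF`.]

```python
import math

def neg_loglik(beta, xs, ys):
    """Negative log-likelihood of a one-predictor logistic model (no intercept)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        # Clamp to avoid log(0): with (near-)perfect prediction the fitted
        # probabilities are pushed to exactly 0.0 or 1.0 in floating point.
        p = min(max(p, 1e-300), 1.0 - 1e-16)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# A hypothetical predictor that perfectly separates the outcome:
# x < 0 always gives y = 0, x > 0 always gives y = 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# The fit keeps improving as beta grows, so no finite maximum-likelihood
# estimate exists: an optimizer drifts toward +inf until the arithmetic fails.
for beta in (1.0, 10.0, 100.0):
    print(beta, neg_loglik(beta, xs, ys))
```

With near-perfect (rather than perfect) separation the likelihood does have a finite maximum, but it sits in a region where these same floating-point hazards are active, which is why the failure depends on implementation details, as described above.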