Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: xtlogit: panel data transformation's recast to double makes model incomputable


From   Tom <[email protected]>
To   [email protected]
Subject   Re: st: xtlogit: panel data transformation's recast to double makes model incomputable
Date   Wed, 3 Apr 2013 00:51:50 +0200

Dear Jay,

At first I thought the problem might indeed be related to near-perfect
prediction, but that would only be possible with a very small sample
size (it is not possible to perfectly predict prices with large
samples). I therefore checked whether any of the groups had a small
number of observations, and a few did, owing to missing values. I
removed them, so every group now has at least 250 observations, but
the issue remains and appears to be exactly the same. I therefore do
not think this is related to "near-perfect prediction issues".

For completeness, this is a log of the (still running) regression
showing that it cannot compute:
http://pastebin.com/Pe2PYTXw

I'm still getting messages such as:
log likelihood =    -1.#INF
(initial step bad)

What else can I try, or what other information can I provide? The
methods I know for finding leverage points require me to actually get
the regression/estimation results. What approach do you suggest,
considering that I cannot get the results?

Kind regards,
Tom
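
Leverage in the design matrix depends only on the regressors, not on the outcome or on a converged fit, so it can be inspected even when the estimation itself fails. The following is a minimal sketch in Python (not Stata), assuming a single regressor plus an intercept. The price values, the function names, and the "3 times the mean leverage" cutoff are illustrative assumptions, not taken from the thread.

```python
def leverage(x):
    """Diagonal of the hat matrix for the design [1, x].

    Uses the closed form h_i = 1/n + (x_i - mean)^2 / Sxx, which
    needs only the regressor values, not any estimation results.
    """
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    if sxx == 0:
        raise ValueError("x is constant; leverage is undefined")
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


def flag_high_leverage(x, multiple=3.0):
    """Indices whose leverage exceeds `multiple` times the mean leverage.

    With an intercept and one regressor there are p = 2 parameters,
    so the mean leverage is p / n. The multiple is a rule of thumb.
    """
    h = leverage(x)
    cutoff = multiple * 2 / len(x)
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Hypothetical, heavily skewed price-like values (not the thread's data).
prices = [10.0, 11.0, 9.5, 10.5, 12.0, 9.0, 10.2, 11.3, 9.8, 500.0]
h = leverage(prices)
flagged = flag_high_leverage(prices)
print(flagged)
```

With several regressors the same idea applies through the diagonal of X(X'X)^-1 X'. In Stata, one way to obtain it would presumably be `predict` with the `leverage` option after a `regress` of any placeholder outcome on the same regressors.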

On Tue, Apr 2, 2013 at 9:30 PM, JVerkuilen (Gmail)
<[email protected]> wrote:
> On Tue, Apr 2, 2013 at 3:14 PM, Tom <[email protected]> wrote:
>> Hi Jay,
>>
>> Per request these are the results of the "offending IVs" alone:
>>
>
> <snip>
>
> So it seems like it works on its own.
>
>
>>
>> This is real price data, I also verified it several times. There
>> appear to be no mistakes in the data.
>
> Yeah, I'm sure they're real, that's not the issue, it's the fact that
> the variable is so crazily skewed. That's what price data are like, of
> course.
>
>
>>
>> Do you have an explanation why close_g100 would fail whereas close_g30
>> does not? If you look at the summary statistics you'll see that the
>> close_g30 variable and close_g5 etc. are actually much more skewed and
>> have higher variations.
>>
>
> The best guess I have is that that variable creates a near-perfect
> prediction. Collinearity in and of itself is a problem but the big
> issue with logistic regression is usually perfect prediction. If you
> have that, the whole thing blows up and that's something that Stata
> detects automatically. What it will have a much harder time with is
> finding near-perfect prediction, where you are *close* but not quite
> at perfect prediction. The numerics will start to break down but it
> can happen in an unpredictable way that depends on small decisions on
> the way the routine was programmed that ordinarily make no difference and
> thus are unlikely to be noticed in the testing process.
>
> You might want to look for really super high leverage points in the
> relevant design matrix. That might help identify if you have a
> problem. Also, start with a model that just has the offending variable
> in it and then start adding variables to the model.
>
> Another option might be to experiment with different top-coding schemes on the prices.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
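
The top-coding idea from the quoted reply can be sketched as follows: cap every value above a chosen percentile at that percentile, then refit with the capped variable and compare across several cutoffs. This is a minimal Python sketch (not Stata). The nearest-rank percentile rule, the function name, and the sample values are assumptions for illustration only.

```python
import math


def top_code(values, pct=99.0):
    """Cap values above the pct-th percentile at that percentile.

    The percentile uses the nearest-rank rule: the smallest observed
    value with at least pct percent of the data at or below it.
    """
    if not values:
        return []
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical skewed price-like values (not the thread's data).
prices = [10.0, 11.0, 9.5, 10.5, 12.0, 9.0, 10.2, 11.3, 9.8, 500.0]

# Try more than one scheme, as the reply suggests.
capped_90 = top_code(prices, pct=90)
capped_95 = top_code(prices, pct=95)
print(max(capped_90), max(capped_95))
```

Only values above the cap change, so the ordering of observations is preserved apart from ties at the top.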

