Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Tom <tommedema@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: xtlogit: panel data transformation's recast to double makes model incomputable |
Date | Tue, 2 Apr 2013 19:36:31 +0200 |
Thanks everyone, your assistance is very much appreciated. I shall answer all of your questions, but first some results using `clogit` for a number of key variables, showing important debug information. As suggested, I have randomly removed 90% of the 8000 groups to account for possible memory issues, leaving 800 groups with ~400,000 observations. This did not seem to help though.

I worked on this today. I hope the following will give you folks some hints on what is going wrong.

A) command: clogit depc_gpf30 close, group(ticker_id) showstep trace
results (FAIL): http://pastebin.com/nHdd5EeS
note: close is a highly right skewed price

B) command: clogit depc_gpf30 close_log, group(ticker_id) showstep trace
results (FAIL): http://pastebin.com/jZhMyPLb
note: close_log is the log transformation of close, used because close is a highly right skewed variable

C) command: clogit depc_gpf30 close_ihs, group(ticker_id) showstep trace
results (FAIL): http://pastebin.com/GF5raqhR
note: close_ihs is the inverse hyperbolic sine transformation of close, used because close is a highly right skewed variable (an alternative to the log)

D) command: clogit depc_gpf30 close_g1 close_g4 close_g7 close_g15 close_g20 close_g30 close_g40 close_g50 close_g60 close_g70 close_g80 close_g90, group(ticker_id) showstep trace
results (SUCCESS): http://pastebin.com/ru9euGGu
note: at this point it still converges, but if I add one more variable, as in E), things change (and this does not happen with just this one variable, but with others too)

E) command: clogit depc_gpf30 close_g1 close_g4 close_g7 close_g15 close_g20 close_g30 close_g40 close_g50 close_g60 close_g70 close_g80 close_g90 close_g100, group(ticker_id) showstep trace
results (FAIL): http://tinypaste.net/WfSNqD76
note: adding one more variable to D) causes the issue, but it does matter which variable. If I replace close_g100 (the integer denotes a lag in days) with, e.g., close_g5, it does compute. However, the problem is not limited to close_g100. For example, it also doesn't work when I add close_g120. See the output of the same regression but with close_g120 here: http://tinypaste.net/Aa69SrC1

Because there seems to be a difference between variables with lower lags and variables with higher lags, I have created summary statistics for close_g5, close_g30, close_g100 and close_g120. Please view them here: http://pastebin.com/9cfMiXQH

Moreover, it is the combination of variables that causes this, because the following works just fine:

F) command: clogit depc_gpf30 close_g100, group(ticker_id) trace showstep
result: SUCCESS (output omitted)

So using close_g100 or close_g120 on its own causes no problems. You might expect that E) results from multicollinearity, but this does not seem to be the case; see this `collin` output: http://pastebin.com/p9BRP6bf

To me these seem the most striking results, but I ask you to look at the complete results too, as I may have missed something.
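
For anyone trying to reproduce the setup, here is a minimal sketch of how the transformed variables in B) and C) and the 10% random subsample of groups might be built. The post does not show the actual commands, so everything below is an assumption: it presumes close is strictly positive for the log, and the seed and the helper names pick and u are purely illustrative.

```stata
* Sketch only: the post does not show how these variables were built.
* Assumes close is strictly positive wherever the log is taken.
generate double close_log = ln(close)

* Inverse hyperbolic sine, written out explicitly; defined for any real close.
generate double close_ihs = ln(close + sqrt(close^2 + 1))

* Keep a random ~10% of groups, so every observation of a kept ticker_id stays.
* The seed and the helper names pick and u are hypothetical.
set seed 12345
egen byte pick = tag(ticker_id)          // 1 for one observation per group
generate double u = runiform() if pick   // one random draw per group
bysort ticker_id (u): replace u = u[1]   // spread the draw to the whole group
keep if u < 0.10
drop pick u
```

Sampling whole groups rather than individual observations keeps each ticker_id panel intact, which matters because `clogit` conditions on the group.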