Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Joseph Coveney" <jcoveney@bigplanet.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: SAS vs STATA : why is xtlogit SO slow ? |
Date | Sat, 11 Feb 2012 17:10:46 +0900 |
Francesco wrote (excerpted): clogit seems not to behave well when some individuals show a very (very very) high number of points. [Example omitted] What do you think ? Are you convinced ? I might just say that I am not telling that one package is better than the other : they have all their strenghts and weaknesses. I confess also that I personally prefer to use Stata, so if my hypothesis is right and it is possible to fix this rare problem, I will be very very glad. I wait for your eventual comments before sending the code and explanation to the support team -------------------------------------------------------------------------------- Good job. It looks like some kind of underflow problem in computing the conditioning factor when T exceeds some threshold for a given N-and-T combination. I suppose that with such long Ts, it's a matter of a log of a tiny probability subtracted from another one, or something analogous. I don't know what SAS does; perhaps it implements some critical parts of the algorithm in quad-precision and so can go out further in T before it, too, breaks down. You can find edge cases where -technique(bhhh)- saves the day (see example below, in the do-file and the first manual run after do-file terminates), and that might have been what Yu Xue saw. Yours are way over the threshold at T = 2000+, and the log-likelihood underflows upon the zeroeth iteration (see example below, the second manual run after do-file terminates), and things go downhill from there, regardless of what you try. In the empty model (the third manual run after the do-file terminates), it appears to converge, but with a log-likelihood of -8.99e+307 in the iteration log and missing in the regression table, and so whatever it is seems to involve the conditioning factor and not necessarily something in the linear predictor and its derivatives. Go ahead and send your findings to StataCorp, and see what they think. Joseph Coveney . do "F:\1R.do" . version 11.2 .. clear * . set more off . quietly set obs 25 . set seed `=date("2012-02-11", "YMD")' . generate int id = _n . generate double constant = rnormal() . generate int T = 1075 . expand T (26850 observations created) . generate int row = _n . generate byte predictor = runiform() < 0.5 . generate byte response = runiform() < 0.5 . bysort id (row): generate int t = _n . forvalues t = 1028/1029 { 2. display in smcl _newline(2) as text "T = " as result `t' 3. capture clogit response c.predictor if t <= `t', group(id) 4. if _rc == 430 { 5. display in smcl as error "Fails" 6. local T = `t' 7. continue, break 8. } 9. display in smcl as result "OK" 10. } T = 1028 OK T = 1029 Fails . clogit response c.predictor if t <= `T', group(id) /// > technique(bhhh) trace gradient note: multiple positive outcomes within groups encountered. -------------------------------------------------------------------------------- --------- Iteration 0: Parameter vector: response: predictor r1 .0147443 log likelihood = -17721.933 Gradient vector (length = 1.112278): response: predictor r1 -1.112278 -------------------------------------------------------------------------------- --------- Iteration 1: Parameter vector: response: predictor r1 .0140944 log likelihood = -17721.932 Gradient vector (length = .0693783): response: predictor r1 -.0693783 -------------------------------------------------------------------------------- --------- Conditional (fixed-effects) logistic regression Number of obs = 25725 LR chi2(1) = 0.32 Prob > chi2 = 0.5735 Log likelihood = -17721.932 Pseudo R2 = 0.0000 (Std. Err. adjusted for clustering on id) ------------------------------------------------------------------------------ | OPG response | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- predictor | .0140944 .0203746 0.69 0.489 -.025839 .0540279 ------------------------------------------------------------------------------ . exit end of do-file . clogit response c.predictor if t <= 1029, group(id) note: multiple positive outcomes within groups encountered. Iteration 0: log likelihood = -17721.933 Iteration 1: log likelihood = -1.#IND Hessian is not negative semidefinite r(430); . clogit response c.predictor, group(id) note: multiple positive outcomes within groups encountered. Iteration 0: log likelihood = -1.#INF Iteration 1: log likelihood = -1.#IND Hessian is not negative semidefinite r(430); . clogit response , group(id) note: multiple positive outcomes within groups encountered. Iteration 0: log likelihood = -8.99e+307 Conditional (fixed-effects) logistic regression Number of obs = 26875 LR chi2(0) = . Log likelihood = . Prob > chi2 = . ------------------------------------------------------------------------------ response | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- ------------------------------------------------------------------------------ . * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/