Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: SAS vs STATA : why is xtlogit SO slow ?

From	"Joseph Coveney" <[email protected]>
To	<[email protected]>
Subject	Re: st: SAS vs STATA : why is xtlogit SO slow ?
Date	Sat, 11 Feb 2012 17:10:46 +0900

Francesco wrote (excerpted):

clogit seems not to behave well when some individuals show a very
(very very) high number of points. 

[Example omitted]

What do you think ? Are you convinced ?

I might just say that I am not telling that one package is better than
the other : they have all their strenghts and weaknesses.
I confess also that I personally prefer to use Stata, so if my
hypothesis is right and it is possible to fix this rare problem, I
will be very very glad.

I wait for your eventual comments before sending the code and
explanation to the support team

--------------------------------------------------------------------------------

Good job.  It looks like some kind of underflow problem in computing the
conditioning factor when T exceeds some threshold for a given N-and-T
combination.  I suppose that with such long Ts, it's a matter of a log of a tiny
probability subtracted from another one, or something analogous.  I don't know
what SAS does; perhaps it implements some critical parts of the algorithm in
quad-precision and so can go out further in T before it, too, breaks down.

You can find edge cases where -technique(bhhh)- saves the day (see example
below, in the do-file and the first manual run after do-file terminates), and
that might have been what Yu Xue saw.  Yours are way over the threshold at T =
2000+, and the log-likelihood underflows upon the zeroeth iteration (see example
below, the second manual run after do-file terminates), and things go downhill
from there, regardless of what you try.  In the empty model (the third manual
run after the do-file terminates), it appears to converge, but with a
log-likelihood of -8.99e+307 in the iteration log and missing in the regression
table, and so whatever it is seems to involve the conditioning factor and not
necessarily something in the linear predictor and its derivatives.

Go ahead and send your findings to StataCorp, and see what they think.

Joseph Coveney


. do "F:\1R.do"

. version 11.2

.. clear *

. set more off

. quietly set obs 25

. set seed `=date("2012-02-11", "YMD")'

. generate int id = _n

. generate double constant = rnormal()

. generate int T = 1075

. expand T
(26850 observations created)

. generate int row = _n

. generate byte predictor = runiform() < 0.5

. generate byte response = runiform() < 0.5

. bysort id (row): generate int t = _n

. forvalues t = 1028/1029 {
  2.     display in smcl _newline(2) as text "T = " as result `t'
  3.     capture clogit response c.predictor if t <= `t', group(id)
  4.     if _rc == 430 {
  5.         display in smcl as error "Fails"
  6.         local T = `t'
  7.         continue, break
  8.     }
  9.     display in smcl as result "OK"
 10. }


T = 1028
OK


T = 1029
Fails

. clogit response c.predictor if t <= `T', group(id) ///
>     technique(bhhh) trace gradient
note: multiple positive outcomes within groups encountered.

--------------------------------------------------------------------------------
---------
Iteration 0:
Parameter vector:
     response:
    predictor
r1   .0147443

                                                   log likelihood = -17721.933
Gradient vector (length = 1.112278):
     response:
    predictor
r1  -1.112278
--------------------------------------------------------------------------------
---------
Iteration 1:
Parameter vector:
     response:
    predictor
r1   .0140944

                                                   log likelihood = -17721.932
Gradient vector (length = .0693783):
     response:
    predictor
r1  -.0693783
--------------------------------------------------------------------------------
---------

Conditional (fixed-effects) logistic regression   Number of obs   =      25725
                                                  LR chi2(1)      =       0.32
                                                  Prob > chi2     =     0.5735
Log likelihood = -17721.932                       Pseudo R2       =     0.0000

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |                 OPG
    response |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   predictor |   .0140944   .0203746     0.69   0.489     -.025839    .0540279
------------------------------------------------------------------------------

. exit

end of do-file


. clogit response c.predictor if t <= 1029, group(id)
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood = -17721.933  
Iteration 1:   log likelihood =    -1.#IND  
Hessian is not negative semidefinite
r(430);

. clogit response c.predictor, group(id)
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood =    -1.#INF  
Iteration 1:   log likelihood =    -1.#IND  
Hessian is not negative semidefinite
r(430);

. clogit response , group(id)
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood = -8.99e+307

Conditional (fixed-effects) logistic regression   Number of obs   =      26875
                                                  LR chi2(0)      =          .
Log likelihood =          .                       Prob > chi2     =          .

------------------------------------------------------------------------------
    response |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
------------------------------------------------------------------------------

.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: Francesco <[email protected]>

References:
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: Clyde B Schechter <[email protected]>
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: Klaus Pforr <[email protected]>
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: [email protected]
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: Klaus Pforr <[email protected]>
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: [email protected]
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: "Joseph Coveney" <[email protected]>
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: [email protected]
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: "Joseph Coveney" <[email protected]>
- Re: st: SAS vs STATA : why is xtlogit SO slow ?
  - From: Francesco <[email protected]>

Prev by Date: RE: st: Interaction model
Next by Date: st: Which matching command works best?
Previous by thread: Re: st: SAS vs STATA : why is xtlogit SO slow ?
Next by thread: Re: st: SAS vs STATA : why is xtlogit SO slow ?
Index(es):
- Date
- Thread