Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Discrepancy between logit and log links in glm


From   [email protected] (Roberto G. Gutierrez, StataCorp)
To   [email protected]
Subject   Re: st: Discrepancy between logit and log links in glm
Date   Tue, 20 Jul 2004 15:13:20 -0500

Clyde Schechter <[email protected]> asks: 

> My situation is this: when I fit a logistic model to my data, everything
> appears to run well, with convergence achieved after 3 iterations using
> -logistic-, -binreg, or- or -glm, fam(binomial) link(logit)-.  My client
> would prefer to express the associations as relative risks, rather than odds
> ratios. When I run the exact same data using -binreg, rr- or -glm,
> fam(binomial) link(log)- the iteration log says that the likelihood function
> is not concave, and after 50 iterations it gives up with convergence not
> achieved.

> Looking at the predicted values of p from the logistic regression, they are
> all between 0.10 and 0.82, so neither the log nor logit functions exhibits
> any particularly pathological behavior in the working range.  In fact, over
> this range, the relationship between log p and logit p is reasonably close
> to linear.  I am wondering why the two models would exhibit such different
> behavior.

Failure to converge with a log link is common and usually the result of a
ridge in the likelihood caused by one or more observations producing a linear
predictor that, when mapped to a predicted probability, falls outside the
admissable range of [0,1].

A log link has an exponential inverse link, and since the range of exp() is
zero to infinity, the possibility exists that a linear predictor could get
mapped to a predicted probability greater than one.  If this happens for just
one observation in your data, you will have problems.  How one solves these
problems I describe in detail below.  It seems that even though your
logit-produced predicted probabilities range form 0.10 to 0.82, that when 
you switch to a log link one or more of the probabilities on the higher end
are getting mapped to be greater than one.  This really doesn't surprise 
me, since once you get past x = 0 (predicted probability 0.5 for logit),
invlogit(x) and exp(x) are very different functions -- the former inflects to
be concave down while the latter keeps rising faster and faster.

> To make matters more interesting, if I precede the whole thing with -version
> 6- (this problem actually came from a colleague who is running version 6),
> the -glm, fam(binomial) link(log)- run converges after 6 iterations, but
> -binreg, rr- (which I thought is a wrapper for -glm-) still fails.

Your assumption is correct -- they should be the same when running under
version 6, and I would be interested to see a log of the above experiment.  In
any case, an explanation of why -glm/binreg- differs from version 6 to version
8.2 is given below, and it may help to explain in general why you sometimes do
not converge.

There are several key differences between -glm-/-binreg- pre-version 7 and
Stata 7 and beyond, and these differences tend to hit hardest when using 
-family(binomial) link(log)-.  The differences are summarized as follows.

There are four different animals to consider here:  

1.  -glm- prior to Stata 7, which from now on I will call -glm6-.

2.  -binreg- prior to Stata 7, which from now on I will call -binreg6-.

3.  Modern, Stata 7 and beyond -glm-, which I will just call -glm-.

4.  Modern, Stata 7 and beyond -binreg-, which I will just call -binreg-.

---------------------------------------------------------------------------

1.  -glm6- is Stata's older version of -glm-, which has since be rewritten
    with the release of Stata 7.  It is, however, still available under
    version control.   Under a modern Stata, when you type

    . version 6: glm ...

    you get -glm6-.

    What distinguishes -glm6- from modern -glm-?  Two things mainly: 
    -glm6- estimates entirely using iterative reweighted least squares (IRLS)
    while -glm- allows for both maximum likelihood estimation (ML) and IRLS,
    with ML being the default.  Also, the IRLS algorithm from -glm6- was
    enhanced in -glm- in order to (among other things) allow user-defined
    links and variance functions.  In most cases, you won't see a difference
    between -glm6- and -glm, irls-, but sometimes when a model is
    non-convergent you may see different behavior between the two.  For
    example, -glm6- may "converge" but with missing standard errors, whereas
    -glm, irls- will just fail to converge.

    Both -glm6- and -glm, irls- use the following method for handling inverse
    links that map outside [0,1] in binomial models.  They simply tweak the
    inverse link so that if (for example) it wants to be greater than one it
    makes it just less than one.  This helps ease estimation along during 
    early iterations, but as you converge this creates a ridge in the
    likelihood where the linear predictor is allowed to get arbitrarily large
    while the inverse link remains constant at just below one.  This causes
    the algorithm to not converge.

2.  Under Stata 6, non-convergent -glm6- prompted the invention of -binreg6-.
    -binreg6- was not an official Stata command.  It was published (under
    the name -binreg-) in STB-50 by James Hardin and Mario Cleves, who were
    employed at StataCorp. at the time.

    -binreg6- fits binomial models with various choices for links, logit 
    (for odds ratios), log (for risk ratios), log-complement (for health 
    ratios), and identity (for risk differences).  It fits these models via
    IRLS.  The algorithm as coded in -binreg6- is basically the same as that
    in -glm6-, with one exception.

    The log and identity links proved problematic in that their inverse links
    can map to predicted probabilities outside of [0,1].  In order to handle
    this, Hardin and Cleves employed the technique of Wacholder (1986) which
    directly manipulates problematic linear predictors to ensure that a
    predicted probability in [0,1] is always produced.  This is relatively
    easy to do within the confines of an IRLS algorithm and produces
    convergent results.  The difference between this algorithm and that used
    by -glm6- is that with -binreg6- the linear predictor is itself changed to
    map to within [0,1], whereas -glm6- does not change the linear predictor
    but instead simply maps it into the admissable range.  The difference 
    is subtle, but significant.

    One must be careful, however, when this happens.  How does one interpret
    the resulting coefficients, risk ratios, etc., since they are not based on
    the original model, but instead on a model that allows for the
    manipulation of the linear predictor?  This is not clear, nor is it even
    easy to detect when manipulation took place unless you predict the
    probabilities and check them yourself post-estimation.  If you had to
    manipulate the linear predictor in order to obtain convergence, are you
    really getting a risk ratio?  If you manipulated sparingly, maybe so, but
    how do you know?   

    In my opinion, the tweaking done by -binreg6- is dangerous because it
    masks the problem that your data may really not be appropriate for a log
    link.  The tweaking done by -glm6-, however, is not as dangerous because
    it is designed mainly to overcome problems early in estimation, such as 
    lousy staring values.  If you indeed have data that are not appropriate for
    a log link, the tweaking done in -glm6- is not going to save you -- you 
    will still get a ridge in the likelihood and the non-convergence (or 
    missing standard errors) will clue you to the fact that a log link may 
    not be appropriate.

    Now when I say "not appropriate" I don't mean that risk ratios don't 
    exist for these models.  They do, it's just that you'll be hard pressed to
    estimate them with -glm-, because in order to do so you are forced to use
    a log link which causes estimation problems for high probability events.
    What, then, do you do?  Later in this message I list three alternatives.

3.  Modern -glm- is an entire rewrite of -glm6-, and it was done so in order
    to allow user-defined link and variance functions, and to allow a myriad 
    of variance options -- Huber Robust, jacknife, and bootstrap just to name
    a few.

    -glm- fits models via ML by default, and optionally you can fit models via
    IRLS by specifying option -irls-.

    Whether you use ML or IRLS, the technique for handling linear predictors
    which map outside of [0,1] is the same as that in -glm6- and, as a result, 
    your binomial models with log links will often not converge when you have
    high probability outcomes.

4.  Because of the concerns raised in 2., it was decided (by not only myself, 
    but also by Dr. Hardin and Dr. Cleves) that modern -binreg- not be a
    direct adoption of -binreg6-, but instead a wrapper for -glm, irls-.  As a
    result, in modern -binreg- you will see the same non-convergent behavior
    in log link models.

> To make matters still more interesting still, some of the independent
> variables in the model are dummies created using xi.  The results described
> above are achieved using -xi i.depvars- as a separate command.  If, however,
> I run (after -version 6-) -xi: glm i.depvars, fam(binomial) link(log)-,
> there is once more failure of convergence.  (Without -version 6-, it doesn't
> matter how -xi- is used, convergence fails either way.)

Again a mystery, but possibly due to a change in the omitted dummy variable.
If Clyde can send me a log file, I'll be glad to take a closer look.

> What is going on here?  Is there anything I can do to achieve convergence
> under version 8.2 with a log link? 

When I began this reply, I had three alternatives in mind.  Since then, Roger
Webb <[email protected]> has mentioned the first to the list:  use
-poisson- with the -robust- option.  He states that the results are the same
as -glm, fam(binomial) link(log)-.  They are not exactly the same, but nearly
the same in most cases.  The discrepancy comes from the fact that Poisson
models allow for counts bigger than 0/1, but when your data are all 0/1, the
probability of counts bigger than 1 is usually going to be small, and
oftentimes too small to make a difference.  Note, however, that the difference
between binomial/log and Poisson increases with the probability of an event,
so the discrepancy can be higher in cases where binomial/log would fail to
converge in the first place.  Be cautious.

Secondly, if the number of problematic observations is small, you can simply
drop them from the analysis.  Begin with a logit model, remove the one or two
observations with the highest predicted probabilities, then try the log link
model and see if you get convergence.  Repeat as necessary.  Of course, you
will be dropping observations so the risk ratios are now conditional upon that
fact.  The intracicies/issues/dangers of doing this in an automated way are
too numerous to mention, so I'll just leave it at that.

Thirdly, as the problem of obtaining reliable risk ratios comes up quite
frequently on this list, Jean-Marie Linhart <[email protected]> and I have
been working on an alternative to the tweaking that takes place in old and new
versions of -glm/binreg-.  Our idea is as follows:  If you have to tweak the
estimation in order to get convergence, why not tweak in a way which you can 
directly control and that allows for you to incorporate what you did into the
iterpretation of the risk ratios.

With this in mind, and because we can program our own link functions into
modern -glm-, we came up with a new link function, the loglogit link.  The
loglogit link is basically the log link, but at a high predicted probability
(.95, say) the function turns smoothly and continuously into a scaled logit
link.  This ensures that all linear predictors get mapped into [0,1].  With
this link, you get convergence AND you also get to incorporate what you did
into the interpretation of the "risk ratios".  For example, if you switch from
log to logit at .95 (the user can specify the switching point), the
interpretation is simply "When the probability of an event is less than .95
the risk ratio is _________, otherwise _________ is an odds ratio."

As our schedules allow, we plan on submitting this to the Stata Journal.  In
the meantime, if anyone is interested in the code, send me an email.  Any
feedback would be much appreciated.

> And is the output of the convergent -glm, fam(binomial) link(log)- run
> credible?

In its modern implementation, the answer is yes, because we will not mask
any problems that arise in the estimation.  Thus, if you get convergent
results you can feel safe in the interpretation of risk ratios.  Even if by
some rarity you get a convergent model yet a predicted probability greater
than one, you will be issued a warning that you may click on for an
explanation of what happened.  If you wish to read this explanation, type

      . help j_glmadmiss

--Bobby
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2025 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index