Constantine Daskalakis <[email protected]> makes several
observations concerning convergence properties and predicted probabilities
based on binomial models with identity links, namely models of the form
. glm y x1 x2 ..., fam(bin) link(identity)
His general observation is that these models, when fit in Stata, often have
difficulty converging and often produce warnings about predicted probabilities
outside the admissible range. These two problems are really one and the same:
both arise from the identity link, whose range is the entire real line,
whereas a response probability is constrained to lie in [0,1].
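To take a made-up numerical example: if the fitted identity-link model were
p = 0.20 + 0.30*x1, then any observation with x1 > 2.67 (or x1 < -0.67)
would be assigned a "probability" outside [0,1].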
Constantine also compares these behaviors with those of SAS and observes that
SAS is more cooperative in models where Stata fails to converge. In what
follows I provide more details, but I'll begin with a summary: in such
situations, regardless of the software used, you have a non-convergent model
unworthy of serious interpretation. Different packages have different ways of
telling you this, with Stata taking the most direct approach.
Constantine writes:
> Here's what I've found:
> (1) Convergence
> Stata often gets bogged down ("backed up") after a few iterations and does
> not converge.
> Specifying Fisher scoring for some iterations in the beginning helps. After
> Newton-Raphson takes over from Fisher scoring, it occasionally does
> converge. Most often, I have to use Fisher scoring throughout to get
> convergence. But see point #3 below.
What is happening here is that the maximum-likelihood algorithm is producing
parameter estimates whose linear predictors, in one or more observations, bump
up against the boundaries of [0,1]. Because a value outside [0,1] is an
inadmissible probability, a constant probability just above zero or just below
one is used instead. If this happens for only a few observations, it isn't
much of a problem. If it happens too often, however, these constants produce a
ridge in the likelihood that makes convergence of Hessian-based ML difficult.
Such behavior, and the resulting non-convergent model, should serve as a
signal that your data are not appropriate for an identity link.
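One quick way to see whether this is happening in your data (a sketch; the
variable names are just for illustration) is to inspect the linear predictor
after an attempted fit:

. glm y x1 x2, fam(bin) link(identity)
. predict double xbhat, xb
. count if xbhat < 0 | xbhat > 1

Here xbhat is the identity-link linear predictor, so any observation counted
by the last command is one whose "probability" the model wants to push outside
[0,1].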
Convergence could be forced in any number of ways, including
(a) deleting the offending observations from the analysis
(b) relaxing the convergence criterion
(c) gerrymandering the regression coefficients so that they do not produce
    inadmissible predicted probabilities
You can mimic these behaviors in Stata through the appropriate options or
through some creative link-function programming, but we do not recommend that.
Any one of the above methods would help convergence, but the price is model
interpretability. The resulting estimates would not have the properties of
standard MLEs, since they do not really maximize the model likelihood.
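For completeness, here is roughly how (a) and (b) could be mimicked in Stata;
this is only a sketch, and again we do not recommend it, because the result is
not a maximum-likelihood fit:

. * (b) relax the convergence criterion -- not recommended;
. *     ltolerance() and iterate() are among glm's maximization options
. glm y x1 x2, fam(bin) link(identity) ltolerance(1e-3) iterate(100)

. * (a) refit after deleting the observations whose linear predictors from
. *     the previous fit fall outside [0,1] -- not recommended
. predict double xbhat, xb
. drop if xbhat < 0 | xbhat > 1
. glm y x1 x2, fam(bin) link(identity)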
> SAS does seem to often converge (on the basis of parameter vector
> convergence), but also warns that the "relative Hessian convergence
> criterion" has not been achieved and that "convergence is questionable"
> (indicating that the likelihood has not really converged sufficiently).
Both SAS and Stata are telling you the same thing. You have a non-convergent
model. A table of parameter estimates does not change that.
> (2) Likelihood of final model
> The log-likelihood of the final Stata model is often somewhat better than
> that of the final SAS model. This might suggest that the Stata results are
> "better". However, see the drawback in point #4 below.
This pretty much settles it: the SAS results are not MAXIMUM likelihood.
> (3) Estimated coefficients and standard errors
> Naturally, when Stata and SAS give different final models, their estimated
> coefficients are different.
> But beware using Fisher's scoring throughout to get convergence and a final
> model. Sometimes, this final model will have absurdly small standard errors
> (with p < 0.001 for all variables). If something like this happens, it might
> be useful to compute standard errors using the option "OPG":
> - glm y x1 x2 ..., fam(bin) link(i) search fisher(#) opg
> [There are special complications when there are covariate levels that have
> observed probability of 0 or 1 (ie, all observations are "0s" or "1s"), but
> I'll leave this issue aside.]
Using Fisher scoring can help convergence, but standard errors based on Fisher
scoring require the additional assumption that your mean function is correctly
specified. If these standard errors are absurdly small, that points to a
violation of this assumption and provides further evidence that your data are
poorly suited to this model.
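One way to see this in practice (a sketch using the options Constantine
mentions; the 5 in fisher() is arbitrary) is to compare the Fisher-scoring
standard errors with the OPG ones for the same model:

. glm y x1 x2, fam(bin) link(identity) search fisher(5)
. glm y x1 x2, fam(bin) link(identity) search fisher(5) opg

If the two sets of standard errors differ wildly -- in particular, if the
first set is absurdly small -- take that as another sign that the
identity-link specification is not appropriate for these data.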
> (4) Estimated probabilities
> When Stata has convergence trouble (and sometimes when it does not), it
> warns that some "parameter estimates produce inadmissible mean estimates in
> one or more observations."
> SAS gives no such warnings.
We don't know why SAS sometimes keeps predictions within [0,1], but don't read
too much into that. Either you have a non-convergent model in both SAS and
Stata, in which case nothing is interpretable, or perhaps SAS has used one of
the methods (a), (b), or (c), or some other ad hoc adjustment. Even if we knew
the exact adjustment being made, it would be almost impossible to measure its
impact on model interpretability.
--Bobby --Vince
[email protected] [email protected]