Jay Kaufman <[email protected]> asks:
> The -binreg- routine fits generalized linear models for the binomial family.
> It is presumably preferred over fitting the same model in -glm-, not only
> for the convenience of not having to specify the distributional family in
> the command line, but also because in iteratively seeking the estimates it
> checks to make sure that they are consistent with the range of allowable
> probabilities (i.e. 0 to 1), as described on page 138 of the manual [Ref
> A-G]. So my question is, why does -binreg- appear to be so bad at this
> checking?
Actually, with Stata 7 -binreg- and -glm- are one and the same; -binreg- now
serves only as a front end to a -glm- call. The only advantage of using
-binreg- (besides not having to specify -family(binomial)- to -glm-) is that
you do not have to know which link goes with which measure of effect, e.g.,
using a log link to obtain risk ratios.
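If it helps, here is the -binreg-/-glm- correspondence as I recall it (y and
x are placeholders, and you should double-check [R] binreg; add -irls- to the
-glm- calls if you want the estimation method to match -binreg- exactly):

    binreg y x, or   ==   glm y x, fam(binomial) link(logit) eform      (odds ratios)
    binreg y x, rr   ==   glm y x, fam(binomial) link(log) eform        (risk ratios)
    binreg y x, hr   ==   glm y x, fam(binomial) link(logc) eform       (health ratios)
    binreg y x, rd   ==   glm y x, fam(binomial) link(identity)         (risk differences)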
With Stata 7, -glm- was overhauled, and one of the improvements was the
inclusion of all of -binreg-'s special code for bumping predicted
probabilities back into the range (0,1) before applying the link function
during the iterative estimation. This made -binreg- obsolete, except as a
front end to the more powerful -glm-.
The reason you saw a difference in output between -binreg- and -glm- is that
-binreg- uses IRLS exclusively, whereas the default for -glm- is
Newton-Raphson maximum likelihood. If you add the -irls- option to your
-glm- commands, you will see that there is no difference at all.
> Take a very simple model using the auto.dta.
> . use "C:\Stata\auto.dta", clear
> (1978 Automobile Data)
> . binreg foreign mpg, rr
This is precisely the same as
. glm for mpg, fam(binom) link(log) irls eform
[output omitted]
> . predict phat, mu
> . sum phat
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+-----------------------------------------------------
> phat | 74 .3008965 .22691 .1072727 1.580984
> Clearly a predicted probability > 1.5 is not a good estimate. Did I do
> something wrong? Or did -binreg- do something wrong? Or is this simply
> another example of why linear models of the logit and probit have dominated
> analysis of binary data for decades?
The "bumping predicted probabilities back into the range (0,1)" only occurs
during the iterative estimation, and thus one of the dangers of this is that
you end up with parameter estimates that produce linear predictors that
produce inverse link transformations that really want to be outside the range
(0,1). The fact that you were bumping these inverse links back into (0,1) was
done so that you are able to calculate a likelihood (or deviance) and have
something to work with. The only other alternative is to produce an error.
At the convergence step it is hoped that no "bumping" was necessary, but if it
was then you get the behavior above. Since an exponentiated linear predictor
can take on any positive value, such difficulty is just a fact of life when
using the log link.
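To see the mechanism in your own session, look at the linear predictor
directly: with the log link, the fitted probability is exp(xb), so exp() of
any positive linear predictor exceeds 1. A quick sketch (the variable name
-xb- is my own; if -predict, xb- is not accepted after -binreg-, refit with
the equivalent -glm- command first):

. predict xb, xb
. list make mpg xb if xb > 0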
You have a good point: this is one reason that the logit and probit links
are dominant.
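For contrast, the logit link cannot misbehave this way, because invlogit()
maps the entire real line into (0,1). A quick check (the variable name
-phat2- is just illustrative):

. binreg foreign mpg, or
. predict phat2, mu
. summarize phat2

The minimum and maximum of -phat2- will necessarily lie inside (0,1).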
In your analysis, I'll note that there is only one problematic point,
observation 71; all other observations give predicted probabilities in the
proper range. That observation has the largest value of -mpg- (41), and it
can be argued that this value is an outlier (or an influential point in the
context of this model).
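Using the -phat- you generated above, you can confirm this with:

. summarize mpg, detail
. list make mpg phat if phat > 1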
In summary, such behavior is not uncommon when using the binomial family
with a log link, especially in the presence of outliers/influential points.
> By the way, note that if I fit the exact same model using -glm-, this same
> observation gets a predicted probability of 1.43, so -binreg- actually seems
> to do worse.
See the above.
--Bobby
[email protected]