|
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: Binomial regression
Vince Wiggins and I responded on this thread yesterday, in which one of our
main points was that forcing convergence when an identity link may not be
appropriate results in a loss of model interpretation. This is a case where
not only is there no free lunch, but the exact price of that lunch is unclear.
We mentioned various ways that convergence could be achieved. One method I
did not mention is the result of similar work I have done working with the log
link function. This method carries with it an ease of interpretation (akin to
knowing the price of your lunch) and involves using a modified version of the
identity link, something I call the "Identity/logit" link.
The Identity/Logit link is a simple modification of the standard identity
link that turns into a logit link outside a prespecified range, say outside
(0.01, 0.99), the default. Inside this range, you have the identity function;
outside you have a logit function translated and scaled so as to be continuous
and have continuous derivatives at the endpoints.
Since the link function turns to logit at the endpoints, the range of the
identity/logit inverse link is within (0,1), meaning that you won't get
inadmissible predicted probabilities and convergence can be achieved over a
much wider variety of datasets. The only drawback is one of interpretation,
however, the interpretation here is clear: When the predicted probability is
in the range (0.01, 0.99), your coefficients reflect risk differences,
otherwise, they are (rescaled) log(odds ratio)'s. In other words, you need
only qualify your treatment of risk differences to state that they are valid
for risks in the range (0.01, 0.99) rather than (0,1). To me this seems
better than trying to interpret the fruits of an ad hoc adjustment.
I attach the code for -identitylogit- below. A simple demonstration
. sysuse auto
. glm for price mpg, fam(bin) link(identity)
< no convergence >
. glm for price mpg, fam(bin) link(identitylogit) nolog
Generalized linear models No. of obs = 74
Optimization : ML Residual df = 71
Scale parameter = 1
Deviance = 70.50849514 (1/df) Deviance = .9930774
Pearson = 68.72329876 (1/df) Pearson = .9679338
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = u -> logit(u) [Identity (0.010,0.990)]
AIC = 1.033899
Log likelihood = -35.25424757 BIC = -235.0801
------------------------------------------------------------------------------
| OIM
foreign | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
price | .0000516 .0000157 3.29 0.001 .0000209 .0000823
mpg | .0471231 .0093466 5.04 0.000 .0288041 .0654421
_cons | -1.046464 .2299337 -4.55 0.000 -1.497126 -.5958023
------------------------------------------------------------------------------
The identity/logit link also accepts an optional argument, p, for controlling
the range of the identity part of this link. For example,
. glm for price mpg, fam(bin) link(identitylogit 0.05)
would give a valid range for risk differences of (0.05, 0.95) rather than
the default (0.01, 0.99).
--Bobby
[email protected]
--------------------------------begin indentitylogit.ado---------------------
*! version 1.0.0 03aug2007
program define identitylogit
version 8.2
args todo eta mu return
if `todo' == -1 { /* Title */
if "$SGLM_p" == "" {
global SGLM_p 0.01
}
else {
confirm number $SGLM_p
if $SGLM_p <= 0 | $SGLM_p >= 0.5 {
di as error ///
"link parameter p must be between 0 and 0.5"
exit 459
}
}
if "$SGLM_m" == "1" {
global SGLM_lf "u -> logit(u)"
}
else {
global SGLM_lf "u/$SGLM_m -> logit(u/$SGLM_m)"
}
local a1 : di %5.3f $SGLM_p
local a2 : di %5.3f 1-$SGLM_p
global SGLM_lt "Identity (`a1',`a2')"
exit
}
tempname a b c d gp p
scalar `p' = $SGLM_p
scalar `gp' = logit(`p')
scalar `b' = `p'*(1-`p')
scalar `a' = `p' - `b'*`gp'
scalar `d' = `b'
scalar `c' = 1-`p' - `d'*logit(1-`p')
if `todo' == 0 { // eta = g(mu)
tempvar eta2
gen double `eta' = `mu'/$SGLM_m
gen double `eta2' = `eta'
replace `eta' = logit(`mu'/$SGLM_m)*`b' + `a' ///
if `eta2' <= `p'
replace `eta' = logit(`mu'/$SGLM_m)*`d' + `c' ///
if `eta2' >= 1-`p'
exit
}
if `todo' == 1 { // mu = g^-1(eta)
gen double `mu' = $SGLM_m*`eta'
replace `mu' = $SGLM_m*invlogit((`eta'-`a')/`b') ///
if `eta' <= `p'
replace `mu' = $SGLM_m*invlogit((`eta'-`c')/`d') ///
if `eta' >= 1-`p'
exit
}
if `todo' == 2 { // (d mu)/(d eta)
gen double `return' = $SGLM_m
replace `return' = (`mu'*(1-`mu'/$SGLM_m)) / `b' ///
if `eta' <= `p'
replace `return' = (`mu'*(1-`mu'/$SGLM_m)) / `d' ///
if `eta' >= 1-`p'
exit
}
if `todo' == 3 { // (d^2 mu)(d eta^2)
gen double `return' = 0
replace `return' = (`mu'*(1-`mu'/$SGLM_m)* ///
(1-2*`mu'/$SGLM_m)) / (`b'^2) ///
if `eta' <= `p'
replace `return' = (`mu'*(1-`mu'/$SGLM_m)* ///
(1-2*`mu'/$SGLM_m)) / (`d'^2) ///
if `eta' >= 1-`p'
exit
}
noi di as err "Unknown call to glm link function"
exit 198
end
---------------------------------end identitylogit.ado-----------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/