[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: Binomial regression

From	[email protected] (Roberto G. Gutierrez, StataCorp LP)
To	[email protected]
Subject	Re: st: Binomial regression
Date	Fri, 03 Aug 2007 16:23:53 -0500
Vince Wiggins and I responded on this thread yesterday, in which one of our
main points was that forcing convergence when an identity link may not be
appropriate results in a loss of model interpretation.  This is a case where
not only is there no free lunch, but the exact price of that lunch is unclear.

We mentioned various ways that convergence could be achieved.  One method I
did not mention is the result of similar work I have done working with the log
link function.  This method carries with it an ease of interpretation (akin to
knowing the price of your lunch) and involves using a modified version of the
identity link, something I call the "Identity/logit" link.

The Identity/Logit link is a simple modification of the standard identity 
link that turns into a logit link outside a prespecified range, say outside
(0.01, 0.99), the default.  Inside this range, you have the identity function;
outside you have a logit function translated and scaled so as to be continuous
and have continuous derivatives at the endpoints.

Since the link function turns to logit at the endpoints, the range of the
identity/logit inverse link is within (0,1), meaning that you won't get
inadmissible predicted probabilities and convergence can be achieved over a
much wider variety of datasets.  The only drawback is one of interpretation,
however, the interpretation here is clear:  When the predicted probability is
in the range (0.01, 0.99), your coefficients reflect risk differences,
otherwise, they are (rescaled) log(odds ratio)'s.  In other words, you need
only qualify your treatment of risk differences to state that they are valid
for risks in the range (0.01, 0.99) rather than (0,1).  To me this seems
better than trying to interpret the fruits of an ad hoc adjustment.

I attach the code for -identitylogit- below.  A simple demonstration

. sysuse auto

. glm for price mpg, fam(bin) link(identity)

  < no convergence >

. glm for price mpg, fam(bin) link(identitylogit) nolog

Generalized linear models                          No. of obs      =        74
Optimization     : ML                              Residual df     =        71
                                                   Scale parameter =         1
Deviance         =  70.50849514                    (1/df) Deviance =  .9930774
Pearson          =  68.72329876                    (1/df) Pearson  =  .9679338

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = u -> logit(u)            [Identity (0.010,0.990)]

                                                   AIC             =  1.033899
Log likelihood   = -35.25424757                    BIC             = -235.0801

------------------------------------------------------------------------------
             |                 OIM
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   .0000516   .0000157     3.29   0.001     .0000209    .0000823
         mpg |   .0471231   .0093466     5.04   0.000     .0288041    .0654421
       _cons |  -1.046464   .2299337    -4.55   0.000    -1.497126   -.5958023
------------------------------------------------------------------------------

The identity/logit link also accepts an optional argument, p, for controlling
the range of the identity part of this link.  For example,

. glm for price mpg, fam(bin) link(identitylogit 0.05)

would give a valid range for risk differences of (0.05, 0.95) rather than 
the default (0.01, 0.99).

--Bobby					
[email protected]			

--------------------------------begin indentitylogit.ado---------------------
*! version 1.0.0   03aug2007
program define identitylogit		
	version 8.2 
	args todo eta mu return

        if `todo' == -1 {                       /* Title */
		if "$SGLM_p" == "" {
			global SGLM_p 0.01
		}
		else {
			confirm number $SGLM_p
			if $SGLM_p <= 0 | $SGLM_p >= 0.5 {
				di as error ///
			            "link parameter p must be between 0 and 0.5"
				exit 459
			}
		}
		if "$SGLM_m" == "1" {
			global SGLM_lf "u -> logit(u)"
		}
		else    {
			global SGLM_lf "u/$SGLM_m -> logit(u/$SGLM_m)"
		}
		local a1 : di %5.3f $SGLM_p
		local a2 : di %5.3f 1-$SGLM_p
                global SGLM_lt "Identity (`a1',`a2')"
                exit
        }

	tempname a b c d gp p
	scalar `p' = $SGLM_p
	scalar `gp' = logit(`p')
        scalar `b' = `p'*(1-`p')
        scalar `a' = `p' - `b'*`gp'
        scalar `d' = `b'
        scalar `c' = 1-`p' - `d'*logit(1-`p')

	if `todo' == 0 {			// eta = g(mu) 
		tempvar eta2
		gen double `eta' = `mu'/$SGLM_m
		gen double `eta2' = `eta'
		replace `eta' = logit(`mu'/$SGLM_m)*`b' + `a'	///
		 	  	if `eta2' <= `p'
		replace `eta' = logit(`mu'/$SGLM_m)*`d' + `c'	///
		 	  	if `eta2' >= 1-`p'
		exit 
	}
	if `todo' == 1 {			// mu = g^-1(eta) 
		gen double `mu' = $SGLM_m*`eta'
		replace `mu' = $SGLM_m*invlogit((`eta'-`a')/`b') ///
		 	  	if `eta' <= `p'
		replace `mu' = $SGLM_m*invlogit((`eta'-`c')/`d') ///
		 	  	if `eta' >= 1-`p'
		exit 
	}
	if `todo' == 2 {			// (d mu)/(d eta) 
		gen double `return' = $SGLM_m
		replace `return' = (`mu'*(1-`mu'/$SGLM_m)) / `b' ///
		 	  	if `eta' <= `p'
		replace `return' = (`mu'*(1-`mu'/$SGLM_m)) / `d' ///
		 	  	if `eta' >= 1-`p'
		exit 
	}
	if `todo' == 3 {			// (d^2 mu)(d eta^2) 
		gen double `return' = 0
		replace `return' = (`mu'*(1-`mu'/$SGLM_m)* ///
			   (1-2*`mu'/$SGLM_m)) / (`b'^2)   ///
			   if `eta' <= `p'
		replace `return' = (`mu'*(1-`mu'/$SGLM_m)* ///
			   (1-2*`mu'/$SGLM_m)) / (`d'^2)   ///
			   if `eta' >= 1-`p'
		exit
	}
	noi di as err "Unknown call to glm link function"
	exit 198
end
---------------------------------end identitylogit.ado-----------------------
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Prev by Date: Re: st: Binomial regression
Next by Date: st: Constrained ARCH in STATA
Previous by thread: Re: st: Binomial regression
Next by thread: Re: st: Binomial regression
Index(es):
- Date
- Thread