RE: st: Binomial regression

Date   Sun, 5 Aug 2007 17:15:53 +0100

Two suggestions re binomial regression:

Suggestion 1: For what it's worth, confidence intervals for risk differences (in some cases) can be reported using the -somersd- package, downloadable from SSC using the -ssc- command. Given 2 binary (0,1) variables x and y, the user can type

somersd x y, transf(z) tdist

and get a confidence interval for the risk difference

Pr(y==1|x==1) - Pr(y==1|x==0)

This method has the advantage (compared to -binreg-, -glm- etc.) of using the Normalizing and variance-stabilizing hyperbolic arctangent (or z) transformation, recommended by Edwardes (1995) for the general Somers' D with a binary x-variate (including the special case where the y-variate is also binary).
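As a minimal sketch of the above, the following simulates two binary variables and requests the interval. The seed, sample size and risks (0.30 and 0.20) are invented for illustration, and -runiform()- assumes a reasonably recent Stata. The -cs- line is only a cross-check of the point estimate using an official command.

```stata
* One-off install of the package from SSC:
* ssc install somersd

clear
set seed 20070805
set obs 2000

* Hypothetical exposure x and outcome y, with true risks 0.30 and 0.20
generate byte x = runiform() < 0.5
generate byte y = runiform() < cond(x == 1, 0.30, 0.20)

* Somers' D of y with respect to x, which is the risk difference here
somersd x y, transf(z) tdist

* Cross-check of the point estimate with an official command
cs y x
```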

If there is a categorical confounding variable w, then the user can type

somersd x y, transf(z) tdist wstrata(w)

and get a confidence interval for a within-strata risk difference for pairs of observations with the same value of w. The user can alternatively specify multiple categorical confounding w-variables, and/or w-variables which specify propensity groups based on a propensity score for x==1 calculated from multiple confounding variables.
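A rough sketch of the propensity-group version follows. It assumes that y, x and three confounders are already in memory; the confounder names (age, sex, smoker), the new variable names (pscore, pgroup) and the choice of 5 groups are all hypothetical.

```stata
* Propensity score for x==1 from the (hypothetical) confounders
logit x age sex smoker
predict double pscore, pr

* Group observations into 5 propensity strata
xtile pgroup = pscore, nquantiles(5)

* Within-strata risk difference, comparing only pairs in the same stratum
somersd x y, transf(z) tdist wstrata(pgroup)
```

The number of groups is a judgment call: more groups mean tighter matching on the propensity score but fewer within-stratum pairs.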

Suggestion 2: To output confidence intervals for baseline odds with confidence intervals for odds ratios, the user can specify a baseline variate of ones, and then enter it into the model with the -noconst- option. For instance, the user can type:

generate byte baseline=1
logit y baseline x, noconst or

This trick can also be used with geometric means and their ratios. See Newson (2003).
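For the geometric-mean case, a sketch along the same lines might look like this, using the auto data shipped with Stata. The variable name logmpg and the column label GM/Ratio are my own choices here.

```stata
sysuse auto, clear

* Work on the log scale, so that exponentiated coefficients are
* geometric means and ratios of geometric means
generate double logmpg = log(mpg)
generate byte baseline = 1

* baseline: geometric mean mpg of domestic cars
* foreign:  ratio of geometric means, foreign versus domestic
regress logmpg baseline foreign, noconstant eform(GM/Ratio)
```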

I hope this helps.



Edwardes, M. D. d. B. 1995. A confidence interval for Pr(X < Y) − Pr(X > Y) estimated from simple cluster samples. Biometrics 51: 571–578.

Newson, R. 2003. Stata tip 1: The eform() option of regress. The Stata Journal 3(4): 445. Download post-publication update from

Sent: 04 August 2007 07:47
--- Marcello Pagano wrote:
> I agree wholeheartedly that the risk difference is sometimes
> preferable to the odds ratio. Witness what is currently going on with
> the attack on Avandia.  Rather than report a risk difference of 0.2% 
> in the MI rate, we are faced with a risk INCREASE of 40% -- the 
> effect of going from 0.5% to 0.7%.  If reported as a risk difference
> it would probably not have made the headlines it has nor created the 
> furor it has.

At this point I think that there is room for improvement in Stata
output. When reporting odds ratios after -logit-, Stata does not report
the baseline odds (-exp(_cons)-). So Stata reports that the odds
increased by 40%, but not that the baseline odds are .005 (at these
low probabilities risks and odds are almost the same). I would like to
see both the baseline odds and the odds ratios, because together they
give very useful information about the size of the effect, as
Marcello's example illustrates.
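The arithmetic behind that example can be checked directly. This sketch takes the two risks (0.5% and 0.7%) from Marcello's message; the scalar names p0 and p1 are just labels.

```stata
* Risks quoted in the message: 0.5% (baseline) and 0.7%
scalar p0 = 0.005
scalar p1 = 0.007

display "risk difference = " p1 - p0
display "risk ratio      = " p1/p0
display "baseline odds   = " p0/(1 - p0)
display "odds ratio      = " (p1/(1 - p1))/(p0/(1 - p0))
```

The risk ratio and the odds ratio both come out close to 1.4, and the baseline odds are close to the baseline risk. This is why the two scales are nearly interchangeable at these low probabilities.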

