Responding to a question posted by Scott Millis, Nick Cox wrote:
---------------------------begin posting----------------------------------------
Scott Millis asked
> Collett discusses the L-statistic in "Modelling Binary Data." Is it
> possible to calculate this in Stata?
These statistics, which occur as a set, appear to be the differences
between the squares of the likelihood residuals calculated from
two logit models, one of whose terms is a subset of the other's.
So one recipe is something like
glm <model 1>, link(logit) family(binomial)
predict r1, likelihood
glm <model 2>, link(logit) family(binomial)
predict r2, likelihood
gen li = r1^2 - r2^2
where for <model 1> and <model 2> you
must plug in the varlists of the two models.
However, in trying to reproduce the worked
example in Collett's book (1/e; the 2/e I do not have
access to), I find one anomaly. In one case
a likelihood residual is reported by Stata
as missing (I guess wildly that the problem
is a request to take the square root of a negative
number). Collett's results for that case imply
that he replaced such a residual by zero.
Perhaps someone can illuminate this. For
all I know it may be standard practice or justifiable.
The data can be downloaded from David Collett's website
(he works at the University of Reading in the
UK).
-----------------------------end posting----------------------------------------
I don't know how much illumination the following provides, but I have the second edition,
which has two examples of the L-statistic, one using "Artificial data-1" (Table 5.10 on
Page 162) and the other using "Determination of the ESR" (Table 5.15 on Page 181).
Analysis of the latter dataset does produce the anomaly that Nick describes.
According to the book (Page 132), likelihood residuals are weighted averages of the
deviance and Pearson residuals, and the reason for the missing value from Stata in this
example seems to be that the Pearson residual is missing for this observation under the
full model, which includes both linear and quadratic terms. (See the do-file and log
file below.) The Pearson residual is missing because the predicted proportion is exactly
1 for this observation, which causes a division by zero when the Pearson residual is
calculated.
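For concreteness, here is a minimal sketch of why the division by zero happens, assuming
the usual ungrouped-binary formulas (the variable names come from the do-file further
below); I'm paraphrasing, not quoting, the book's definitions:

// Sketch only (my assumed formulas, not quoted from the book): the Pearson
// residual is (y - pi)/sqrt(pi*(1 - pi)), so a fitted pi of exactly 1 gives
// a zero denominator and a missing value, and the likelihood residual built
// from it (a leverage-weighted combination of the squared Pearson and
// deviance residuals) goes missing with it.
// Variable names y, pi2, and p2 follow the do-file below.
generate double p2_byhand = (y - pi2) / sqrt(pi2 * (1 - pi2))
assert missing(p2_byhand) if pi2 == 1
list p2 p2_byhand if missing(p2)    // both missing for the same observation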
The author, David Collett, plots Pearson residuals for the reduced model (which doesn't
produce a missing residual or a predicted proportion of 1) in Figure 5.20 (Page 175),
but he switches to deviance residuals when discussing the full, quadratic model (the
one that produces a missing Pearson residual) in Figure 5.21 on the next page. In that
figure, the deviance residual for this observation appears to be zero, which matches the
zero deviance residual that Stata reports and suggests that the author's fit, like
Stata's, gives a fitted proportion of exactly 1 (and so an undefined Pearson residual)
for it.
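For anyone who wants to eyeball that observation, a rough plot in the spirit of
Figure 5.21 might look like the following (I'm only guessing that fibrinogen level is
on the x-axis; the book may plot against something else):

// Purely illustrative, not a reproduction of Figure 5.21; d2 and fib are
// created in the do-file below.
scatter d2 fib, yline(0) ytitle("Deviance residual") xtitle("Fibrinogen level")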
Like Nick, I couldn't find any discussion of standard practice when such residuals
arise, but it does look as though the author replaced the missing Pearson residual by
zero, which, together with the zero value for the corresponding deviance residual,
would give zero for the likelihood residual, their weighted average.
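If one wanted to mimic what the author appears to have done, something like the sketch
below would do it (the zero-replacement rule is only my inference from the book's
tables, not a documented convention; variable names again follow the do-file below):

// Treat the missing likelihood residual as zero before forming the
// L-statistic -- my inference about the author's convention, nothing documented.
generate double r2_0 = cond(missing(r2), 0, r2)
generate double L_0  = r1^2 - r2_0^2
list fib y r1 r2 L_0 if missing(r2)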
Comparing the Pearson statistic (e(deviance_p)) reported in the header of -glm-'s output
with the sum of the squared Pearson residuals suggests that Stata, too, effectively
treats the missing residual as zero here.
Joseph Coveney
--------------------------------------------------------------------------------
clear
set more off
tempfile tmp
insheet using "Determination of the ESR.dat"
// Convert the space-delimited raw data to comma-delimited
// (collapse runs of spaces, then replace the single spaces with commas)
replace v1 = subinstr(itrim(trim(v1)), " ", ",", .)
outsheet using `tmp', comma nonames noquote
insheet using `tmp', comma names clear
erase `tmp'
// Creating squared fibrinogen levels
generate float fib2 = fib * fib
// Reduced model (intercept and fibrinogen)
glm y fib, family(binomial) link(logit) nolog
predict r1, likelihood
// Quadratic model (intercept, fibrinogen and fibrinogen squared)
glm y fib fib2, family(binomial) link(logit) nolog
scalar pearson_deviance = e(deviance_p)
predict r2, likelihood
// Creating L-statistics for each observation
generate float L = r1^2 - r2^2
// L-statistic for observation 13, treating the missing r2 as zero
display r1[13]^2 - 0^2
// Creating predicted proportion for full quadratic model
predict pi2, mu
predict d2, deviance
predict p2, pearson
display pi2[13] // is exactly 1
display pi2[29] // is not exactly 1
// Comparing Pearson Deviance with sum of squared Pearson residuals
generate double p22 = p2 * p2
summarize p22, meanonly
display r(sum) - scalar(pearson_deviance)
format r* L d* p* %6.3f
format fib %4.2f
format y %1.0f
capture log close
capture erase table5_15.log
log using table5_15
clist fib y r1-p2
log close
exit
--------------------------------------------------------------------------------
Edited log is below
--------------------------------------------------------------------------------
fib y r1 r2 L pi2 d2 p2
1. 2.52 0 -0.455 -0.181 0.174 0.016 -0.179 -0.127
2. 2.56 0 -0.471 -0.166 0.194 0.013 -0.164 -0.116
3. 2.19 0 -0.340 -0.687 -0.356 0.197 -0.663 -0.496
4. 2.18 0 -0.337 -0.727 -0.415 0.216 -0.699 -0.526
5. 3.41 0 -0.957 -1.179 -0.473 0.426 -1.055 -0.862
6. 2.46 0 -0.432 -0.213 0.141 0.022 -0.211 -0.150
7. 3.22 0 -0.819 -0.411 0.502 0.077 -0.401 -0.289
8. 2.21 0 -0.346 -0.615 -0.258 0.164 -0.598 -0.442
9. 3.15 0 -0.773 -0.303 0.507 0.043 -0.297 -0.213
10. 2.60 0 -0.487 -0.154 0.214 0.012 -0.153 -0.108
11. 2.29 0 -0.372 -0.411 -0.030 0.078 -0.404 -0.292
12. 2.35 0 -0.392 -0.316 0.054 0.047 -0.312 -0.223
13. 5.06 1 0.457 . . 1.000 0.000 .
14. 3.34 1 1.560 1.878 -1.092 0.234 1.705 1.810
15. 2.38 1 2.384 2.797 -2.138 0.038 2.562 5.059
16. 3.15 0 -0.773 -0.303 0.507 0.043 -0.297 -0.213
17. 3.53 1 1.418 0.759 1.434 0.812 0.645 0.481
18. 2.68 0 -0.522 -0.140 0.253 0.010 -0.139 -0.099
19. 2.60 0 -0.487 -0.154 0.214 0.012 -0.153 -0.108
20. 2.23 0 -0.353 -0.552 -0.181 0.136 -0.540 -0.396
21. 2.88 0 -0.618 -0.147 0.360 0.011 -0.146 -0.104
22. 2.65 0 -0.509 -0.144 0.238 0.010 -0.143 -0.101
23. 2.09 1 2.658 1.484 4.865 0.466 1.236 1.071
24. 2.28 0 -0.368 -0.430 -0.050 0.086 -0.423 -0.306
25. 2.67 0 -0.517 -0.141 0.248 0.010 -0.140 -0.099
26. 2.29 0 -0.372 -0.411 -0.030 0.078 -0.404 -0.292
27. 2.15 0 -0.328 -0.872 -0.652 0.285 -0.818 -0.631
28. 2.54 0 -0.463 -0.173 0.184 0.015 -0.171 -0.122
29. 3.93 1 1.135 0.012 1.288 1.000 0.012 0.009
30. 3.34 0 -0.904 -0.768 0.227 0.234 -0.730 -0.552
31. 2.99 0 -0.678 -0.181 0.426 0.016 -0.179 -0.127
32. 3.32 0 -0.889 -0.686 0.320 0.194 -0.657 -0.491
--------------------------------------------------------------------------------