Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Doug Hess <douglasrhess@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: interpreting the estat gof commands and Hosmer Lemeshow version of it |
Date | Sun, 18 Sep 2011 14:09:40 -0400 |
Given all the cautions in Hosmer & Lemeshow's book, I'm a bit confused as to what role and what interpretation should be given to the tests that -estat gof- produces with and without the -group()- option. The results are below.

Without the grouping option, the Pearson chi2 test gives Prob > chi2 = 0.9999. However, with groups (and the number of groups doesn't seem to matter unless it is very large), the Hosmer-Lemeshow method gives Prob > chi2 = 0.0000. From the R manual (pp. 958-9) and Hosmer & Lemeshow's book (p. 150 of the 2000 edition) I gather that the null hypothesis is the same for both. So why the large difference? Is one more appropriate, or do both have problems when the outcome is somewhat rare (11 percent of observations have y=1 in my case)?

Stata's R manual says, "However, the number of covariate patterns is close to the number of observations, making the applicability of the Pearson chi2 test questionable but not necessarily inappropriate" (p. 958). I have roughly 140,000 observations (households) and roughly 109,000 covariate patterns. If this difference is important in deciding which of these tests to use, what is the threshold between "close" and "far" when comparing the number of observations with the number of patterns? (It may help to know that there are only roughly 70,000 covariate patterns, about half the sample size, if I remove a half dozen continuous variables, which I am thinking of doing by collapsing them into one or two scales or factors.)

If it helps, here are some additional details: my logistic model (11 percent of observations are y=1) has an optimal cutoff point for maximizing the sensitivity and specificity at 0.10, which gives approximately 75 percent for both sensitivity and specificity. The area under the ROC curve is 0.83. I'm using Stata 12.

. estat gof

       number of observations       =    143585
       number of covariate patterns =    108638
       Pearson chi2(108575)         = 106784.16
       Prob > chi2                  =    0.9999

. estat gof, g(10) table

       number of observations   =    143585
       number of groups         =        10
       Hosmer-Lemeshow chi2(8)  =    322.31
       Prob > chi2              =    0.0000

  Decile   Pred Prob   Obs y=1   Exp y=1     Total   Diff   % diff
       1       0.019       115       190    14,359     75      65%
       2       0.025       194       315    14,358    121      62%
       3       0.034       305       419    14,359    114      37%
       4       0.044       443       560    14,359    117      26%
       5       0.055       671       704    14,361     33       5%
       6       0.072       864       904    14,355     40       5%
       7       0.100     1,379     1,213    14,359    166      12%
       8       0.163     2,122     1,827    14,358    295      14%
       9       0.302     3,615     3,207    14,359    408      11%
      10       0.856     6,175     6,543    14,358    368       6%
    Sum=                15,883    15,883   143,585

I removed the observed and expected columns for y=0 for formatting/simplicity. The Diff column is the absolute value of Obs minus Exp. The last column is that difference as a percentage of Obs y=1.

Thank you.

-Doug
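
As a rough check on the decile table above, the Hosmer-Lemeshow statistic can be recomputed by hand from the observed and expected counts. The lines below are only a sketch: the variable names (obs1, exp1, total and so on) are made up for illustration, the y=0 counts are reconstructed as Total minus the y=1 counts, and the expected counts are the rounded values shown in the table, so the result should land near, but not exactly on, the reported 322.31.

```stata
* Sketch: rebuild the Hosmer-Lemeshow chi2 from the decile table.
* Variable names are illustrative; exp1 holds the ROUNDED expected counts.
clear
input decile obs1 exp1 total
 1  115  190 14359
 2  194  315 14358
 3  305  419 14359
 4  443  560 14359
 5  671  704 14361
 6  864  904 14355
 7 1379 1213 14359
 8 2122 1827 14358
 9 3615 3207 14359
10 6175 6543 14358
end

* Counts for y=0 are whatever is left in each group
generate obs0 = total - obs1
generate exp0 = total - exp1

* Each group's contribution: (O - E)^2 / E summed over y=1 and y=0
generate contrib = (obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0

* Sum the contributions and compare with chi2 on (groups - 2) = 8 df
quietly summarize contrib
scalar hl = r(sum)
display "Hosmer-Lemeshow chi2(8) = " %8.2f scalar(hl)
display "Prob > chi2             = " %6.4f chi2tail(8, scalar(hl))

* Show which deciles drive the statistic
list decile obs1 exp1 contrib, noobs
```

Listing contrib by decile shows where the lack of fit is concentrated, which may be more informative than the single p-value with a sample this large.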