Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)
Date: Mon, 22 Jul 2013 13:21:54 -0400
Ángel, you wrote:
"The problem is that estat gof results are very different if I use svy
and I don't use it, even when the models are not that different."
Why is this a problem? If you use non-survey logistic on survey data,
you are probably ignoring the clustering and stratification. Standard
errors and p-values will be different (and wrong) for all estimates and
tests, not just the test result from -estat gof-.
In Stata 13, -estat gof- works after -svy: logit-, and you appear to
have used it after -svy: logit- in Stata 12 (not on my machine any more).
Otherwise, try the contributed command -svylogitgof- ("findit").
. sysuse auto, clear
. gen mkr= substr(make,1,2)
. svyset mkr [pw = turn]
. svy: logit foreign length if mpg>15
. estat gof
. svylogitgof
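A minimal sketch of the point about standard errors, reusing the auto data and the made-up design from the example above (mkr as cluster, turn as weight; the scalar names are illustrative assumptions). It fits the same weighted model twice, once with -logit- and pweights only and once with -svy: logit-, so that the coefficients agree and only the standard errors change:

```stata
* Same data and design as the example above (an artificial design).
sysuse auto, clear
gen mkr = substr(make, 1, 2)
svyset mkr [pw = turn]

* Weighted fit that ignores the clustering on mkr.
logit foreign length [pw = turn] if mpg > 15
scalar b_naive  = _b[length]
scalar se_naive = _se[length]

* Design-based fit: same weights, but the clusters (PSUs) are respected.
svy: logit foreign length if mpg > 15
scalar b_svy  = _b[length]
scalar se_svy = _se[length]

* Compare the two fits side by side.
display "coef: " b_naive " vs " b_svy
display "SE:   " se_naive " vs " se_svy
```

The two coefficients should match because both fits maximize the same weighted pseudo-likelihood; any difference in the standard errors comes from the clustering on mkr.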
You can also get ROCs with Roger Newson's
-somersd- and -senspec- (SSC). See:
http://www.stata.com/statalist/archive/2009-01/msg00689.html
Steve
On Jul 22, 2013, at 5:21 AM, Ángel Rodríguez Laso wrote:
Dear Steve & Tim,
Tim is right. The version 12 manual states that estat gof is not
appropriate after svy. I had been told that it was appropriate in this Stata version.
In this case I did as the archives Steve recommended say: I used if
instead of subpop. The problem is that estat gof results are very
different if I use svy and I don't use it, even when the models are
not that different.
Maybe the reason is that using estat gof after svy is not correct.
Would it be a correct alternative to check for the goodness of fit
after svy this procedure from Korn & Graubard, Analysis of Health
Surveys, 1999, John Wiley & Sons, New York, p. 106:
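* Fit the survey logistic model (variable list omitted in the original)
svy: logistic
* Predicted probabilities
predict p
* Weighted deciles of predicted probability (w = sampling weight)
xtile decile = p [pweight=w], nq(10)
* Weighted mean predicted probability within each decile
bysort decile: egen sumw=total(w)
gen pw=p*w
bysort decile: egen sumpw=total(pw)
gen meanpw=sumpw/sumw
* Weighted mean observed outcome (vardep) within each decile
gen ow=vardep*w
bysort decile: egen sumow=total(ow)
gen meanow=sumow/sumw
* Predicted minus observed, listed once per decile
gen difmean=meanpw-meanow
bysort decile: gen percentil=_n
list meanpw meanow difmean if percentil==1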
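The weighted predicted-versus-observed comparison above can also be written more compactly with -collapse-. This is only a sketch under stated assumptions: it uses the auto data with an artificial design (clusters from make, turn standing in for the sampling weight w), a made-up model, and 5 groups rather than 10 because there are only 74 observations. It is a descriptive check, not a formal test.

```stata
* Illustrative data and design (assumptions, not the poster's survey).
sysuse auto, clear
gen mkr = substr(make, 1, 2)
gen w = turn                       // stand-in sampling weight
svyset mkr [pw = w]

* Fit the model and store predicted probabilities in double precision.
svy: logit foreign length
predict double p if e(sample), pr

* Weighted groups of predicted risk (5 here; the original uses deciles).
xtile grp = p [pweight = w], nq(5)

* Weighted mean predicted and observed outcome per group, plus sum of weights.
collapse (mean) meanp = p meano = foreign (rawsum) sumw = w [pweight = w], by(grp)
gen difmean = meanp - meano
list grp meanp meano difmean sumw, noobs
```

After -collapse- the data hold one row per group, so no -bysort- bookkeeping or "first observation" flag is needed.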
Thank you very much.
Angel Rodriguez-Laso
2013/7/18 Steve Samuels <[email protected]>:
> See: http://www.stata.com/statalist/archive/2011-03/msg00550.html
>
> Steve
> [email protected]
>
> On Jul 17, 2013, at 5:23 AM, Ángel Rodríguez Laso wrote:
>
> Dear Statalisters,
>
> Working with Stata 12.1.
>
>
> If I carry out the following logistic regression in a survey setting
> and then type estat gof I get:
>
>
> . svy, subpop(if disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 &
> proxy==2 & edad_c>=60): logistic discAVD edad_c i.sexo i.estud4
> i.difinmes3
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata = 41 Number of obs = 1727
> Number of PSUs = 234 Population size = 1347,0862
> Subpop. no. of obs = 710
> Subpop. size = 563,75
> Design df = 193
> F( 7, 187) = 8,32
> Prob > F = 0,0000
>
> ------------------------------------------------------------------------------
> | Linearized
> discAVD | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> edad_c | 1,10 0,02 4,42 0,000 1,05 1,15
> |
> sexo |
> 1 | 1,00 (base)
> 2 | 2,60 0,82 3,02 0,003 1,39 4,84
> |
> estud4 |
> 0 | 1,00 (base)
> 1 | 0,87 0,32 -0,38 0,704 0,43 1,78
> 2 | 0,90 0,40 -0,24 0,807 0,37 2,16
> 3 | 0,60 0,27 -1,14 0,257 0,24 1,47
> |
> difinmes3 |
> 0 | 1,00 (base)
> 1 | 1,59 0,57 1,31 0,190 0,79 3,21
> 2 | 3,33 1,20 3,35 0,001 1,64 6,77
> |
> _cons | 0,00 0,00 -5,88 0,000 0,00 0,00
> ------------------------------------------------------------------------------
>
> .
> end of do-file
>
> . estat gof
> estat gof is not allowed after subpopulation estimations
> r(198);
>
>
>
> Then I use if conditions in place of my subpopulation specification:
>
>
> . svy: logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if
> disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 & proxy==2 &
> edad_c>=60
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata = 41 Number of obs = 710
> Number of PSUs = 193 Population size = 563,75
> Design df = 152
> F( 7, 146) = 8,35
> Prob > F = 0,0000
>
> ------------------------------------------------------------------------------
> | Linearized
> discAVD | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> edad_c | 1,10 0,02 4,41 0,000 1,05 1,15
> |
> sexo |
> 1 | 1,00 (base)
> 2 | 2,60 0,82 3,02 0,003 1,39 4,85
> |
> estud4 |
> 0 | 1,00 (base)
> 1 | 0,87 0,32 -0,38 0,707 0,42 1,79
> 2 | 0,90 0,40 -0,25 0,807 0,37 2,16
> 3 | 0,60 0,27 -1,15 0,254 0,24 1,46
> |
> difinmes3 |
> 0 | 1,00 (base)
> 1 | 1,59 0,56 1,32 0,189 0,79 3,21
> 2 | 3,33 1,18 3,39 0,001 1,65 6,72
> |
> _cons | 0,00 0,00 -5,88 0,000 0,00 0,00
> ------------------------------------------------------------------------------
>
> . estat gof
>
> Logistic model for discAVD, goodness-of-fit test
>
> F(9,144) = 110,29
> Prob > F = 0,0000
>
>
>
> But if I drop the survey specifications, I get:
>
> . logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if disdesjub==1
> & disdestr==1 & trab==1 & dismy50==1 & proxy==2 & edad_c>=60
>
> Logistic regression Number of obs = 710
> LR chi2(7) = 65,87
> Prob > chi2 = 0,0000
> Log likelihood = -210,78135 Pseudo R2 = 0,1351
>
> ------------------------------------------------------------------------------
> discAVD | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> edad_c | 1,10 0,02 5,28 0,000 1,06 1,14
> |
> sexo |
> 1 | 1,00 (base)
> 2 | 1,96 0,56 2,36 0,018 1,12 3,44
> |
> estud4 |
> 0 | 1,00 (base)
> 1 | 0,87 0,29 -0,42 0,676 0,45 1,69
> 2 | 0,88 0,40 -0,28 0,781 0,36 2,14
> 3 | 0,52 0,25 -1,37 0,170 0,21 1,32
> |
> difinmes3 |
> 0 | 1,00 (base)
> 1 | 1,89 0,61 1,97 0,049 1,00 3,57
> 2 | 3,84 1,39 3,70 0,000 1,88 7,83
> |
> _cons | 0,00 0,00 -7,01 0,000 0,00 0,00
> ------------------------------------------------------------------------------
>
> . estat gof
>
> Logistic model for discAVD, goodness-of-fit test
>
> number of observations = 710
> number of covariate patterns = 350
> Pearson chi2(342) = 328,89
> Prob > chi2 = 0,6852
>
>
> The last two models don't look terribly different, so what is the
> reason for such a large change in the Hosmer & Lemeshow result? Which
> one should I trust?
>
> Thank you for your time and attention.
>
> Angel Rodriguez-Laso
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/