Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)
Date
Wed, 17 Jul 2013 18:05:24 -0500
See: http://www.stata.com/statalist/archive/2011-03/msg00550.html
Steve
[email protected]
On Jul 17, 2013, at 5:23 AM, Ángel Rodríguez Laso wrote:
Dear Statalisters,
Working with Stata 12.1.
If I carry out the following logistic regression in a survey setting
and then type estat gof I get:
. svy, subpop(if disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 &
proxy==2 & edad_c>=60): logistic discAVD edad_c i.sexo i.estud4
i.difinmes3
(running logistic on estimation sample)
Survey: Logistic regression
Number of strata   =        41                 Number of obs      =       1727
Number of PSUs     =       234                 Population size    =  1347,0862
                                               Subpop. no. of obs =        710
                                               Subpop. size       =     563,75
                                               Design df          =        193
                                               F(   7,    187)    =       8,32
                                               Prob > F           =     0,0000
------------------------------------------------------------------------------
| Linearized
discAVD | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edad_c | 1,10 0,02 4,42 0,000 1,05 1,15
|
sexo |
1 | 1,00 (base)
2 | 2,60 0,82 3,02 0,003 1,39 4,84
|
estud4 |
0 | 1,00 (base)
1 | 0,87 0,32 -0,38 0,704 0,43 1,78
2 | 0,90 0,40 -0,24 0,807 0,37 2,16
3 | 0,60 0,27 -1,14 0,257 0,24 1,47
|
difinmes3 |
0 | 1,00 (base)
1 | 1,59 0,57 1,31 0,190 0,79 3,21
2 | 3,33 1,20 3,35 0,001 1,64 6,77
|
_cons | 0,00 0,00 -5,88 0,000 0,00 0,00
------------------------------------------------------------------------------
.
end of do-file
. estat gof
estat gof is not allowed after subpopulation estimations
r(198);
Then I replace the subpopulation specification with an if qualifier:
. svy: logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if
disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 & proxy==2 &
edad_c>=60
(running logistic on estimation sample)
Survey: Logistic regression
Number of strata   =        41                 Number of obs      =        710
Number of PSUs     =       193                 Population size    =     563,75
                                               Design df          =        152
                                               F(   7,    146)    =       8,35
                                               Prob > F           =     0,0000
------------------------------------------------------------------------------
| Linearized
discAVD | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edad_c | 1,10 0,02 4,41 0,000 1,05 1,15
|
sexo |
1 | 1,00 (base)
2 | 2,60 0,82 3,02 0,003 1,39 4,85
|
estud4 |
0 | 1,00 (base)
1 | 0,87 0,32 -0,38 0,707 0,42 1,79
2 | 0,90 0,40 -0,25 0,807 0,37 2,16
3 | 0,60 0,27 -1,15 0,254 0,24 1,46
|
difinmes3 |
0 | 1,00 (base)
1 | 1,59 0,56 1,32 0,189 0,79 3,21
2 | 3,33 1,18 3,39 0,001 1,65 6,72
|
_cons | 0,00 0,00 -5,88 0,000 0,00 0,00
------------------------------------------------------------------------------
. estat gof
Logistic model for discAVD, goodness-of-fit test
F(9,144) = 110,29
Prob > F = 0,0000
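The degrees of freedom reported in the two survey runs above can be checked by hand. The lines below are only a sketch: they assume the usual rule that design df equals the number of PSUs minus the number of strata, and that estat gof after svy used its default of 10 groups for the F-adjusted mean residual test, which is consistent with the F(9,144) shown. The scalar names are made up for illustration.

```stata
* design df = number of PSUs - number of strata
* subpop() run: 234 PSUs, 41 strata
scalar df_subpop = 234 - 41
* if-restricted run: 193 PSUs, 41 strata
scalar df_if = 193 - 41
display "design df, subpop() run = " scalar(df_subpop)
display "design df, if run       = " scalar(df_if)

* F-adjusted mean residual test with g groups: F(g-1, df-g+2)
* g = 10 is assumed here (the default number of groups)
scalar g = 10
display "numerator df   = " scalar(g) - 1
display "denominator df = " scalar(df_if) - scalar(g) + 2
```

With these inputs the last two lines give 9 and 144, matching the F(9,144) in the output. Note also that the if-restricted run sees only 193 of the 234 PSUs, so its design df (152) differs from the subpop() run (193).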
But if I get rid of the survey specifications, I get:
. logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if disdesjub==1
& disdestr==1 & trab==1 & dismy50==1 & proxy==2 & edad_c>=60
Logistic regression                               Number of obs   =        710
                                                  LR chi2(7)      =      65,87
                                                  Prob > chi2     =     0,0000
Log likelihood = -210,78135                       Pseudo R2       =     0,1351
------------------------------------------------------------------------------
discAVD | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edad_c | 1,10 0,02 5,28 0,000 1,06 1,14
|
sexo |
1 | 1,00 (base)
2 | 1,96 0,56 2,36 0,018 1,12 3,44
|
estud4 |
0 | 1,00 (base)
1 | 0,87 0,29 -0,42 0,676 0,45 1,69
2 | 0,88 0,40 -0,28 0,781 0,36 2,14
3 | 0,52 0,25 -1,37 0,170 0,21 1,32
|
difinmes3 |
0 | 1,00 (base)
1 | 1,89 0,61 1,97 0,049 1,00 3,57
2 | 3,84 1,39 3,70 0,000 1,88 7,83
|
_cons | 0,00 0,00 -7,01 0,000 0,00 0,00
------------------------------------------------------------------------------
. estat gof
Logistic model for discAVD, goodness-of-fit test
number of observations = 710
number of covariate patterns = 350
Pearson chi2(342) = 328,89
Prob > chi2 = 0,6852
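One point worth noting about the output just above: without the group() option, estat gof after logistic reports the Pearson chi-squared test over covariate patterns (350 of them here), not the Hosmer-Lemeshow grouped test. A minimal sketch of how the grouped version could be requested, assuming the same data are in memory and that 10 groups are wanted (the choice of 10 is an assumption, made to mirror the survey version's default):

```stata
* refit the unweighted model on the same observations
logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 ///
    if disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 & ///
    proxy==2 & edad_c>=60

* Hosmer-Lemeshow test based on 10 quantile groups of fitted probabilities
estat gof, group(10)

* Pearson df check: covariate patterns minus estimated parameters
* (7 coefficients + constant = 8)
display 350 - 8
```

The last line returns 342, which agrees with the Pearson chi2(342) reported above. The grouped test is the closer counterpart to what estat gof computes after svy: logistic, so it may be the more natural one to set beside the F(9,144) result.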
The last two models don't look terribly different, so what is the
reason for such a large change in the Hosmer & Lemeshow result? Which
one should I trust?
Thank you for your time and attention.
Angel Rodriguez-Laso
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*