Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: differences in -svylogitgof- results
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: differences in -svylogitgof- results
Date: Mon, 12 Aug 2013 17:18:16 -0400
Imogen:
Your post was formatted as rich text. The FAQ states that you "must
communicate with Statalist in plain text", so please set your mailer to
do so.
The issue that you discovered (reversing the 0/1 coding of the outcome
changes the goodness-of-fit result) is not specific to -svylogitgof-. It
is a problem of the Hosmer-Lemeshow (HL) test in general, including
Stata's implementation for ordinary logistic models, -estat gof- (see
code below). HL tests also have a reputation for low power, so I suggest
that you avoid them. Instead, use -linktest- and plot estimates of the
empirical probabilities against the predicted probabilities. See
Harrell (2001, p. 249) for other approaches.
Steve
Ref: Harrell, Frank E. 2001. Regression Modeling Strategies: With
Applications to Linear Models, Logistic Regression, and Survival
Analysis. New York: Springer.
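************* GOF TEST PROBLEMS *************
* Same model fitted twice; only the coding of the outcome differs
sysuse auto, clear
gen dom = 1 - foreign                 // 1 = domestic, 0 = foreign
logit foreign length
estat gof, table group(10)            // HL test with foreign = 1
linktest
logit dom length
estat gof, table group(10)            // compare this HL result with the one above
linktest
*****************************************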
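*************RECOMMENDED*************
sysuse auto, clear
* Artificial survey design for illustration: PSUs formed from the first
* two letters of -make-, with -turn- used as a probability weight
gen mkr = substr(make,1,2)
svyset mkr [pw = turn]
svy: logit foreign length
predict phat                          // predicted Pr(foreign = 1)
linktest
* Smoothed observed outcome vs predicted probability, with the
* 45-degree line y = x for reference
lowess foreign phat, aspect(1) ///
addplot(function y = x)
**************CODE ENDS**************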
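On your last question (p = 1.000 with -i.multimorbidity-), here is a
minimal sketch, assuming the auto data as a stand-in for your survey
data, with rep78 levels 3 to 5 playing the role of a three-level
predictor (the subset is an illustrative choice, not from your model).
When the only predictor is one categorical variable entered as -i.-,
the model is saturated in the covariate patterns: each level has its own
parameter, so the fitted probability in each level equals the observed
proportion there (the weighted proportion under -svy-). With only three
distinct predicted probabilities, and tied predictions kept in the same
group, observed and expected counts agree in every HL group and the
statistic is essentially zero. So p = 1.000 here says nothing about good
or bad fit; the test has no lack of fit left to detect.

*************SATURATED MODEL (sketch)*************
* Illustration only: auto data stand in for the survey data, and
* rep78 levels 3-5 stand in for a three-level predictor
sysuse auto, clear
keep if inrange(rep78, 3, 5)          // inrange() excludes missing rep78
logit foreign i.rep78                 // one parameter per level: saturated
predict phat                          // predicted Pr(foreign = 1)
* Observed proportion of foreign cars vs mean predicted probability,
* by level of rep78
tabstat foreign phat, by(rep78)
* HL test: only 3 distinct predicted values, so at most 3 groups
estat gof, table group(10)
**************CODE ENDS**************

The -tabstat- columns for foreign and phat should agree at every level
of rep78, up to convergence tolerance, which is why the HL comparison of
observed and expected counts finds nothing.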
> On Aug 12, 2013, at 1:07 AM, Imogen Jones wrote:
>
> Hi All
>
> I have noticed in recent emails that some people have been having trouble with -svylogitgof-.
>
> I have been having a problem that I'm hoping somebody has already figured out an answer to...
>
> I am using logistic regression with employment status as my dependent variable (dichotomous) and "multimorbidity" (presence of two or more chronic health conditions) as my independent variable.
> (Multimorbidity has three levels: 0/1 = no multimorbidity, 2 = two chronic health conditions, 3 = three or more chronic health conditions.)
>
> At first, I had my employment status variable coded as 0 = employed, 1 = not employed. This gave me the following output:
>
> (Notice the p-value from -svylogitgof-.)
>
> . svy: logistic empstat0 multimorbidity
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata = 1 Number of obs = 8841
> Number of PSUs = 8841 Population size = 16015345
> Design df = 8840
> F( 1, 8840) = 189.94
> Prob > F = 0.0000
>
> --------------------------------------------------------------------------------
> | Linearized
> empstat0 | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> multimorbidity | 3.491233 .3167115 13.78 0.000 2.922472 4.170683
> _cons | .4568083 .0154654 -23.14 0.000 .4274766 .4881526
> --------------------------------------------------------------------------------
>
> . svylogitgof
> Number of observations = 8841
> F-adjusted test statistic = F(1,8840) = 0.172
> Prob > F = 0.679
>
>
>
> Then, when I reversed the coding of my employment status variable to 1 = employed, 0 = not employed, the -svylogitgof- result changed:
>
> . svy: logistic empstat multimorbidity
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata = 1 Number of obs = 8841
> Number of PSUs = 8841 Population size = 16015345
> Design df = 8840
> F( 1, 8840) = 189.94
> Prob > F = 0.0000
>
> --------------------------------------------------------------------------------
> | Linearized
> empstat | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> multimorbidity | .2864318 .025984 -13.78 0.000 .2397689 .342176
> _cons | 2.189102 .0741126 23.14 0.000 2.04854 2.339309
> --------------------------------------------------------------------------------
>
> . svylogitgof
> Number of observations = 8841
> F-adjusted test statistic = F(1,8840) = 0.000
> Prob > F = 1.000
>
> Can anybody tell me why this happens and, more specifically, what it means in this instance to get p = 1.0? Does it simply mean the model is a bad fit, or is there something else going on?
>
> Also, when I changed the syntax to show the levels of multimorbidity, this happened:
> . svy: logistic empstat0 i.multimorbidity
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata = 1 Number of obs = 8841
> Number of PSUs = 8841 Population size = 16015345
> Design df = 8840
> F( 2, 8839) = 110.26
> Prob > F = 0.0000
>
> ----------------------------------------------------------------------------------------------
> | Linearized
> empstat0 | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> -----------------------------+----------------------------------------------------------------
> multimorbidity |
> 2 Chronic Illnesses | 4.026233 .4977376 11.27 0.000 3.159772 5.13029
> 3 or more chronic illnesses | 8.237624 1.707088 10.18 0.000 5.487604 12.36577
> |
> _cons | .4539598 .0154738 -23.17 0.000 .4246188 .4853283
> ----------------------------------------------------------------------------------------------
>
> . svylogitgof
> Number of observations = 8841
> F-adjusted test statistic = F(1,8840) = 0.000
> Prob > F = 1.000
>
> Does -svylogitgof- not allow for polychotomous IVs?
>
> Thanks in advance!
>
> Imogen Jones
>