Re: st: differences in -svylogitgof- results

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: differences in -svylogitgof- results
Date	Mon, 12 Aug 2013 17:18:16 -0400

Imogen:

The format of your post was rich text. The FAQ states that you "must
communicate with Statalist in plain text", so please set your mailer to
do so.

The issue that you discovered, that the result changes when the outcome
coding is reversed, is not a problem of -svylogitgof- only, but of the
Hosmer-Lemeshow (HL) test in general, including Stata's implementation
for ordinary logistic models in -estat gof- (see code below). The HL
tests also have a reputation for low power, so I suggest that you avoid
them. Use -linktest- instead, and plot estimates of the empirical
probabilities against the predicted probabilities. See Harrell (2001,
p. 249) for other approaches.

Steve

Ref: Harrell, Frank E. 2001. Regression modeling strategies: with
applications to linear models, logistic regression, and survival
analysis. New York: Springer.

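************* GOF TEST PROBLEMS*************
sysuse auto, clear
* dom is foreign with the 0/1 coding reversed
gen dom = 1 - foreign

* Original coding
logit foreign length
estat gof, table group(10)
linktest

* Reversed coding: same model, so compare the HL results with those above
logit dom length
estat gof, table group(10)
linktest
*****************************************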

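*************RECOMMENDED*************
sysuse auto, clear
* Treat the first two letters of make as the PSU and turn as the weight
gen mkr = substr(make,1,2)
svyset mkr [pw = turn]
svy: logit foreign length
* Predicted probabilities from the fitted model
predict phat
linktest
* Smoothed observed outcome against predicted probability, with y = x for reference
lowess foreign phat, aspect(1) ///
      addplot(function y = x)
**************CODE ENDS**************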
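
The same comparison of empirical and predicted probabilities can also be
made as a table. What follows is only a sketch, not part of the
recommendation above: the choice of five groups and the names grp, obs,
pred and n are assumptions made for illustration, and it uses the
unweighted -logit- fit from the first block rather than the survey fit.

*************SKETCH: GROUPED COMPARISON*************
sysuse auto, clear
logit foreign length
predict phat

* Assumption: five groups of predicted probability, chosen arbitrarily
xtile grp = phat, nq(5)

* Observed proportion, mean predicted probability and count per group
collapse (mean) obs = foreign pred = phat (count) n = foreign, by(grp)
list, noobs

* Points near the line y = x indicate agreement
twoway (scatter obs pred) (function y = x), aspectratio(1)
**************CODE ENDS**************

With only 74 observations in the auto data the groups are small, so read
the table as a rough check rather than a test.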


> On Aug 12, 2013, at 1:07 AM, Imogen Jones wrote:
> 
> Hi All
>  
> I have noticed in the emails lately that some people have been having trouble with -svylogitgof-.
>  
> I have been having a problem that I'm hoping somebody has already figured out an answer to...
>  
> I am using logistic regression with employment status as my dependent variable (dichotomous) and "multimorbidity" (presence of two or more chronic health conditions) as my independent variable. 
> (Multimorbidity is three levels - 0/1=No multimorbidity, 2=2 chronic health conditions, 3=3 or more chronic health conditions).
>  
> At first, I had my employment status variable coded as 0=employed 1=not employed.  This gave me this output:
>  
> (Notice the p-value of -svylogitgof-)
>  
> . svy: logistic empstat0 multimorbidity
> (running logistic on estimation sample)
>  
> Survey: Logistic regression
>  
> Number of strata   =         1                  Number of obs      =      8841
> Number of PSUs     =      8841                  Population size    =  16015345
>                                                 Design df          =      8840
>                                                 F(   1,   8840)    =    189.94
>                                                 Prob > F           =    0.0000
>  
> --------------------------------------------------------------------------------
>                |             Linearized
>       empstat0 | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> multimorbidity |   3.491233   .3167115    13.78   0.000     2.922472    4.170683
>          _cons |   .4568083   .0154654   -23.14   0.000     .4274766    .4881526
> --------------------------------------------------------------------------------
>  
> . svylogitgof
>    Number of observations =                            8841
> F-adjusted test statistic = F(1,8840) =               0.172
>                  Prob > F =                           0.679
>  
>  
>  
> Then when I reversed the coding for my employment status variable to 1=employed 0=not employed, the -svylogitgof- result changed:
>  
> . svy: logistic empstat multimorbidity
> (running logistic on estimation sample)
>  
> Survey: Logistic regression
>  
> Number of strata   =         1                  Number of obs      =      8841
> Number of PSUs     =      8841                  Population size    =  16015345
>                                                 Design df          =      8840
>                                                 F(   1,   8840)    =    189.94
>                                                 Prob > F           =    0.0000
>  
> --------------------------------------------------------------------------------
>                |             Linearized
>        empstat | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
> multimorbidity |   .2864318    .025984   -13.78   0.000     .2397689     .342176
>          _cons |   2.189102   .0741126    23.14   0.000      2.04854    2.339309
> --------------------------------------------------------------------------------
>  
> . svylogitgof
>    Number of observations =                            8841
> F-adjusted test statistic = F(1,8840) =               0.000
>                  Prob > F =                           1.000
>  
> Can anybody tell me why this happens, and more specifically what it means in this instance to get a p=1.0?  Does it simply mean the model is a bad fit, or is there something else going on?
>  
> Also, when I changed the syntax to show the levels of multimorbidity, this happened:
> . svy: logistic empstat0 i.multimorbidity
> (running logistic on estimation sample)
>  
> Survey: Logistic regression
>  
> Number of strata   =         1                  Number of obs      =      8841
> Number of PSUs     =      8841                  Population size    =  16015345
>                                                 Design df          =      8840
>                                                 F(   2,   8839)    =    110.26
>                                                 Prob > F           =    0.0000
>  
> ----------------------------------------------------------------------------------------------
>                              |             Linearized
>                     empstat0 | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -----------------------------+----------------------------------------------------------------
>               multimorbidity |
>         2 Chronic Illnesses  |   4.026233   .4977376    11.27   0.000     3.159772     5.13029
> 3 or more chronic illnesses  |   8.237624   1.707088    10.18   0.000     5.487604    12.36577
>                              |
>                        _cons |   .4539598   .0154738   -23.17   0.000     .4246188    .4853283
> ----------------------------------------------------------------------------------------------
>  
> . svylogitgof
>    Number of observations =                            8841
> F-adjusted test statistic = F(1,8840) =               0.000
>                  Prob > F =                           1.000
>  
> Does -svylogitgof- not allow for polychotomous IVs?
>  
> Thanks in advance!
>  
> Imogen Jones
> 
