Ashwin:
Actually I am pretty sceptical about tests of fit. A perfectly fitting
model is just the data matrix. We use a statistical model because it is
too hard to read or draw conclusions from the raw data. If your test
says that your model is not significantly different from a perfect fit,
then your model is either too complex or the pattern in the data is so
obvious that no model is needed.
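A minimal sketch of why a perfectly fitting model is just the data: with grouped (covariate-pattern) data, a saturated model estimates one probability per pattern, which is simply the observed proportion. Its Pearson statistic is therefore zero, and it has no degrees of freedom left to test. The four patterns and their counts below are hypothetical, and the helper `pearson_x2` is illustrative, not a Stata command.

```python
# Hypothetical grouped data: (successes, trials) for four covariate patterns
patterns = [(30, 100), (45, 100), (60, 100), (80, 100)]

def pearson_x2(data, probs):
    # Pearson X^2 over covariate patterns: sum of (y - m*p)^2 / (m*p*(1-p))
    x2 = 0.0
    for (y, m), p in zip(data, probs):
        expected = m * p
        x2 += (y - expected) ** 2 / (expected * (1.0 - p))
    return x2

# Saturated model: one free probability per pattern, i.e. the observed proportions
saturated = [y / m for y, m in patterns]

# Intercept-only model: a single common probability for every pattern
overall = sum(y for y, _ in patterns) / sum(m for _, m in patterns)
intercept_only = [overall] * len(patterns)

print("saturated X2:", round(pearson_x2(patterns, saturated), 6))
print("intercept-only X2:", round(pearson_x2(patterns, intercept_only), 2))
```

The saturated model "fits" perfectly only because it restates the data; any model simple enough to summarise them leaves some misfit.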
My very first post on Statalist involved this same issue and also
contains some comments on the Hosmer-Lemeshow test:
http://www.stata.com/statalist/archive/2004-09/msg00533.html .
More recently there was quite a long thread on the dangers of stepwise
regression:
http://www.stata.com/statalist/archive/2006-09/msg00017.html .
HTH,
Maarten
--- "Feiveson, Alan H. (JSC-SK311)" <[email protected]> wrote:
> Ashwin - Logistic regression assumes a particular functional form for
> the probability of "success", say P(S), as a function of the
> explanatory variables. As with any statistical model, there is no
> guarantee that the logistic regression model exactly applies to
> real-world data. At best it may be an approximate representation. It
> follows that with large sample sizes any discrepancy between the model
> and the data will be magnified, resulting in small p-values for a
> goodness-of-fit test. However, you can investigate other models (such
> as probit) that use different functional forms for P(S). You can even
> write your own link function to try alternative custom-made models.
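The large-sample effect described above can be sketched numerically. This is a minimal illustration under loudly hypothetical assumptions: the true P(S) follows a probit curve, the model uses a logit curve with the common 1.7 scaling approximation (coefficients are not fitted), there are ten equal-sized groups along a linear predictor running from -2 to 2, and observed counts are noise-free. The result is only the systematic part of a Hosmer-Lemeshow-type statistic, referred to a chi-square with 8 degrees of freedom (10 groups minus 2); sampling noise, which would add roughly 8 on average, is ignored. All function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_sf_even_df(x, df):
    # Chi-square upper tail for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total

def hl_systematic_part(n, groups=10, scale=1.7):
    # Hypothetical linear-predictor values, one per equal-sized group
    xs = [-2.0 + 4.0 * g / (groups - 1) for g in range(groups)]
    n_g = n / groups
    stat = 0.0
    for x in xs:
        p_true = normal_cdf(x)         # assumed truth: probit
        p_model = logistic(scale * x)  # assumed model: logit, scale not fitted
        observed = n_g * p_true        # noise-free observed successes
        expected = n_g * p_model
        stat += (observed - expected) ** 2 / (expected * (1.0 - p_model))
    return stat

for n in (500, 5_000, 50_000, 500_000):
    stat = hl_systematic_part(n)
    print(f"n={n:>7}: statistic={stat:8.2f}, p={chi2_sf_even_df(stat, 8):.3g}")
```

The gap between the two curves is identical at every n; only the statistic changes, growing in proportion to n. The discrepancy that is invisible at n = 500 becomes overwhelmingly "significant" at n = 500,000.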
>
> Al Feiveson
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ashwin
> Ananthakrishnan
> Sent: Thursday, September 07, 2006 7:29 AM
> To: [email protected]
> Subject: st: Hosmer-Lemeshow sticky issues
>
> Hi,
>
> I'm trying to confirm goodness of fit for a logistic regression model
> I'm working on, but I keep ending up with very small p-values,
> implying poor fit.
>
> My outcome is a dichotomous variable: screened vs. not screened. My
> predictor variables are age category (in 5-year intervals), income
> tertiles, race, and gender. My final model, constructed through
> stepwise backward elimination, includes all the variables and some
> interaction terms. However, when I run a goodness-of-fit test
> (Pearson or Hosmer-Lemeshow), I keep getting extremely small p-values.
>
> Can someone explain to me what this means? Does this mean that the
> model is not valid, and the odds ratios are incorrect?
>
> Can poor fit simply be a marker of large sample size? (My sample size
> is 500,000.)
>
> I'm not able to understand why the model doesn't fit when it has been
> constructed from the data by stepwise backward elimination and all the
> variables are univariately significant.
>
>
> Thanks.
> Ashwin
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------