st: significant gof with poisson in large N regression

From	"Steve Rothenberg" <[email protected]>
To	<[email protected]>
Subject	st: significant gof with poisson in large N regression
Date	Mon, 14 May 2012 14:07:00 -0500

I'm running -poisson- to model data with over 4300 observation periods with
an "exposure" variable.  The goal of the model is to get good predictions of
mean counts given the explanatory variables.  The models are stratified by
age range and counts run from 0 to greater than 14.  On the preliminary
analysis using all data, unstratified by age, -estat gof- returns a
non-significant chi2.  On several of the age-stratified models, -estat gof-
returns results similar to below:
	Goodness-of-fit chi2 = 4902.563
	Prob > chi2(4334) = 0.0000
The model contains many highly significant variables.  Predicted Pr(y=k)
from Poisson differs from observed by less than 0.01 in every count bin,
and by less than 0.001 in most bins.
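
As a rough check on the size of the discrepancy, the quoted statistic can be compared with its degrees of freedom.  This is a minimal sketch using only the two numbers reported above; the scalar names gof_chi2 and gof_df are made up for illustration.

```stata
* Values copied from the -estat gof- output quoted above
* (gof_chi2 and gof_df are hypothetical scalar names)
scalar gof_chi2 = 4902.563
scalar gof_df   = 4334

* Ratio of the statistic to its degrees of freedom
display "chi2/df = " %6.3f gof_chi2/gof_df

* Upper-tail chi-squared probability for the same statistic
display "Prob > chi2 = " %12.3e chi2tail(gof_df, gof_chi2)
```

The ratio is about 1.13, so the statistic exceeds its degrees of freedom by roughly 13%.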

I've read the manual entries on count data regressions as well as for -estat
gof-.  I've searched Statalist and have found other large N Poisson
regressions with similarly significant gof statistics, but no discussion
about the power of -estat gof- for large N regressions.

I've tried applying constraints for groups of related categorical variables
to fix coefficients at a common value across categories for categories with
nearly identical coefficients, as suggested in the manual, without success
in lowering the gof statistic.  I've also (reluctantly) tried trimming the
models of non-significant variables, again without changing the gof
statistic much.  I've tried various transformations of the continuous
explanatory variables, again without success.  I've tried running -nbreg-,
but the estimated alpha is in the neighborhood of 5e-07, close enough to
zero to rule out negative binomial.
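
For reference, the Poisson versus negative binomial comparison described above can be sketched as follows.  The variable names y, x1, x2, and t are placeholders, not the names in my data, and the covariate list is much shorter than in the real models.

```stata
* Hypothetical names: y = count outcome, x1 x2 = covariates, t = exposure
poisson y x1 x2, exposure(t)
estimates store pois

* -nbreg- reports alpha and a likelihood-ratio test of alpha=0
nbreg y x1 x2, exposure(t)
estimates store nb

* Side-by-side log likelihoods, AIC, and BIC for the two fits
estimates stats pois nb
```

With alpha this close to zero, the two fits should be nearly identical.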

There is considerable heterogeneity of residuals around some of the
continuous explanatory variables, an expected pattern given the nature of
these variables.  I've used robust estimation of standard errors, though I
don't think that -estat gof- takes this into account.

I've thought of using the non-stratified data set with variables to identify
the counts of age strata plus the interactions of the age strata with key
explanatory variables, but rearranging the data set to include these
variables would be a very long process.  I am also wary of interaction terms
in non-linear models where different arms of the interaction have different
variances.

I've seen enough "diagnostic" statistics, such as normality tests, return
highly significant results when the N was very large, even though
graphically there appears to be little deviation from the theoretical
distribution.  Is it possible that the power of -estat gof- is such that,
with large N analyses, even small deviations from expected values will
return significant gof statistics?
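
One way to probe this question would be a small simulation.  The sketch below is only an illustration under assumptions of my own choosing: the sample size, the coefficients, and the amount of extra-Poisson variation (a gamma multiplier with mean 1 and variance 0.05) are all invented and are not taken from my data.

```stata
* Simulated data; every number here is an assumption for illustration
clear
set seed 12345
set obs 4300

generate double x = rnormal()

* Gamma multiplier with shape 20 and scale 1/20: mean 1, variance 0.05
generate double nu = rgamma(20, 1/20)

* Counts that are Poisson conditional on x and nu, so mildly
* overdispersed relative to a Poisson model in x alone
generate y = rpoisson(exp(0.5 + 0.3*x) * nu)

poisson y x
estat gof
```

Rerunning this with a smaller value in -set obs- and comparing the -estat gof- results would show how much the verdict depends on N for the same degree of misfit.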

I'd appreciate advice on whether to go with my eyeball comparison of
predicted versus observed probabilities of counts and ignore the -estat gof-
statistic or if that statistic in these circumstances really indicates a
poorly fitted model.  Of course additional advice on other steps I could
take to reduce the gof statistic would also be welcome.

Best,
Steve Rothenberg
Instituto Nacional de Salud Publica
Cuernavaca, Morelos, Mexico
Stata 12.1 MP, Windows 7  




