Re: st: Goodness of Fit Measure for Generalized Linear Models with Adjustment for the Number of Parameters
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Goodness of Fit Measure for Generalized Linear Models with Adjustment for the Number of Parameters
Date
Mon, 10 Mar 2014 09:57:51 +0000
Our views seem close. When I first encountered information criteria it
was already apparent that there were several to choose from and I kept
reading statements of the form, "People often use ?IC, but it is well
known that it often gives a misleading answer". I guess I line up with
the idea that criteria can be trusted except in so far as they can't.
Besides, I tend to want to choose models at least partly on whether
they make physical sense (read economic, medical, ...) and no IC helps
with that. But then I don't fit more models than I can think about
individually. Some people are, or feel, obliged to automate model
choice.

The core of my advice is twofold (if, from your comments, likely to
seem redundant).

1. Whatever you do you might well have to defend, so think how you
would explain this to a sceptical/hostile adviser, reviewer, reader or
listener.

2. Formulas don't necessarily keep their meaning outside their
original and natural habitat.

Nick
[email protected]
On 10 March 2014 08:44, Roberto Liebscher <[email protected]> wrote:
> Thanks, Nick. Clearly an R-squared from an OLS model is not comparable with an
> R-squared from a GLM computed in the aforementioned way. I understand
> your point that for the purpose of comparing non-nested models information
> criteria seem preferable in this case. However, I am not a big fan of
> information criteria because, contrary to R-squared, they do not offer an
> intuitive understanding. The correlation between predicted and actual values
> adjusted for the number of parameters is easier to grasp than minus two
> times the log likelihood plus two times the number of parameters. When I
> state the adjusted R-squared with the number of observations and parameters
> in the model the reader can easily back out the "initial" R-squared.
> Especially when I fit the same model to different dependent variables and
> report the results in one table, this procedure is (at least for me) easier
> to understand and allows for the comparison of these models with different
> endogenous variables. But I got your point that this is somewhat of a stretch
> to avoid using AIC or BIC.
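
To make the "back out" remark concrete, here is a minimal sketch in Python (not Stata, and not taken from the original posts). It assumes the adjustment quoted further down in this thread, R^2 - (1 - R^2) p/(n - p - 1), with n observations and p predictors excluding the constant. The function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: R^2 - (1 - R^2) * p / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return r2 - (1.0 - r2) * p / (n - p - 1)


def backed_out_r2(adj_r2, n, p):
    """Invert adjusted_r2 to recover the unadjusted R-squared.

    The adjustment is algebraically 1 - (1 - R^2)(n - 1)/(n - p - 1),
    so R^2 = 1 - (1 - adj)(n - p - 1)/(n - 1).
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - adj_r2) * (n - p - 1) / (n - 1)
```

With hypothetical values such as r2 = 0.5, n = 101 and p = 4, applying adjusted_r2 and then backed_out_r2 returns the starting value up to rounding, which is all the "back out" claim requires.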
On 5 March 2014 20:44, Nick Cox wrote:
>> The impulse here is a little puzzling to me. Others here will have a
>> deeper mathematical statistics grasp of this than I do, but as I think
>> no one has commented I will jump in.
>>
>> The model you're fitting is estimated using a pseudo- or quasi-maximum
>> likelihood procedure. That doesn't rule out calculating an R-squared
>> measure as a descriptive or heuristic indicator of goodness of fit,
>> which I've been positive about elsewhere e.g.
>> http://www.stata.com/support/faq/statistics/r-squared/index.html That
>> is a stretch insofar as your fitting is strictly not equivalent to
>> maximising R-squared, which is one view of regression. But as long as
>> you use words like "heuristic" people may not be too harsh about that.
>>
>> However, if you now consider the general idea that you should consider
>> penalising yourself for using several predictors, the impulse to
>> adjust R-squared seems even more of a stretch. If there is a need to
>> think about the trade-off between simplicity and fit it is perhaps
>> better done using AIC or BIC.
>>
>> Note that everything is at least a little controversial in this
>> territory: most people are moderately fond of some information
>> criterion, but there is essentially no agreement that one is best.
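
For readers who want the arithmetic behind the two criteria named above, a minimal Python sketch follows (not Stata, and not part of the original posts). It assumes the usual definitions, AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(n), where lnL is the maximised log likelihood, k the number of estimated parameters and n the number of observations. The function names are hypothetical, and with a quasi-likelihood fit the resulting numbers are best read as heuristic.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 lnL + k ln(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return -2.0 * loglik + k * math.log(n)
```

Smaller values are preferred under both criteria, and the comparison is only meaningful between models fitted to the same response on the same observations.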
On 4 March 2014 17:04, Roberto Liebscher <[email protected]> wrote:
>>> I model a fractional response variable with a GLM similar to Papke, L.E.,
>>> Wooldridge, J.M., 1996. Econometric Methods for Fractional Response
>>> Variables with an Application to 401(K) Plan Participation Rates. Journal
>>> of Applied Econometrics 11 (6), 619-632.
>>>
>>> I would like to obtain a goodness-of-fit measure that incorporates the
>>> number of parameters in a fashion similar to the adjusted R-squared. It is
>>> tempting to compute the correlation between the predicted and the observed
>>> values (as in Christopher F Baum's example here:
>>> http://fmwww.bc.edu/EC-C/S2013/823/EC823.S2013.nn06.slides.pdf ) and compute
>>> the adjusted R-squared according to the formula
>>> $R^2-(1-R^2)\frac{p}{n-p-1}$. Since I have never seen anything similar in
>>> papers so far, my question is whether there is something wrong with it.
>>>
>>> Moreover, from a computational point of view one could also estimate the
>>> quasi log likelihood function of the unrestricted and the restricted model
>>> and follow McFadden's procedure (McFadden's adjusted R^2:
>>> http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm ).
>>> If the only goal is to compare non-nested models, is there any reason not to
>>> use such a measure?
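
As a rough illustration of the McFadden-style measure mentioned in the original question, here is a minimal Python sketch (not Stata, and not from the posts). It assumes the commonly cited definitions 1 - lnL_full/lnL_null and, for the adjusted version, 1 - (lnL_full - K)/lnL_null, where lnL_null is the log likelihood of the constant-only model and K the number of parameters in the full model. The function names are hypothetical, and both log likelihoods are assumed to be negative.

```python
def mcfadden_r2(ll_full, ll_null):
    """McFadden's pseudo R-squared: 1 - lnL_full / lnL_null."""
    if ll_null >= 0:
        raise ValueError("this sketch assumes a negative null log likelihood")
    return 1.0 - ll_full / ll_null


def mcfadden_adjusted_r2(ll_full, ll_null, k):
    """Adjusted version: 1 - (lnL_full - K) / lnL_null.

    Subtracting K penalises each additional parameter in the full model.
    """
    if ll_null >= 0:
        raise ValueError("this sketch assumes a negative null log likelihood")
    return 1.0 - (ll_full - k) / ll_null
```

With a quasi log likelihood these quantities remain descriptive, so the caution about the word "heuristic" voiced elsewhere in the thread applies here as well.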