I agreed strongly with Richard before his last paragraph.
My own bias is to try to steer the discussion in
the opposite direction, away from all ideas of "best":
* Any discussion of "best" goes in a circle with a discussion of
criteria for "best", and there are lots of those, as everyone knows.
After all, we go round and round on preferred measures of location,
scale, shape, association in two-way tables, rank correlation, and
so forth.
* There are all sorts of theoretical and practical arguments for
saying that in many fields far too much emphasis is already placed
on single-number figures of merit (as compared with looking
at graphs, looking at residuals, and discussing in detail the
scientific and practical issues behind variable choice, model
structure, etc.). Sometimes it seems that researchers will
spend a very long time producing or collating data, formatting
it for software, writing programs, ..., and then expect to make a
quick decision on model virtues based on a few magic numbers!
* These questions of which measures to use
seem to arise primarily when response variables are categorical
(in a wide sense). The even wider context including measured responses
is, I hope everyone will agree, vital. After all, the history
presumably is that people wanted measures fulfilling the same
role as R^2 in (say) multiple regression -- even if that role
is often aggressive, not analytical, using R^2 to intimidate,
rather than to inform.
There are two simple ideals, it seems to me: that everyone
should state clearly what definition of R^2 they are
using; and that in principle enough information should be
provided to allow other measures to be calculated. Beyond
that, if measures fail to agree numerically, then choosing
one as best requires a special argument (which,
for all I know, could be "this is what people use in this
field, so I'll use it too").
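As a rough illustration of the second ideal (a sketch in Python,
not Stata output): if a report gives the two log likelihoods and the
number of observations, readers can work out several pseudo-R^2
measures for themselves. The function name and the example numbers
below are made up for illustration; the formulas are the usual
McFadden, Cox-Snell, and Nagelkerke (Cragg-Uhler) definitions, and
the sketch assumes a discrete response, so that both log likelihoods
are negative.

```python
import math


def pseudo_r2(ll_full, ll_null, n):
    """Return several pseudo-R^2 measures from reported quantities.

    ll_full : log likelihood of the fitted (full) model
    ll_null : log likelihood of the constant-only model
    n       : number of observations

    Hypothetical helper for illustration; not part of any package.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if ll_null >= 0:
        # Assumption: discrete response, so log likelihoods are negative.
        raise ValueError("ll_null must be negative for these measures")

    # McFadden: 1 - L1/L0, with L1 and L0 the two log likelihoods.
    mcfadden = 1.0 - ll_full / ll_null

    # Cox-Snell: 1 - (likelihood ratio)^(2/n), written with logs.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)

    # Nagelkerke (Cragg-Uhler): Cox-Snell rescaled by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell

    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }


# Made-up numbers, purely to show that the measures need not agree.
for name, value in pseudo_r2(ll_full=-80.0, ll_null=-100.0, n=200).items():
    print(f"{name:>10}: {value:.3f}")
```

The three values differ for the same fit, which is why stating the
definition in use matters.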
There are more platitudes posing as homespun wisdom at
http://www.stata.com/support/faqs/stat/rsquared.html
(and also some references and some code fragments).
Nick
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Richard Williams
> Sent: 28 October 2003 02:32
> To: [email protected]
> Subject: Re: st: definition of pseudo R^2 for dprobit or probit
>
>
> At 08:04 PM 10/27/2003 -0600, Scott Merryman wrote:
>
> >[R] maximize, Methods and Formulas section
> >
> >Pseudo R2 = 1 - L1/L0, where L1 is the log likelihood of the full
> >model and L0 is the log likelihood of the constant-only model.
>
> That is one of a couple of equivalent formulas but probably the
> simplest to write in an email message! Certainly clearer than what
> I wrote earlier.
>
> As a sidelight, this is one of many statistics that claims the name
> of "Pseudo R2". It would be nice if Stata explicitly labeled it as
> McFadden's R2, and perhaps reported a couple of the other
> alternatives in case anybody wants them.
>
> Of the various alternatives, McFadden's R2 seems to have emerged as
> the favorite and best. Anybody strongly disagree and think something
> else is better?
>