AW: st: probit vs. logit
From
"Martin Weiss" <[email protected]>
To
<[email protected]>
Subject
AW: st: probit vs. logit
Date
Tue, 25 May 2010 13:34:01 +0200
" and after -probit-
. predict probit_p "probit prediction""
I bet that is a shorthand for
*************
predict probit_p
label var probit_p "probit prediction"
*************
? Otherwise, Stata complains about "too many variables specified"...
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Tuesday, 25 May 2010 13:20
To: [email protected]
Subject: RE: st: probit vs. logit
The answers "In principle, you can definitely prefer one to the other"
and "In practice, the results may be very close" can both be true. (A
statistical version of complementarity to please the ghost of Niels
Bohr?)
Developing Michael's line of argument, one simple thing I see done less
often than it might be is just to calculate predictions and compare them. Thus
predictions on a probability scale (p, say) can be got after -logit-
. predict logit_p
. label var logit_p "logit prediction"
and after -probit-
. predict probit_p "probit prediction"
Then any number of graphical and numerical comparisons are possible. The
scatter plot
. scatter logit_p probit_p
is the propaganda or sales pitch plot "Look, the predictions are the
same, really!" while to turn a magnifying-glass on the fine structure of
disagreement it may make as much or more sense to compare using log p,
log(1 - p), logit p or yet other scales.
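A minimal sketch of that comparison, assuming the auto data used elsewhere in this thread as a stand-in for your own model. The names diff_logit and mean_logit are hypothetical, chosen only for illustration; logit_p and probit_p are as above.

```stata
* Sketch only: the auto data stand in for whatever model is of interest.
sysuse auto, clear

* Predicted probabilities from each model (predict defaults to Pr(outcome))
logit foreign mpg price weight
predict logit_p
label var logit_p "logit prediction"

probit foreign mpg price weight
predict probit_p
label var probit_p "probit prediction"

* The "sales pitch" plot, with a line of equality added for reference
scatter logit_p probit_p || function y = x, range(0 1)

* Magnifying glass: difference versus mean on the logit scale
* (diff_logit and mean_logit are hypothetical names)
generate diff_logit = logit(logit_p) - logit(probit_p)
generate mean_logit = (logit(logit_p) + logit(probit_p)) / 2
scatter diff_logit mean_logit, yline(0)

* Numerical summary of the disagreement
summarize diff_logit, detail
```

Any prediction stored as exactly 0 or 1 would give a missing value on the logit scale, so it may be worth checking for those before reading the second plot.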
Here the science underlying what is being done, assuming that there is
some, is important in guiding assessment.
Nick
[email protected]
Michael N. Mitchell
I agree with Martin that the choice of "logit" vs. "probit" appears to be
largely discipline specific. If this is for publication or presentation, then
it might be useful to see what the customs are for your audience.
If someone gets picky with you and really wants to see a comparison of the
model fit of the two models, I think you could use -estimates store- and
-estimates stats- (as shown below) to compare the fit of the models using the
AIC and/or BIC (where a smaller value means better fit). As in the example
below, the two values are nearly identical, and I think we all expect that
this would generally be the case.
--- snip ---
. sysuse auto
(1978 Automobile Data)
. logit foreign mpg price weight
Iteration 0: log likelihood = -45.03321
Iteration 1: log likelihood = -22.244792
Iteration 2: log likelihood = -18.069284
Iteration 3: log likelihood = -17.184699
Iteration 4: log likelihood = -17.161975
Iteration 5: log likelihood = -17.161893
Iteration 6: log likelihood = -17.161893
Logistic regression                               Number of obs   =         74
                                                  LR chi2(3)      =      55.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.161893                       Pseudo R2       =     0.6189
------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1210918   .0956855    -1.27   0.206     -.308632    .0664483
       price |   .0009264   .0003074     3.01   0.003      .000324    .0015288
      weight |  -.0068497   .0019996    -3.43   0.001    -.0107688   -.0029306
       _cons |   14.42237   5.414367     2.66   0.008      3.81041    25.03434
------------------------------------------------------------------------------
. estimates store model1
. probit foreign mpg price weight
Iteration 0: log likelihood = -45.03321
Iteration 1: log likelihood = -20.083125
Iteration 2: log likelihood = -17.363271
Iteration 3: log likelihood = -17.152935
Iteration 4: log likelihood = -17.151715
Iteration 5: log likelihood = -17.151715
Probit regression                                 Number of obs   =         74
                                                  LR chi2(3)      =      55.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.151715                       Pseudo R2       =     0.6191
------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0723615   .0556501    -1.30   0.193    -.1814337    .0367106
       price |   .0005185   .0001651     3.14   0.002      .000195    .0008421
      weight |  -.0038232   .0010392    -3.68   0.000      -.00586   -.0017864
       _cons |   8.150001   2.962982     2.75   0.006     2.342664    13.95734
------------------------------------------------------------------------------
. estimates store model2
. estimates stats model1 model2
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
      model1 |     74   -45.03321   -17.16189      4     42.32379    51.54005
      model2 |     74   -45.03321   -17.15171      4     42.30343    51.51969
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
--- snip ----
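For anyone wanting to see where the AIC and BIC columns come from, here is a small worked check using the usual definitions, AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N). The numbers are copied by hand from the output above (log likelihoods, df = 4, N = 74); this is only a sketch of the arithmetic, not a replacement for -estimates stats-.

```stata
* Values typed in from the output above; nothing here is re-estimated.
* model1 (logit): ll = -17.161893, k = 4, N = 74
display "logit  AIC = " -2*(-17.161893) + 2*4
display "logit  BIC = " -2*(-17.161893) + 4*ln(74)

* model2 (probit): ll = -17.151715, k = 4, N = 74
display "probit AIC = " -2*(-17.151715) + 2*4
display "probit BIC = " -2*(-17.151715) + 4*ln(74)
```

Because both models have the same number of parameters and the same N, the penalty terms cancel, and the AIC or BIC difference is just twice the difference in log likelihoods, here about 0.02.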
I hope that helps,
On 2010-05-24 11.36 PM, Maarten buis wrote:
> --- On Mon, 24/5/10, SR Millis wrote:
>> Logistic regression is generally preferred over the probit
>> model because of the wider variety of fit statistics. Also,
>> exponentiated logit coefficients can be interpreted as odds
>> ratios---which is not the case with probit coefficients.
>
> A general preference for one or the other is to a large
> extent discipline dependent. For example, within economics
> the probit is the "default" method. I like interpreting
> effects in terms of odds ratios as a way of identifying the
> scale, which is unidentified in a probit model (it is
> identified by fixing the residual variance to one, which
> has all kinds of nasty consequences when interpreting
> interaction terms). So, I tend to use the -logit-.
>
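To make the odds-ratio and scale points concrete, here is a sketch using the auto data and the coefficients reported earlier in this thread. The choice of data is only an assumption for illustration; the coefficient values are typed in from the logit and probit output above.

```stata
sysuse auto, clear

* Report exponentiated logit coefficients, i.e. odds ratios
logit foreign mpg price weight, or

* The same thing by hand for one coefficient
display "odds ratio for mpg = " exp(_b[mpg])

* Ratios of logit to probit coefficients, typed in from the output above.
* They sit near 1.7 because the two models fix the scale differently:
* the standard logistic error has sd pi/sqrt(3), the probit error has sd 1.
display "mpg:    " (-.1210918)/(-.0723615)
display "price:  " (.0009264)/(.0005185)
display "weight: " (-.0068497)/(-.0038232)
display "pi/sqrt(3) = " _pi/sqrt(3)
```

The ratios are not expected to equal pi/sqrt(3) exactly, since the two link functions differ in shape as well as scale; the point is only that the coefficients are not comparable across models without some rescaling.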
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/