Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

AW: st: probit vs. logit

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	AW: st: probit vs. logit
Date	Tue, 25 May 2010 13:34:01 +0200

"and after -probit-

. predict probit_p "probit prediction""

I bet that is shorthand for

*************
predict probit_p 
label var probit_p "probit prediction"
*************

? Otherwise, Stata complains about "too many variables specified"...


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Tuesday, 25 May 2010 13:20
To: [email protected]
Subject: RE: st: probit vs. logit

The answers "In principle, you can definitely prefer one to the other"
and "In practice, the results may be very close" can both be true. (A
statistical version of complementarity to please the ghost of Niels
Bohr?) 

Developing Michael's line of argument, one simple thing I see done less
often than it might be is just to calculate predictions and compare. Thus
predictions on a probability scale (p, say) can be got after -logit-

. predict logit_p 
. label var logit_p "logit prediction" 

and after -probit- 

. predict probit_p "probit prediction" 

Then any number of graphical and numerical comparisons are possible. The
scatter plot

. scatter logit_p probit_p 

is the propaganda or sales pitch plot "Look, the predictions are the
same, really!" while to turn a magnifying-glass on the fine structure of
disagreement it may make as much or more sense to compare using log p,
log(1 - p), logit p or yet other scales. 
Here the science underlying what is being done, assuming that there is
some, is important in guiding assessment. 
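
A minimal sketch of such a comparison on another scale, assuming the auto
data and the model from Michael's example further down the thread. The
names diff_p and diff_logit are made up for illustration, and the
predictions are stored as doubles so that values very close to 0 or 1 are
less likely to round to exactly 0 or 1 (where logit() returns missing):

*************
sysuse auto, clear

quietly logit foreign mpg price weight
predict double logit_p
label var logit_p "logit prediction"

quietly probit foreign mpg price weight
predict double probit_p
label var probit_p "probit prediction"

* difference on the probability scale
generate double diff_p = logit_p - probit_p

* difference on the logit scale, log(p/(1 - p))
generate double diff_logit = logit(logit_p) - logit(probit_p)

summarize diff_p diff_logit
scatter diff_logit probit_p, yline(0)
*************

Plotting the difference against one of the predictions, rather than one
prediction against the other, is what lets the fine structure show up.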

Nick 
[email protected] 

Michael N. Mitchell

I agree with Martin, that the choice of "logit" vs. "probit" appears to
be largely discipline-specific. If this is for publication or
presentation, then it might be useful to see what the customs are for
your audience.

If someone gets picky with you and really wants to see a comparison of
the model fit of the two models, I think you could use -estimates store-
and -estimates stats- (as shown below) to compare the fit of the models
using the AIC and/or BIC (where a smaller value means better fit). As in
the example below, the two values are nearly identical, and I think we
all expect that this would generally be the case.

--- snip ---

. sysuse auto
(1978 Automobile Data)

. logit  foreign mpg price weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -22.244792
Iteration 2:   log likelihood = -18.069284
Iteration 3:   log likelihood = -17.184699
Iteration 4:   log likelihood = -17.161975
Iteration 5:   log likelihood = -17.161893
Iteration 6:   log likelihood = -17.161893

Logistic regression                               Number of obs   =         74
                                                  LR chi2(3)      =      55.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.161893                       Pseudo R2       =     0.6189

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1210918   .0956855    -1.27   0.206     -.308632    .0664483
       price |   .0009264   .0003074     3.01   0.003      .000324    .0015288
      weight |  -.0068497   .0019996    -3.43   0.001    -.0107688   -.0029306
       _cons |   14.42237   5.414367     2.66   0.008      3.81041    25.03434
------------------------------------------------------------------------------

. estimates store model1

. probit  foreign mpg price weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -20.083125
Iteration 2:   log likelihood = -17.363271
Iteration 3:   log likelihood = -17.152935
Iteration 4:   log likelihood = -17.151715
Iteration 5:   log likelihood = -17.151715

Probit regression                                 Number of obs   =         74
                                                  LR chi2(3)      =      55.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.151715                       Pseudo R2       =     0.6191

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0723615   .0556501    -1.30   0.193    -.1814337    .0367106
       price |   .0005185   .0001651     3.14   0.002      .000195    .0008421
      weight |  -.0038232   .0010392    -3.68   0.000      -.00586   -.0017864
       _cons |   8.150001   2.962982     2.75   0.006     2.342664    13.95734
------------------------------------------------------------------------------

. estimates store model2

. estimates stats model1 model2

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
      model1 |     74   -45.03321   -17.16189      4     42.32379    51.54005
      model2 |     74   -45.03321   -17.15171      4     42.30343    51.51969
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note
--- snip ----
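
As a hand check on the AIC and BIC columns above, here is a sketch that
recomputes them from the reported log likelihoods. It assumes the usual
definitions AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N), with k = 4
estimated parameters (the df column) and N = 74 observations:

*************
* logit (model1): ll = -17.161893
display "AIC logit  = " -2*(-17.161893) + 2*4
display "BIC logit  = " -2*(-17.161893) + 4*ln(74)

* probit (model2): ll = -17.151715
display "AIC probit = " -2*(-17.151715) + 2*4
display "BIC probit = " -2*(-17.151715) + 4*ln(74)
*************

The results should agree with the table up to rounding. Because both
models have the same k and N, the gap in AIC and in BIC is just twice the
gap in log likelihood, about 0.02 here.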

I hope that helps,

On 2010-05-24 11.36 PM, Maarten Buis wrote:
> --- On Mon, 24/5/10, SR Millis wrote:
>> Logistic regression is generally preferred over the probit
>> model because of the wider variety of fit statistics. Also,
>> exponentiated logit coefficients can be interpreted as odds
>> ratios---which is not the case with probit coefficients.
>
> A general preference for one or the other is to a large
> extent discipline dependent. For example, within economics
> the probit is the "default" method. I like interpreting
> effects in terms of odds ratios as a way of identifying the
> scale, which is unidentified in a probit model (it is
> identified by fixing the residual variance to one, which
> has all kinds of nasty consequences when interpreting
> interaction terms). So, I tend to use the -logit-.
>
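
The odds-ratio reading of logit coefficients mentioned in the quoted
messages can be sketched as follows, assuming the same auto data and
model as in the example above. The -or- option of -logit- reports
exponentiated coefficients, and the -display- line does the same by hand
for the mpg coefficient taken from the output above:

*************
sysuse auto, clear

* report odds ratios instead of coefficients
logit foreign mpg price weight, or

* by hand: exponentiate the reported mpg coefficient
display exp(-.1210918)
*************

No comparable option gives odds ratios after -probit-, which is the
contrast being drawn in the quoted text.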

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
