--- Martin Weiss <[email protected]> wrote:
> *********
> sysuse auto, clear
> probit foreign weight, nolog
> estat clas
> probit foreign weight mpg, nolog
> estat clas
> *********
>
> can I prefer one specification of covariates in a probit model
> over the other on the basis of the correctly classified cases as
> provided at the bottom of the classification table? If so, is there
> a confidence interval that would let me judge whether the difference
> between two models is significant?
There is another option: model 1 is model 2 with the coefficient of mpg
constrained to 0. This is an assumption you can test using the Wald test
(the test that is immediately displayed in the output of -probit-), or,
if you add multiple variables, the likelihood ratio test (-lrtest-).
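For the two models in the question, both tests could be run roughly as follows. This is a minimal sketch using standard Stata commands (-estimates store-, -test-, -lrtest-); the names m1 and m2 are arbitrary labels chosen for illustration.

```stata
sysuse auto, clear
* Model 1 (restricted): mpg left out, i.e. its coefficient fixed at 0
probit foreign weight, nolog
estimates store m1
* Model 2 (unrestricted): mpg included
probit foreign weight mpg, nolog
estimates store m2
* Wald test of H0: coefficient of mpg = 0
test mpg
* Likelihood ratio test of the nested models m1 and m2
lrtest m1 m2
```

With a single restriction, the chi2 reported by -test mpg- is the square of the z statistic shown for mpg in the -probit- output, so the two carry the same information.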
The problem with the proportion correctly classified is that it depends
on the distribution of your dependent variable: if success is rare and
everybody is classified as a failure, then the proportion correctly
classified is still large. In that case, adding an explanatory variable
isn't going to change it much. This characteristic of the proportion
correctly classified is illustrated in the example below. The effect of
x is the same in each probit; all that differs is the constant, and
hence the proportion of successes. This dramatically influences how much
adding x to the model increases the proportion correctly classified,
even though the effect of x is the same in all models.
*------------ begin example ---------------------
set more off
set seed 12345
capture program drop sim
program define sim, rclass
    drop _all
    set obs 500
    gen x = rnormal()
    // same effect of x, but a lower constant each time,
    // so successes become rarer from y1 to y3
    gen byte y1 = runiform() < normal(x)
    gen byte y2 = runiform() < normal(x-1)
    gen byte y3 = runiform() < normal(x-2)
    forvalues i = 1/3 {
        // percent correctly classified by the constant-only model
        quietly probit y`i'
        quietly estat classification
        local p0 = r(P_corr)
        // percent correctly classified once x is added
        quietly probit y`i' x
        quietly estat classification
        return scalar diff`i' = r(P_corr) - `p0'
    }
end
simulate diff1=r(diff1) ///
         diff2=r(diff2) ///
         diff3=r(diff3), ///
         reps(100): sim
sum
*---------------- end example ----------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
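To see why a rare outcome inflates the proportion correctly classified, consider a constant-only probit on the auto data from the original question. Every car then gets the same predicted probability, equal to the sample share of foreign cars (about 0.30). With the default cutoff of 0.5 used by -estat classification-, every car is classified as domestic, so the percentage correctly classified is simply the percentage of domestic cars. A minimal sketch:

```stata
sysuse auto, clear
* Constant-only probit: every car gets the same predicted probability
probit foreign, nolog
estat classification
* Save the percent correctly classified before -summarize- overwrites r()
local pcorr = r(P_corr)
* Percent of domestic cars (foreign == 0)
quietly summarize foreign
display "correctly classified: " `pcorr' "   domestic: " 100*(1 - r(mean))
```

Both numbers should be the same, even though this model uses no information about the cars at all.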
In general, when it comes to selecting a model, I would not rely on a
single statistic. Some quotes along these lines can be found here:
http://www.stata.com/statalist/archive/2004-09/msg00535.html
The book "Regression Models for Categorical Dependent Variables Using
Stata" by J. Scott Long and Jeremy Freese
http://www.stata.com/bookstore/regmodcdvs.html contains a good
discussion of all the things you should take into account when
selecting a model.
Hope this helps,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------