Statalist


st: RE: RE: probit questions


From   "Shehzad Ali" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: probit questions
Date   Wed, 25 Jun 2008 17:53:27 +0100

This is an interesting discussion of Wald chi2 statistics, and I thought I
would share a related point. I had a similar problem when I accounted for
cluster sampling (option -cluster-) in my probit model. The data have 25
clusters and about the same number of variables, with 900 observations in
total. I wonder whether the Wald statistic issue also arises when the number
of clusters and the number of variables are almost the same?

Cheers,

Shehzad
 

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Verkuilen, Jay
Sent: 25 June 2008 17:35
To: [email protected]
Subject: st: RE: probit questions

Sun, Yan (IFPRI) wrote:
>I have a couple of questions about the probit model. My dependent
>variable is a 0/1 binary choice (1=invest in technology, 0=no
>investment) for user groups; the independent variables are the user
>groups' characteristics (around 20).

>1) Which model is the correct one: probit or logit? What is the Stata
>command for checking this?

Unless you have very large samples (which you don't), they are nearly
indistinguishable. In general there is reason to prefer logit to probit
when you have potentially extreme probabilities. The logistic
distribution is very much like a t with 10 df in shape. 
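
To make the comparison of the two distributions concrete, here is a
minimal sketch in Python (not Stata), using only the standard library. It
compares upper-tail probabilities of a standard normal with those of a
logistic distribution rescaled to unit variance. The function names and the
choice of unit-variance matching are assumptions made for illustration;
other ways of matching the two curves give somewhat different numbers.

```python
import math

# Standard deviation of the standard logistic distribution
# (its variance is pi^2 / 3).
LOGISTIC_SD = math.pi / math.sqrt(3.0)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def logistic_upper_tail(z):
    """P(X > z) for a logistic variable rescaled to unit variance."""
    return 1.0 / (1.0 + math.exp(z * LOGISTIC_SD))


def tail_table(points=(0.0, 0.5, 1.0, 2.0, 3.0, 4.0)):
    """Return (z, normal tail, logistic tail) for each point."""
    return [(z, normal_upper_tail(z), logistic_upper_tail(z)) for z in points]


if __name__ == "__main__":
    print(f"{'z':>4} {'normal':>10} {'logistic':>10}")
    for z, p_norm, p_logis in tail_table():
        print(f"{z:4.1f} {p_norm:10.5f} {p_logis:10.5f}")
```

Under this matching the two tail probabilities stay within about 0.03 of
each other near the middle of the distribution, while from roughly z = 3
outward the logistic tail is several times larger than the normal tail.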

The classic example of being able to tell the difference appears in
chess ranking. The Elo system is, essentially, based on logistic
regression. It was originally based on probit but in practice it turned
out that the probit didn't make enough extreme predictions. 


>2) I have few observations (170 in total, but only around 60 are valid
>for all independent variables), and sometimes the regression does not
>report the "Wald chi2" statistic. What is the reason for this?
>3) I got a note right after the regression, which says "8
>failures and 7 successes completely determined". What does this mean?

Simply put, you have too many independent variables for your sample. It
sounds like you have some missing data as well, since the number of
valid observations is much smaller than the total number of observations.
The failure of the standard errors and Wald statistics is one sign; the
perfect predictions are another. You need to deal with the missing data
(-findit ice-), and even then you have WAY too many independent variables
for 170 observations. Very roughly speaking, you should have at least 10
observations per variable, and probably more for binary data, which don't
carry that much information per observation. Either get more data or get
rid of variables.
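
The rule of thumb above is simple arithmetic, sketched here in Python (not
Stata) with the numbers from the question: about 20 variables, 170
observations in total, and around 60 complete cases. The function names and
the default of 10 observations per variable are illustrative assumptions
taken from the rough guideline, not a formal criterion.

```python
def obs_per_variable(n_obs, n_vars):
    """Observations available per independent variable."""
    if n_vars <= 0:
        raise ValueError("n_vars must be positive")
    return n_obs / n_vars


def max_variables(n_obs, min_obs_per_var=10):
    """Largest number of variables allowed by the rule of thumb."""
    return n_obs // min_obs_per_var


if __name__ == "__main__":
    n_vars = 20  # "around 20" characteristics in the question
    for label, n_obs in (("complete cases", 60), ("all observations", 170)):
        ratio = obs_per_variable(n_obs, n_vars)
        limit = max_variables(n_obs)
        print(f"{label}: {ratio:.1f} obs per variable, "
              f"rule of thumb allows at most {limit} variables")
```

Even if every one of the 170 observations were usable, the rule of thumb
would allow fewer variables than the roughly 20 in the model.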


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


