Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "stefan.duke@gmail.com" <stefan.duke@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: linear probability model |
Date | Wed, 23 Jun 2010 23:35:22 +0200 |
As usual, it depends a bit on which part of the forest you are coming from and on the tools and experience you have.

When your data are not very extreme, i.e. there are no overly discriminating predictors, linear regression approximates the middle part of the logistic curve pretty well (see http://en.wikipedia.org/wiki/Logistic_function for a picture). So the estimated probabilities for well-behaved data don't differ much (and OLS runs better on old, say 20-year-old, software). As you (should) use well-behaved data, your coefficient estimates should be approximately normally distributed, and hence you can draw inference (test for significance) from your OLS model, in particular when the sample size is large.

On the other hand, your model is not robust (for less well-behaved data), and a better, more appropriate model (logit, probit) is out there, for which you need to check fewer assumptions and which never gives you implausible probabilities.

So, to put it in a nutshell: if you have a large sample, are analyzing the effect of gender on smoking behavior among young cohorts in an advanced market society (not too discriminating), and do the analysis for, say, a political scientist who learnt some applied statistics 30 years ago and has since stopped reading statistics books, the linear probability model should work well enough. If, on the other hand, you analyze the effect of gender on the consumption of, say, lipstick in a society which has more backward gender roles (I hope this isn't too sexist) and do the analysis for somebody who got his PhD in econometrics some 5 years ago, you will be in trouble.

For everything between the two extremes, you are on your own.

HTH,
Stefan

On Wed, Jun 23, 2010 at 7:11 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think this is far from the central issue. With continuous responses it
> can be just as important as with binary responses to ensure that
> predictions stay within the bounds of 0 and 1.
>
> Conversely a linear model might seem justifiable if predictions outside
> those bounds only occurred way beyond the range of the data and if
> linear, logit and probit give similar predictions.
>
> This is like anything else. I often argue, especially to students, that
> choosing a qualitatively correct model precedes estimating the
> parameters and focusing on quantitative fit. But little in this
> territory seems absolute. I wouldn't turn down a Gaussian fit to human
> heights if it fitted well merely because it predicts a positive
> probability of negative heights, even though that is completely
> unbiological.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Scott Millis
>
> The fundamental issue is the type of response variable that you have.
> If it is binary, you would want to use a logit or probit model---not a
> linear model. If your response variable is continuous, you would use a
> linear model.
>
> --- On Wed, 6/23/10, dk <statad27@googlemail.com> wrote:
>
>> What are the advantages of linear probability model over probit and
>> logit. i have read some where that linear probability model fits best
>> for very large sample, where maximum likelihood with probit and logit
>> does not work can any one explain this.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
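
A rough numerical illustration of the point made in the reply, that a straight line tracks the middle part of the logistic curve but goes wrong in the tails: the sketch below compares the logistic function with its tangent line at the midpoint. This is only an illustration under stated assumptions. The tangent line is a stand-in for a fitted linear probability model, not what OLS would actually estimate on a given data set, and the function names and grid of x values are made up for the example.

```python
import math


def logistic(x):
    """Logistic function p = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_approx(x):
    """Tangent line to the logistic curve at x = 0 (slope 1/4, value 1/2).

    Assumption: used here as a stand-in for a linear probability model.
    """
    return 0.5 + x / 4.0


def compare(xs):
    """Return (x, logistic value, linear value, absolute gap) for each x."""
    rows = []
    for x in xs:
        p_logit = logistic(x)
        p_lin = linear_approx(x)
        rows.append((x, p_logit, p_lin, abs(p_logit - p_lin)))
    return rows


if __name__ == "__main__":
    # Hypothetical grid on the linear-predictor scale.
    for x, p_logit, p_lin, gap in compare([-4, -2, -1, -0.5, 0, 0.5, 1, 2, 4]):
        flag = "" if 0.0 <= p_lin <= 1.0 else "  <- outside [0, 1]"
        print(f"x={x:5.1f}  logistic={p_logit:.3f}  linear={p_lin:.3f}  gap={gap:.3f}{flag}")
```

Within one unit of the midpoint the two curves differ by less than 0.02, whereas at x = 4 the straight line returns 1.5 and at x = -4 it returns -0.5, which is the "implausible probabilities" problem that logit and probit avoid.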