Re: st: MIXLOGIT: marginal effects
From: Maarten Buis <[email protected]>
To: [email protected]
Subject: Re: st: MIXLOGIT: marginal effects
Date: Tue, 7 Feb 2012 10:46:23 +0100
On Tue, Feb 7, 2012 at 8:50 AM, Clive Nicholas wrote:
> However, both of you, IMVHO, are wrong, wrong, wrong about the linear
> probability model. There is no justification for the use of this model
> _at all_ when regressing a binary dependent variable on a set of
> regressors. Pampel's (2000) excellent introduction on logistic
> regression spent the first nine or so pages carefully explaining just
> why it is inappropriate (imposing linearity on a nonlinear
> relationship; predicting values out of range; nonadditivity; etc).
> Since when was it in vogue to advocate its usage? I'm afraid that I
> don't really understand this.
>
> Pampel FC (2000) Logistic Regression: A Primer (Sage University Papers
> Series on QASS, 07-132), Thousand Oaks, CA: Sage
There is one situation where the linear probability model is
completely unproblematic, and that is when you have a fully
saturated model, i.e. when all your explanatory variables are
categorical and all interaction terms are included. In that case the
predictions of a linear probability model will exactly correspond with
the predictions of a logit model, as you can see below:
*---------------- begin example ------------------
sysuse nlsw88, clear
// 1 if managerial/professional occupation, missing if occupation is missing
gen byte goodjob = occupation < 3 if occupation < .
// fully saturated logit: all main effects and all interactions
logit goodjob i.collgrad##i.south##i.union
predict pr_logit
// fully saturated linear probability model with robust standard errors
reg goodjob i.collgrad##i.south##i.union, vce(robust)
predict pr_reg
// the two sets of predicted probabilities coincide
tab pr_*
*----------------- end example -------------------
The residuals in a linear probability model are heteroskedastic, but
you can easily get around that by specifying the -vce(robust)- option.
If you do that, then both -logit- and -regress- will give valid
inference:
*------------ begin simulation ---------------
tempname trd
scalar `trd' = invlogit(.5) - invlogit(-.5)
di as txt "The true risk difference is " ///
   as result `trd'

program drop _all
program define sim, rclass
    drop _all
    set obs 500
    // half the observations in each group
    gen x = _n < 251
    // true log odds ratio of x is 1
    gen y = runiform() < invlogit(-.5 + x)
    logit y x
    return scalar lor = _b[x]
    return scalar lse = _se[x]
    reg y x, vce(robust)
    return scalar rd  = _b[x]
    return scalar rse = _se[x]
end

simulate lor=r(lor) lse=r(lse) ///
         rd=r(rd)   rse=r(rse), ///
         reps(20000) : sim

// -simsum- is user-written; see -findit simsum- to install it
// logit works fine:
simsum lor, true(1) se(lse)
// linear probability model works fine too:
simsum rd, true(`trd') se(rse)
*------------- end simulation ----------------
So in the case of a fully saturated model, it is really a matter of
whether you want your parameters in terms of differences in
probabilities or ratios of odds.
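To make that distinction concrete, here is a small sketch of the two
parameterizations; the group probabilities are made up for
illustration, not taken from the data above:
*-------------- begin illustration ---------------
// hypothetical probabilities of success in two groups
scalar p1 = .6
scalar p0 = .4
// the same comparison expressed two ways
di as txt "risk difference: " as result p1 - p0
di as txt "odds ratio:      " as result (p1/(1-p1)) / (p0/(1-p0))
*--------------- end illustration ----------------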
In models that do not include all interactions, or that add a
continuous explanatory variable, the linear probability model is more
restrictive. However, that does not bother me too much; a model is,
after all, supposed to be a simplification of reality. You obviously do
want to check that the deviations from linearity or the predictions
outside the [0,1] interval do not get too far out of hand, but I
think there will be many situations where the linear probability model
is perfectly adequate.
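One quick way to carry out that check after -regress- might look like
this (a sketch reusing the -nlsw88- data from above; the non-saturated
specification is just an illustration):
*---------------- begin check --------------------
sysuse nlsw88, clear
gen byte goodjob = occupation < 3 if occupation < .
// a non-saturated specification: interactions deliberately left out
reg goodjob i.collgrad i.south i.union, vce(robust)
predict pr_reg
// how many predicted probabilities fall outside [0,1]?
count if pr_reg < 0 | pr_reg > 1
summarize pr_reg
*----------------- end check ---------------------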
Having said all that, in my own research I still almost always use a
logit rather than a linear probability model, but that is a choice not
a necessity.
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/