Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: "Can Your Results be Replicated?" (Stata error?)

From	Richard Williams <[email protected]>
To	[email protected], "[email protected]" <[email protected]>
Subject	Re: st: "Can Your Results be Replicated?" (Stata error?)
Date	Wed, 18 Sep 2013 20:54:53 -0500

At 05:01 PM 9/18/2013, Stas Kolenikov wrote:

Jeff,

of other models/estimators, besides those brought up by Joao, -mlogit-
will also be affected by the perfect prediction problems. How does
Stata treat -mlogit- in this respect? I vaguely remember jumping
through the hoops of specifying a bunch of dummy variable "this
category vs. all others combined" to deal with it a few years back.

-- Stas Kolenikov, PhD, PStat (ASA, SSC)
-- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name


Well, while we are at it, we can go after glm too:

webuse repair, clear
logit foreign b3.repair
glm foreign b3.repair, link(logit) family(binomial)

My own user-written programs, oglm and gologit2, also have troublewith perfect prediction. My experience has been that when there is aproblem, you get monstrous standard errors and extremely small teststatistics. I don't know if that is what always happens, but it ishopefully some sort of clue that there is a problem somewhere.

On Fri, Sep 13, 2013 at 3:12 PM, Jeff Pitblado, StataCorp LP
<[email protected]> wrote:

> This post is rather long. For those interested in cutting to theaction items

> we will take as a result of this thread, please scroll to the end.
>
> Philip Jones <[email protected]> writes about a Twitter feed he
> received today:
>
>> I found a link on my Twitter feed this AM, purporting to show how
>> Stata "made a mistake" that R did not make:
>>
>> http://www.r-bloggers.com/can-your-results-be-replicated/
>>
>> which actually points to:
>>

>>http://politicalsciencereplication.wordpress.com/2013/09/11/guest-blog-how-to-persuade-journals-to-accept-your-replication-paper/

>>
>> I realize that "r-bloggers" is likely not the most bias-free site when
>> it comes to reviewing/rating stats packages, but has anyone got an
>> idea as to what is actually going on here? Is Stata really at fault?
>
> Joerg Luedicke <[email protected]> looked at the paper:
>
>> After having a quick glance at their paper

>>(http://jcr.sagepub.com/content/early/2013/08/19/0022002713499718.abstract?papetoc)

>> it seems that they are talking about a problem with Stata's -xtgee-
>> command which, in the case of separation in a logit model, provides
>> nonsense results as opposed to omitting predictors or the like. Below
>> is a toy example showing what seems to be the problem. However,
>> finding an effect of something like "x is 3 million times less likely
>> than y" and not getting suspicious rather looks like sloppy research
>> to me in the first place.
>
> Anders Alexandersson <[email protected]> followed up with one of the
> authors and also found out the "mistake" was related to using the -xtgee-
> command in fitting a logit model with a separation problem (perfect
> predictors).
>

> Joerg's example simulates a logit model with random effects, butensures that

> the simulated data perfectly predicts a zero response for one level of a
> factor variable covariate.  Here is Joerg's example:
>
> *------------------------
> clear
> set obs 100
> set seed 123
>
> gen id = _n
> gen ui = rnormal(0,0.5)
>
> expand 10
> bys id : gen year = _n
> gen x = cond(mod(_n-1, 3) == 1, 1, cond(mod(_n-1, 3) == 0, 2, 3))
> tab x, gen(x_)
>
> gen xb = 1 / (1 + exp(-(0.3*x_2 + 0.3*x_3 + ui)))
> gen y = rbinomial(1,xb)
> replace y = 0 if x_1 == 1
> tab y x
>
> xtset id year
> xtgee y i.x, fam(binomial) link(logit)
> melogit y i.x || id:
> logit y i.x
> *------------------------
>
> Here are some comments:
>
> 1. -logit- has some code that specifically checks for perfect predictors.
>    When a perfect predictor is found, the associated covariate is typically

> omitted or a range of observations are marked out of theestimation sample

>    (usually both actions are taken).
>
>    -logit-'s -asis- option turns off this feature.  If there are prefect
>    predictors, -logit- with the -asis- option usually fails to converge
>    because the model is not identified without the above actions.
>
> 2. -melogit- assumes -asis- by default.  There is a NOT documented -noasis-
>    option that will cause -melogit- to use the perfect predictor code
>    described in the previous comment.  Here is the result of the model fit
>    using Joerg's simulated data:
>
> ***** BEGIN:
> . melogit y i.x, noasis || id:
> note: 1.x != 0 predicts failure perfectly
>       1.x dropped and 333 obs not used
>
> note: 3.x omitted because of collinearity
>
> Fitting fixed-effects model:
>
> Iteration 0:   log likelihood = -450.55022
> Iteration 1:   log likelihood = -450.08693
> Iteration 2:   log likelihood = -450.08688
> Iteration 3:   log likelihood = -450.08688
>
> Refining starting values:
>
> Grid node 0:   log likelihood = -458.42515
>
> Fitting full model:
>
> Iteration 0:   log likelihood = -458.42515  (not concave)
> Iteration 1:   log likelihood = -450.03937
> Iteration 2:   log likelihood = -449.71284
> Iteration 3:   log likelihood = -449.35817
> Iteration 4:   log likelihood = -449.35741
> Iteration 5:   log likelihood = -449.35741
>

> Mixed-effects logistic regression Number ofobs = 667> Group variable: id Number ofgroups = 100

> Obs per group:min = 6>avg = 6.7>max = 7

> Integration method: mvaghermite Integrationpoints = 7

> Waldchi2(1) = 1.74> Log likelihood = -449.35741 Prob >chi2 = 0.1875>------------------------------------------------------------------------------> y | Coef. Std. Err. z P>|z| [95%Conf. Interval]>-------------+----------------------------------------------------------------

>            x |
>           1  |          0  (empty)

> 2 | .211456 .160449 1.32 0.188-.1030183 .5259302

>           3  |          0  (omitted)
>              |

> _cons| .2773084 .1179143 2.35 0.019 .0462006 .5084161>-------------+----------------------------------------------------------------

> id           |

> var(_cons)| .1253563 .1189478.0195191 .805068>------------------------------------------------------------------------------> LR test vs. logistic regression: chibar2(01) = 1.46Prob>=chibar2 = 0.1135

> ***** END:
>

> Notice that 1/3 of the data are dropped from the estimationsample because

>    x==1 is a perfect predictor for y==0.
>
> 3. Joerg simulated the data using a zero for the intercept.  In order to
>    recover the intended model parameters, we can refit using 'bn.x' and the

> -noconstant- option. 'bn.x' specifies that 'x' is a factorvariable, but

>    prevents Stata from identifying a base level.
>
>         . melogit y bn.x, noasis noconstant || id:
>
>    This will yield an equivalent model fit to the one above, except the

> coeffienct on 3.x will be .2773084 and the interceptconstrained to zero.

>
> 4. -xtgee- does not currently have any logic for dealing with perfect

> predictors for the -logit- model. For that matter, neitherdoes -xtlogit,

>    re-.
>

> 5. All this perfect predictor stuff for the -logit- model alsoapplies to the

>    -probit- model.
>
> Action items:
>
> 1. We will re-evaluate our choice of NOT documenting the -noasis- option in
>    -melogit- and -meprobit-.  This may result in us changing their default
>    behavior (under version control) to match -logit- and -probit-, with a
>    newly documented -asis- option.
>
> 2. We will work on getting logic for determining perfect predictors into
>
>    * -xtgee- for -logit- models, also known as -xtlogit, pa-
>    * -xtgee- for -probit- models, also known as -xtprobit, pa-
>    * -xtlogit, re-
>    * -xtprobit, re-
>
> All of these actions will be provided in a future update to Stata 13.
>
> --Jeff
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: Re: st: Predicted probabilities for logistic regression after svy
Next by Date: RE: st: data management question
Previous by thread: Re: st: "Can Your Results be Replicated?" (Stata error?)
Next by thread: Re: st: "Can Your Results be Replicated?" (Stata error?)
Index(es):
- Date
- Thread