Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: "Can Your Results be Replicated?" (Stata error?)

From	"Santos Silva, J.M.C." <[email protected]>
To	<[email protected]>
Subject	Re: st: "Can Your Results be Replicated?" (Stata error?)
Date	Sat, 14 Sep 2013 18:32:53 +0100
Dear Jeff,

Thank you for your detailed explanation and for looking into the issue.

Since Stata will be working on the detection of perfect predictors in a number
of estimators, may I suggest that you do the same for count data models and
for Tobit-like models? In both cases, regressors that perfectly predict the
zeros cause the same problem, and I am not aware of any software that deals
with the issue (apart from the -ppml- command for Stata, which I co-authored);
it would be great if Stata could lead the way here.

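The problem can be written out for the simplest case, a dummy regressor. This is an illustration under that assumption, not the general condition for continuous regressors. Let d_i be a dummy with coefficient \delta such that y_i = 0 whenever d_i = 1. Let w_i collect the other regressors (including the constant) with coefficients \beta. In a Poisson model, and in a Tobit model left-censored at zero, the observations with d_i = 1 contribute only the last sum in each line below. Here \ell_{d=0} denotes the contribution of the observations with d_i = 0.

```latex
\begin{align*}
\ell_{\text{Poisson}}(\delta,\beta)
  &= \sum_{i:\,d_i=0}\left[y_i\,w_i'\beta - e^{w_i'\beta} - \ln y_i!\right]
     \;-\; \sum_{i:\,d_i=1} e^{\delta + w_i'\beta} \\
\ell_{\text{Tobit}}(\delta,\beta,\sigma)
  &= \ell_{d=0}(\beta,\sigma)
     \;+\; \sum_{i:\,d_i=1} \ln\left[1 - \Phi\!\left(\frac{\delta + w_i'\beta}{\sigma}\right)\right]
\end{align*}
```

In both cases \delta appears only in the last sum. That sum increases monotonically toward zero as \delta \to -\infty, so the likelihood has no finite maximizer in \delta. Dropping d_i together with the observations where d_i = 1 leaves a well-defined problem, which is the same remedy -logit- applies to binary outcomes.
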
Somewhat off-topic, but still on the issue of Stata being self-consistent:
I am troubled by the fact that using -_rmcoll- before -reg- can
substantially change the results, because -_rmcoll- and -reg- can make
very different decisions about which variables are collinear. The following
code (inspired by a situation I encountered "in the wild") illustrates the
problem.

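*------------------------
clear all
set obs 100
set seed 123
* y is pure noise; each of x1-x10 has mean 3000 and standard deviation 0.1
quietly generate y = rnormal()
forvalues i = 1/10 {
    quietly generate x`i' = rnormal(3000, 0.1)
}
* let -regress- decide which regressors are collinear
regress y x*
* let -_rmcoll- decide first, then refit with the variable list it returns
_rmcoll x*
regress y `r(varlist)'
*------------------------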

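Here is one rough way to see why the two commands can disagree on this example. It is an informal explanation, not a description of how -_rmcoll- or -regress- are actually implemented. Each regressor is a large constant plus a little noise, so its deviations from its mean are tiny relative to its level:

```latex
\begin{align*}
x_{ki} &= 3000 + 0.1\,z_{ki}, \qquad z_{ki} \sim N(0,1),
  \qquad k = 1,\dots,10,\; i = 1,\dots,n,\; n = 100 \\
\frac{\sum_i (x_{ki}-\bar x_k)^2}{\sum_i x_{ki}^2}
  &\approx \frac{(n-1)\,(0.1)^2}{n\,(3000)^2}
  \approx \left(\frac{0.1}{3000}\right)^2 \approx 1.1\times 10^{-9}
\end{align*}
```

In exact arithmetic the constant and x1-x10 are linearly independent. However, measured in sums of squares, the part of each column that distinguishes it from the constant is about nine orders of magnitude smaller than the column itself. Whether a column is flagged as collinear therefore hinges on numerical details, such as the tolerance used and whether the data are centered first. Two routines that differ in those details can plausibly reach different decisions.
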
All the best,

Joao


> This post is rather long.  If you want to skip straight to the action items
> we will take as a result of this thread, please scroll to the end.
> 
> Philip Jones <[email protected]> writes about a link he saw in his
> Twitter feed today:
> 
>> I found a link on my Twitter feed this AM, purporting to show how
>> Stata "made a mistake" that R did not make:
>> 
>> http://www.r-bloggers.com/can-your-results-be-replicated/
>> 
>> which actually points to:
>> 
>> http://politicalsciencereplication.wordpress.com/2013/09/11/guest-blog-how-to-persuade-journals-to-accept-your-replication-paper/
>> 
>> I realize that "r-bloggers" is likely not the most bias-free site when
>> it comes to reviewing/rating stats packages, but has anyone got an
>> idea as to what is actually going on here? Is Stata really at fault?
>> 
> Joerg Luedicke <[email protected]> looked at the paper:
> 
>> After a quick glance at their paper
>> (http://jcr.sagepub.com/content/early/2013/08/19/0022002713499718.abstract?papetoc),
>> it seems that they are talking about a problem with Stata's -xtgee-
>> command, which, in the case of separation in a logit model, produces
>> nonsense results instead of omitting predictors or the like. Below
>> is a toy example showing what seems to be the problem. However,
>> finding an effect of something like "x is 3 million times less likely
>> than y" without becoming suspicious looks like sloppy research
>> to me in the first place.
>> 
> Anders Alexandersson <[email protected]> followed up with one of the
> authors and also found that the "mistake" was related to using the -xtgee-
> command to fit a logit model with a separation problem (perfect
> predictors).
> 
> Joerg's example simulates a logit model with random effects, but ensures that
> one level of a factor-variable covariate perfectly predicts a zero response
> in the simulated data.  Here is Joerg's example:
> 
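> *------------------------
> clear
> set obs 100
> set seed 123
>
> * one random intercept per panel (standard deviation 0.5)
> gen id = _n
> gen ui = rnormal(0,0.5)
>
> * 10 years per panel
> expand 10
> bys id : gen year = _n
> * x cycles through 2, 1, 3; x_1-x_3 are its indicator variables
> gen x = cond(mod(_n-1, 3) == 1, 1, cond(mod(_n-1, 3) == 0, 2, 3))
> tab x, gen(x_)
>
> * xb holds the success probability: logit index 0.3*x_2 + 0.3*x_3 + ui
> gen xb = 1 / (1 + exp(-(0.3*x_2 + 0.3*x_3 + ui)))
> gen y = rbinomial(1,xb)
> * force separation: x==1 perfectly predicts y==0
> replace y = 0 if x_1 == 1
> tab y x
>
> * fit the same model with three commands
> xtset id year
> xtgee y i.x, fam(binomial) link(logit)
> melogit y i.x || id:
> logit y i.x
> *------------------------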
> 
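Written out, the data-generating process in the example above is as follows. This is a reading of the code, with \Lambda the logistic CDF and n the running observation number after the data are sorted by id. Because x = 1 exactly when n = 2, 5, ..., 998, level 1 covers 333 of the 1,000 observations, all with y = 0. For the pooled -logit- fit (random effects ignored), write the model with one intercept per level, \gamma_1, \gamma_2, \gamma_3. The last line below is the resulting log-likelihood.

```latex
\begin{align*}
u_i &\sim N(0,\,0.5^2), \qquad i = 1,\dots,100, \quad t = 1,\dots,10 \\
x_{it} &= \begin{cases} 2 & \text{if } (n-1) \bmod 3 = 0 \\
                        1 & \text{if } (n-1) \bmod 3 = 1 \\
                        3 & \text{if } (n-1) \bmod 3 = 2 \end{cases} \\
\Pr(y_{it}=1) &= \Lambda\!\left(0.3\,\mathbb{1}\{x_{it}=2\} + 0.3\,\mathbb{1}\{x_{it}=3\} + u_i\right),
  \qquad \Lambda(z) = \frac{1}{1+e^{-z}} \\
y_{it} &:= 0 \quad \text{whenever } x_{it} = 1 \\
\ell(\gamma) &= 333\,\ln\!\left[1-\Lambda(\gamma_1)\right]
  + \sum_{x_{it}\neq 1}\Big[y_{it}\ln\Lambda(\gamma_{x_{it}})
  + (1-y_{it})\ln\{1-\Lambda(\gamma_{x_{it}})\}\Big]
\end{align*}
```

The first term of \ell increases toward zero as \gamma_1 \to -\infty, so \gamma_1 has no finite maximum likelihood estimate. In the default i.x parameterization (base level 1), this shows up as _cons \to -\infty with the 2.x and 3.x coefficients \to +\infty. A routine without a perfect-predictor check may stop at some large finite value, which is one way implausible odds ratios can arise. Dropping level 1 and its 333 observations, as -logit- does, leaves only the second sum. That sum has a finite maximizer as long as levels 2 and 3 each contain both outcomes.
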
> Here are some comments:
> 
> 1. -logit- has code that specifically checks for perfect predictors.
> When a perfect predictor is found, the associated covariate is typically
> omitted or a range of observations is marked out of the estimation sample
> (usually both actions are taken).
> 
> -logit-'s -asis- option turns off this feature.  If there are perfect
> predictors, -logit- with the -asis- option usually fails to converge
> because the model is not identified without the above actions.
> 
> 2. -melogit- assumes -asis- by default.  There is an undocumented -noasis-
> option that makes -melogit- use the perfect-predictor code
> described in the previous comment.  Here is the result of the model fit
> using Joerg's simulated data:
> 
> ***** BEGIN:
> . melogit y i.x, noasis || id:
> note: 1.x != 0 predicts failure perfectly
> 1.x dropped and 333 obs not used
> 
> note: 3.x omitted because of collinearity
> 
> Fitting fixed-effects model:
> 
> Iteration 0:   log likelihood = -450.55022
> Iteration 1:   log likelihood = -450.08693
> Iteration 2:   log likelihood = -450.08688
> Iteration 3:   log likelihood = -450.08688
> 
> Refining starting values:
> 
> Grid node 0:   log likelihood = -458.42515
> 
> Fitting full model:
> 
> Iteration 0:   log likelihood = -458.42515  (not concave)
> Iteration 1:   log likelihood = -450.03937
> Iteration 2:   log likelihood = -449.71284
> Iteration 3:   log likelihood = -449.35817
> Iteration 4:   log likelihood = -449.35741
> Iteration 5:   log likelihood = -449.35741
> 
> Mixed-effects logistic regression               Number of obs      =       667
> Group variable:              id                 Number of groups   =       100
>
>                                                 Obs per group: min =         6
>                                                                avg =       6.7
>                                                                max =         7
>
> Integration method: mvaghermite                 Integration points =         7
>
>                                                 Wald chi2(1)       =      1.74
> Log likelihood = -449.35741                     Prob > chi2        =    0.1875
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |
>           1  |          0  (empty)
>           2  |    .211456    .160449     1.32   0.188    -.1030183    .5259302
>           3  |          0  (omitted)
>              |
>        _cons |   .2773084   .1179143     2.35   0.019     .0462006    .5084161
> -------------+----------------------------------------------------------------
> id           |
>    var(_cons)|   .1253563   .1189478                      .0195191     .805068
> ------------------------------------------------------------------------------
> LR test vs. logistic regression: chibar2(01) =     1.46 Prob>=chibar2 = 0.1135
> ***** END:
> 
> Notice that one third of the data (333 of the 1,000 observations) are dropped
> from the estimation sample because x==1 is a perfect predictor for y==0.
> 
> 3. Joerg simulated the data using a zero for the intercept.  In order to
> recover the intended model parameters, we can refit using 'bn.x' and the
> -noconstant- option.  'bn.x' specifies that 'x' is a factor variable but
> prevents Stata from choosing a base level.
> 
> . melogit y bn.x, noasis noconstant || id:
> 
> This will yield a fit equivalent to the one above, except that the
> coefficient on 3.x will be .2773084 and the intercept will be constrained to zero.
> 
> 4. -xtgee- does not currently have any logic for dealing with perfect
> predictors in the -logit- model.  For that matter, neither does -xtlogit, re-.
> 
> 5. All this perfect predictor stuff for the -logit- model also applies to the
> -probit- model.
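
The equivalence claimed in comment 3 can be checked by hand. After the x==1 observations are dropped, the estimation sample contains only x==2 and x==3, so 1{x=2} + 1{x=3} = 1. The two specifications therefore describe the same linear predictor (the random effect is omitted here for brevity):

```latex
\begin{align*}
\text{\texttt{i.x} with constant:} \quad \eta &= \beta_0 + \beta_2\,\mathbb{1}\{x=2\} \\
\text{\texttt{bn.x}, \texttt{noconstant}:} \quad \eta &= \gamma_2\,\mathbb{1}\{x=2\} + \gamma_3\,\mathbb{1}\{x=3\} \\
\Longrightarrow \quad \gamma_3 &= \beta_0, \qquad \gamma_2 = \beta_0 + \beta_2 \\
\hat\gamma_3 &= .2773084, \qquad \hat\gamma_2 = .2773084 + .211456 = .4887644
\end{align*}
```

Maximum likelihood estimates are invariant to a one-to-one reparameterization like this, so the refit should reproduce these values up to convergence tolerance. The 2.x value .4887644 is implied by the output above rather than taken from an actual run. Both estimates can be compared with the value 0.3 used in the simulation.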
> 
> Action items:
> 
> 1. We will re-evaluate our decision not to document the -noasis- option in
> -melogit- and -meprobit-.  This may result in our changing their default
> behavior (under version control) to match -logit- and -probit-, with a
> newly documented -asis- option.
> 
> 2. We will work on adding logic for detecting perfect predictors to:
>
> * -xtgee- for -logit- models, also known as -xtlogit, pa-
> * -xtgee- for -probit- models, also known as -xtprobit, pa-
> * -xtlogit, re-
> * -xtprobit, re-
> 
> All of these changes will be provided in a future update to Stata 13.
> 
> --Jeff
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
