
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model

From	Jeph Herrin <[email protected]>
To	[email protected]
Subject	Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model
Date	Fri, 24 May 2013 13:46:40 -0400

Your original post seemed to place the blame on the code, and the authors of it, for not working the way you thought it should, before you made the effort to understand what was going on. But when a model does not converge for your data, yes, it is a reasonable response to suggest you consider the data, and whether the model is appropriate for it, rather than confirming that "what [you] think is really needed" is for the authors of the code to "clarify their 2002 work".

However, I appreciate that there might have been some miscommunication, so I'll expand on what I hope is the most useful part of my reply. The errors you are seeing ("flat or discontinuous region encountered") are generated because the program is unable to converge on a single solution. This message is not unique to -gllamm- but is reported by any Stata routine that uses maximum likelihood estimation.

Generally, one resolves this kind of problem by starting with a simpler but analogous model that does converge and then adding complexity to the model to determine what parameter(s) are causing the problem. It is not clear to me how to do that here, so some other questions:

Why did you need to introduce x_i1, when there was not one in the original model? When you say it will not work without it, what do you mean? Because it is also not working with it, apparently, and it seems that in doing so you are changing the meaning of several of the options, such as geqs().


How many observations did you simulate? Complex models typically don't converge as easily for small datasets.

Where does i2 appear? You refer to it, but it is not in the model.

What is the link you mention? Might be helpful if we could compare with the model you were trying to replicate.

Where did you get the matrix B? It specifies a starting point (as it were) for finding a solution, and if it is poorly specified, it will cause trouble.
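
As a rough sketch of the build-up strategy mentioned above, each part of the stacked model could be checked on its own with standard commands before the joint -gllamm- fit is attempted. The coding of var (1 for the continuous rows, 2 for the dropout rows) is an assumption and is not stated anywhere in this thread; the other variable names are taken from the posted -gllamm- call.

```stata
* Assumptions (not from the thread): var == 1 flags the continuous rows and
* var == 2 flags the dropout rows of the stacked response resp
count                                   // how many stacked records are there?
tabulate var                            // records per response type

* Step 1: continuous outcome alone (i1 plays the role of the constant)
regress resp x_i1 i1 if var == 1, noconstant

* Step 2: dropout (probit) part alone (i2 plays the role of the constant)
probit resp y0_i2d0 i2 y1_i2 if var == 2, noconstant
```

If both pieces fit without complaint, the random effects and the link between the two parts could then be added back one at a time to see which addition triggers the convergence problem.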


cheers,

J


On 5/24/2013 11:24 AM, Kyle Fluegge wrote:

There might be some misunderstanding about what I have done with this. This is not my data per se or my model even (in terms of applying results to real-world situations). I have not even used my real data that I would like to model. May I ask how you initially go about investigating a model or program that you would like to use? I always like to use a programmer's code and sample data to ensure their model does what it says before even attempting to apply to my own situation. If it does not run, then I become skeptical and ask for clarification or seek a resolution via another channel (i.e., statalist) before using. This is a case of that. Your response is not particularly helpful or a significant contribution in the search for a resolution. Asking for the original author's code might have been a more appropriate response in advocating a resolution to the matter at hand. Having different data is a problem every researcher has. You would have to admonish almost everyone who posts to Statalist because the root problem for their difficulties is essentially a different dataset, no?

The model (as specified by the authors) requires a single response vector with continuous and binary outcomes. They give the code to create this, and that is what I used. I created a data set that met this single criterion. I did not create a vector of all constants, all zeros, letters, or anything of the sort. I believe you might be minimizing the extent of the problem as I see it. While I could have easily made a mistake (I have admitted this several times), I do not think it is as simple a misunderstanding as you have described. The authors also give a sample view of their datasheet (10 or so observations) created with this code and it appears to match very similarly to what I created (albeit, not exactly, which we have already established is a limitation). I should find a way to post the URL of this so you all can view for yourselves what is available (I have tried before and my message would not post). I would then invite you to replicate something similar to see if it runs for you (I hope it would, then I know my error is indeed fixable).

I suppose the broader and main point with this is if the data has to be so particular (that only the original authors can truly produce what is needed to run), how useful and/or generalizable can this particular aspect of gllamm for multivariate modeling really be? And if the data generation is so sensitive, why not mention it as a limitation or significant caveat to model implementation?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jeph Herrin
Sent: Friday, May 24, 2013 10:33 AM
To: [email protected]
Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model

The errors you are seeing are those that would be generated if -gllamm- could not find a solution, for example if the model you are specifying is not appropriate for your data, which is the most likely explanation.

More generally, it is somewhat misleading to claim that you are trying to fit the 'exact same model' when you have a different dataset. An analogy would be if I showed you a slide where I estimated

   logit y x1 x2

with some results and then you tried to run the same model on a dataset where all the variables are constant - you would see lots of errors. If you wrote me to complain that my model wouldn't converge on your dataset, I would likely not respond either!
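
To make the analogy concrete, here is a minimal sketch using simulated data. The data-generating process, seed, and sample size are hypothetical choices for illustration only; they are not from any real slide or dataset.

```stata
* Hypothetical simulated data; variable names follow the logit example above
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
generate y  = runiform() < invlogit(0.5*x1 - 0.5*x2)

logit y x1 x2                      // varying data: estimation can proceed

* Make every variable constant, as in the analogy
replace x1 = 1
replace x2 = 1
replace y  = 1

capture noisily logit y x1 x2      // Stata stops with an error here
display "return code: " _rc        // a nonzero return code is expected
```

The command line is identical in both calls; only the data differ, and that alone decides whether the model can be fit.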

Hope this helps,
Jeph

On 5/23/2013 9:42 PM, Kyle Fluegge wrote:

I agree with your clarity and caution note. I am not assigning blame to anyone. Simply noting that a modeling framework has been marketed within gllamm by its authors, but does not appear to run. That is the only thing I can say with the information I have at my disposal. The error could be a data issue or some other error I have yet to recognize. I will say that other gllamm models I have run do work; this is not criticism of the gllamm framework as a whole, just in this particular case.

I do not have the dataset that Sophia used. That is a crucial detail as you note. My apologies for the confusion: the replica was meant to refer to syntax (which can indeed mean very little if the data is different). Within the Rabe-Hesketh note on this particular model, she and her co-authors provided coding to shape the data in a long form required to run the model. I used that code to shape the dataset I used. Everything on that front worked perfectly. I had presumed their "x" (used in their code) referred to one explanatory variable. I created an explanatory variable (the "x") and used that. Other than this alteration, everything else is exactly the same. Whether that is the reason the model is not converging remains an open issue and perhaps worthy of further discussion on this list, I do not know.

I have attempted to follow up with the authors regarding it (= the data issue), but to no avail. So I am left to interpret. I completely acknowledge the error could be (and probably is) my own; that is why I posted to Statalist in an attempt to resolve it with others' help. No suggestion is too minor.

I am not a very experienced Stata programmer, another limitation I wholly acknowledge. I do what I can, but programming errors are not as noticeable to me as other more experienced programmers.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, May 23, 2013 9:21 PM
To: [email protected]
Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model

I think we need complete clarity and considerable caution here.

Your previous post claimed that you are using an exact replica of Sophia [Rabe-Hesketh]'s model, except that you changed something. I don't know these models and so cannot judge whether your change was trivial or substantive, but on the face of it one of those statements is wrong or at least confusing.

Are you using exactly the same dataset as Sophia used? That is a crucial detail.

I certainly agree that exactly the same model on exactly the same dataset should produce the same results now with -gllamm- as in 2002, and if not there should be an explanation why. -gllamm- has changed and Stata has changed, meanwhile, and no one can be confident with large complicated programs that something might not have been broken.

I don't know how much experience you have in Stata programming, but I have some. There are certainly programs of mine in the public domain that might not converge with particular datasets; I've had that experience myself and typically conclude from graphical evidence that I was trying to get a cat to pretend it was a dog, and that was a bad idea. With your kind of model such checks are, as I understand it, typically not available.

It's my impression that Sophia gets far more requests for -gllamm- support than she can possibly handle. That's a tough call all round.
She's not an active member of Statalist.
Nick
[email protected]


On 24 May 2013 01:51, Kyle Fluegge <[email protected]> wrote:

The notable problem is that this is not my model, exactly. I have simulated the minimum number of variables to make it run. This is the model provided by Rabe-Hesketh and colleagues at Stata User Group Meeting in Maastricht, May 2002. Thus, not being able to replicate it may or may not signify a broader problem here. Hopefully, if others who have attempted to run it have noted similar problems, they can speak up within this list to contribute their alterations to the code I have provided or to provide incentive for Rabe-Hesketh and colleagues to perhaps clarify their 2002 work in a more general sense. The latter is what I think is really needed. I have not seen this model used in the literature (or at least from what I have read; there are probably papers out there somewhere), which may lend credibility to the fact that the gllamm simply cannot estimate a model like this, contrary to what Rabe-Hesketh and colleagues have proclaimed. Thank you for your assistance.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, May 23, 2013 8:36 PM
To: [email protected]
Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model

The short answer is likely to be that you are doing nothing wrong that we can identify for you.

-gllamm- (SSC) is a very general, indeed highly versatile, command that is more like a family of commands. However, many of the models it covers are difficult to fit -- or conversely many of the models are often applied to data that aren't suitable. Where to put the blame is an open and delicate matter. Naturally it is usually impossible to be clear about suitability before trying a fit, but having correct syntax is not a guarantee of anything but having correct syntax.

People who are familiar with your kind of model may well be able to
add more specific comments. Means of binary variables being very near
0 or very near 1 can be problematic.
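
A quick way to check this on a stacked dataset is sketched below. It assumes that var == 2 marks the binary (dropout) rows of resp; that coding is not given in the thread and would need to be adapted to the actual data.

```stata
* Assumption: var == 2 marks the binary (dropout) rows of resp
summarize resp if var == 2              // the mean is the proportion of 1s
tabulate resp if var == 2               // confirm that both 0 and 1 occur
```

A mean close to 0 or 1 would suggest that the binary part carries little information, which is one plausible source of a flat likelihood.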

The recent thread starting here has other advice, some specific:

http://www.stata.com/statalist/archive/2013-05/msg00665.html

Nick
[email protected]

On 24 May 2013 00:53, Kyle Fluegge <[email protected]> wrote:

Dear Statalisters,

I am attempting to fit a multivariate multilevel dropout model with gllamm. The data set is in long form, with a response vector that includes both binary and continuous data. As for notation, x_i1 is a dichotomous variable predicting the continuous outcome, i1 is a variable denoting records within the substantive model, i2 is a variable denoting records within the dropout/selection model (probit), y0_i2d0 is a variable capturing the concurrent continuous outcome's impact on dropout, and y1_i2 is a lagged variable capturing the previous continuous outcome's impact on current dropout. The model syntax is below (it is an exact replica of Rabe-Hesketh's dropout model):

gllamm resp x_i1 i1 y0_i2d0 i2 y1_i2, i(t id) eqs(eta1_1 eta2_1) nocons ///
    family(gauss binom) fv(var) link(ident probit) lv(var)              ///
    bmatrix(B) geqs(f1_1) frload(1) constr(1/5)                         ///
    nats nip(7) adapt trace

When running this model, it is not converging and produces errors that "numerical derivatives are approximate" and "flat or discontinuous region encountered". I am curious to know what I am doing wrong. The only thing that I have changed from Rabe-Hesketh's model in the link is that x_i1 is a dichotomous explanatory variable (and that is because the model will not run without an "x"). Everything else is exactly the same. Why is this not running? I have contacted the authors of gllamm, who have not responded. Has anyone else been able to run this model as Rabe-Hesketh et al. have written and had success?

Sincerely,
    kyle



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


