Thank you all very much for all these comments. Needless to say, I'm
learning a lot.
On my specific model: it may well be that this is the wrong approach.
Still, the models that did converge (with -fmm-) had mean mixing
probabilities of around 50-70% for one of the components.
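For concreteness, what we have been fitting is something along the lines
of the sketch below (y, x1, x2 are placeholder names; the options follow
Partha Deb's user-written -fmm-, and are worth double-checking against its
help file):

* two-component normal mixture; variable names are placeholders
* if not yet installed: ssc install fmm
fmm y x1 x2, components(2) mixtureof(normal)
* the estimated mixing probabilities are reported with the results;
* cluster-robust standard errors would be worth requesting as well, if the
* installed version offers a cluster/vce option (see -help fmm-)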
In the meantime, we have realized that it might make much more sense to
frame our data as a latent class problem: not simply because that is a
generalization of mixture models, but because we actually have
identifiers for the units that must have determined which model applies
to a given observation.
In the specific setting, some intermediaries were trained to implement
our experimental treatment, and we strongly suspect that many did not
comply. If so, then whole clusters are non-compliers, so the mixing is
not at the observation level. I take this to mean that we need a latent
class model, which I could estimate using, say, -gllamm-. It would be
nice to still have some power: with -fmm-, we had surprisingly large
standard errors even in a sample of more than 30,000 observations. We had
only about 1,500 clusters, though.
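The sort of -gllamm- specification I have in mind is sketched below
(variable names are placeholders; it uses gllamm's discrete latent
variable with freely estimated mass points, which is one way to get a
latent class model with class membership defined at the intermediary
level):

* two latent classes at the intermediary (cluster) level; names are placeholders
* if not yet installed: ssc install gllamm
gllamm y x1 x2, i(intermediary_id) ip(f) nip(2) ///
    family(gaussian) link(identity)
* ip(f) makes the cluster-level latent variable discrete, with freely
* estimated locations and probabilities, i.e. nip(2) latent classes;
* letting slopes (not just the intercept) differ by class would need
* extra structure via gllamm's eqs() option

The point of putting the latent variable at the i() level is that whole
clusters then share a class, which matches the cluster-level
non-compliance pattern described above.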
Thank you again,
Laszlo
On Thu, Sep 17, 2009 at 6:02 PM, Verkuilen, Jay <[email protected]> wrote:
>
> As I recall, EM is really good when you're quite far away from the solution but not so good near it, so it is very useful for getting a decent solution to refine with Newton. But nothing is going to be all that great when you have a multimodal likelihood, which is why "protect" optimizations and multi-starts, going all the way to simulated annealing and other such methods, are good ideas. Even in the comparatively benign world of SEM, doing protect optimizations (which Stata does for you upon request in the exploratory factor analysis program -factor-) is a really good idea, and something many programs don't enable.
>
> Partha Deb already indicated that big problems with -fmm- (and indeed other latent variable procedures) are most likely to happen when you have a poor model. Example: on some data I use in class to illustrate factor analysis (and which has therefore been analyzed A LOT, both by me and by dozens of students), there is a valid solution that is a local optimum, while what I believe to be the global optimum is a boundary solution with a variance of 0. That model is mis-specified because it asks for an extra factor, so it is exactly where trouble is likely. I found it by using something like 1000 random protect optimizations. Stata usually finds the interior point, but every once in a while....
>
> I'd say that one area where there's been a lot of attention to this issue is in the multidimensional scaling world, because the objective functions in nonmetric MDS are wretched and plagued by local optima. The book by Patrick Groenen and Ingwer Borg (Modern Multidimensional Scaling, 2nd Edition, Springer, 2005) has a lot of discussion of this issue. I believe what they recommend is to try many different random configurations for a relatively small number of iterations (cheap to compute) and refine from the more promising locations.
>
> Jay
>