Statalist The Stata Listserver



Re: st: nlogitrum and nlogit: unbalanced data


From   "Anders Alexandersson" <[email protected]>
To   [email protected]
Subject   Re: st: nlogitrum and nlogit: unbalanced data
Date   Tue, 9 Jan 2007 12:27:56 -0500

I suggest simplifying the model by using some kind of constraint, as
Rich Gates originally suggested; I tried the -difficult- option, but
it didn't help. For example, using the restaurant data, I added the
nlogitrum option -ivc(fast=1)- and John's last model converged after
only 4 iterations. How to justify one constraint over another is, I
guess, a logical follow-up question to which I don't have a good answer.
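
A minimal sketch of that constrained call follows. It assumes the
restaurant data are the nlogit example data loaded by -webuse
restaurant- (an assumption; the thread does not say how the data were
loaded) and reuses the tree and variable definitions from John's
message quoted below:

    * Assumed data source: the nlogit example data shipped with Stata
    webuse restaurant, clear
    * John's tree: "fast" is a degenerate nest with a single alternative
    nlogitgen type = restaurant(fast: 1, family: 2 | 3 | 4 | 5, fancy: 6 | 7)
    gen incFast  = (type==1)*income
    gen incFancy = (type==3)*income
    gen kidFast  = (type==1)*kids
    gen kidFancy = (type==3)*kids
    * John's model plus ivc(fast=1), which fixes the IV parameter of
    * the "fast" nest at 1
    nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
        group(family_id) nests(rest type) ivc(fast=1)

The only change from John's call is the -ivc(fast=1)- option.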

Anders Alexandersson
[email protected]


John Fulton <[email protected]> wrote:
Anders Alexandersson requested a .do file and an example, and also
suggested I try nlogitdn. [...]

Here is the set of data manipulation commands on my actual data.

One thing I should mention is that into the degenerate branch I group a
bunch of different covariate values: I am studying migrants, and people
who do not change residence over a time period are lumped together in
the degenerate branch.  People who do change residence are modeled as
choosing one of 48 U.S. state destinations.  The states are nested for
unobservable correlates of choice.

One effect of this is that the degenerate nest contains lots of
different values on the covariates, as opposed to the classic examples
where the degenerate branch is, e.g., "train."  I don't think this is a
problem, because degenerate "train," e.g., also is modeled with
different choice- and chooser-specific characteristics; so my approach
is more a question of degree than of kind.

Another effect is obviously that the choice sets vary between choosers.
Each person has C-1 non-degenerate choices, but any two individuals in
the population can share as few as C-1-1 choices in the nondegenerate
nests, and over the total population there are no mutually completely
shared choice sets.

That said, here's the code.  I've put pseudo-code in <> for the sake of
saving space and promoting clarity:

    #delimit ;
    nlogitgen top = bottom(nonmovers3:99,
     movers:<a lot of values>);
    /* Note there are 49 values.  "Nonmovers" contains 1 value (99), and
       "movers" contains the other 48.
    */
    nlogitgen middle = bottom(nonmovers2:99,
     group2_1:<31 or 32 unique values>,
     group2_2:<17 or 16 unique values>);
    /* Note the group2_1 and group2_2 values are mutually exclusive and
       exhaustive subsets of all the "movers" values listed in the "top"
       nest.  The number of values summed across group2_1 and group2_2
       is 48, with one choice from either nest categorized in the
       degenerate nest for each case.
    */
    gen var2=(middle==2)*black1;
    gen var3=(top==2)*cpi1965;
    nlogit chosen (bottom=var1) (middle=var2) (top=var3) [fw=count],
     group(group_id);

    nlogitrum chosen var1 var2 var3 [fw=count], group(group_id)
     nests(bottom middle top);

nlogit appears to run the job fine.  At least, it gives no errors;
whether the results are useful is another matter I'm trying to determine.
nlogitrum gives an "unbalanced data" error.
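
One way to look into the message is to check how evenly the
alternatives are spread across choosers. The sketch below assumes the
error is related to choice sets that differ between groups, which this
thread does not confirm; n_alt and n_chosen are names made up for the
check:

    * Sketch only: number of alternatives recorded for each chooser
    bysort group_id: generate n_alt = _N
    tabulate n_alt
    * Each chooser should have exactly one chosen alternative
    bysort group_id: egen n_chosen = total(chosen)
    tabulate n_chosen
    * How often each bottom-level alternative appears across choosers;
    * unequal counts mean the choice sets are not identical
    tabulate bottom

If every alternative appeared in every chooser's set, the counts in the
last table would all be equal.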

I tried to recreate the problem using a degenerate nest in the
restaurant data:

    nlogitgen type = restaurant(fast: 1, family: 2 | 3 | 4 | 5, fancy: 6 | 7)
    gen incFast=(type==1)*income
    gen incFancy=(type==3)*income
    gen kidFast=(type==1)*kids
    gen kidFancy=(type==3)*kids
    nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
        group(family_id) nests(rest type)

The result is not an "unbalanced data" error, but rather "not concave"
in the max likelihood function.  I don't let it run very long, so I
don't know if it ever converges; nor do I monkey with the max options.
Nlogit runs the model fine.
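
To find out whether the model ever converges without letting it run
indefinitely, the iterations can be capped. This is a sketch that
assumes nlogitrum passes the standard maximize option -iterate()-
through to the optimizer, and that the variables from the restaurant
example above already exist; the cap of 50 is arbitrary:

    * Sketch only: stop after at most 50 iterations and inspect the log
    nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
        group(family_id) nests(rest type) iterate(50)

If the log still reports "not concave" at the last iteration, any
estimates shown should not be treated as converged.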

I didn't run nlogitdn because I was looking at the degenerate models in
table 5, for which Heiss doesn't use nlogitdn either (as far as I can tell).
In fact, unless I'm flying completely backwards here, nlogitdn is part
of the method one uses to "trick" nlogit (i.e. NNNL) into producing
rum-consistent estimations (i.e. RUMNL).  At least that's what Heiss
(SJ'02) appears to say.  It is not what one uses to structure data to
get nlogitrum to run.  But I am not very familiar with the "ins and
outs" of this.