Statalist The Stata Listserver



Re: st: nlogitrum and nlogit: unbalanced data


From   John Fulton <[email protected]>
To   [email protected]
Subject   Re: st: nlogitrum and nlogit: unbalanced data
Date   Tue, 09 Jan 2007 15:14:37 -0500

Anders Alexandersson suggested constraining the IV parameter(s):
Thank you very much Anders.

Do you happen to know of any papers in which different constraints are used, and their results compared?
I am particularly concerned with the effects of constraints on the unconstrained coefficients of the other nests.

In my original nlogit model, I used no constraints, yet the IV params for the degenerate nest came out at "1" anyway.
Would it speed things up in the future to specify this restriction ahead of time? I could sure use the speed, especially since
this is starting to look like an exercise in fitting.
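
A short derivation suggests why fixing this parameter ahead of time should be harmless, at least under the RUM-consistent parameterization that nlogitrum fits. It is only a sketch for a two-level model, with V_j the systematic utility of alternative j, tau_m the IV parameter of nest m, and I_m the inclusive value of nest m (this notation is assumed here, not taken from the thread). It does not carry over directly to the nlogit parameterization, which scales utilities differently.

```latex
\begin{align*}
P(j) &= P(j \mid m)\, P(m), \\
P(j \mid m) &= \frac{\exp(V_j/\tau_m)}{\sum_{k \in m} \exp(V_k/\tau_m)}, \qquad
P(m) = \frac{\exp(\tau_m I_m)}{\sum_{n} \exp(\tau_n I_n)}, \\
I_m &= \ln \sum_{k \in m} \exp(V_k/\tau_m).
\end{align*}
% Degenerate nest: m contains the single alternative j.
\begin{align*}
P(j \mid m) = 1, \qquad I_m = \frac{V_j}{\tau_m}, \qquad \tau_m I_m = V_j .
\end{align*}
```

In this setup tau_m cancels out of every term for the degenerate nest, so the likelihood does not depend on it and any fixed value, 1 included, gives the same fit.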

Thanks again,
John.



Anders Alexandersson wrote:

I suggest simplifying the model by using some kind of constraint,
as Rich Gates originally suggested; I tried the -difficult- option but
it didn't help. For example, using the restaurant data I added the
nlogitrum option -ivc(fast=1)- and John's last model converged after
only 4 iterations. How to justify one constraint over another, I guess
is a logical follow-up question to which I don't have a good answer.
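
For concreteness, the constrained fit described above would presumably look something like the following. This is a sketch rather than the exact command that was run: it takes the restaurant-data model quoted further down in this thread and appends -ivc(fast=1)-, and it assumes that nlogitrum is installed, that the restaurant data are in memory, and that type and the four interaction variables have already been generated.

```stata
* Sketch only: the restaurant-data model from this thread, with the IV
* parameter of the degenerate nest "fast" constrained to 1.
nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
    group(family_id) nests(rest type) ivc(fast=1)
```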

Anders Alexandersson
[email protected]


John Fulton <[email protected]> wrote:
Anders Alexandersson requested a .do file and an example, and also
suggested I try nlogitdn. [...]

Here is the set of data manipulation commands on my actual data.

One thing I should mention is that into the degenerate branch I group a
bunch of different covariate values: I am studying migrants, and people
who do not change residence over a time period are lumped together in
the degenerate branch. People who do change residence are modeled as
choosing one of 48 U.S. state destinations. The states are nested for
unobservable correlates of choice.

One effect of this is that the degenerate nest contains lots of
different values on the covariates, as opposed to the classic examples
where the degenerate branch is, e.g., "train." I don't think this is a
problem, because degenerate "train," e.g., also is modeled with
different choice- and chooser-specific characteristics; so my approach
is more a question of degree than of kind.

Another effect is obviously that the choice sets vary between choosers.
Each person has C-1 non-degenerate choices, but any two individuals in
the population can share as few as C-1-1 choices in the nondegenerate
nests, and over the total population there are no mutually completely
shared choice sets.

That said, here's the code. I've put pseudo-code in <> for the sake of
saving space and promoting clarity:

#delimit ;
nlogitgen top = bottom(nonmovers3:99,
    movers:<a lot of values>);
/* Note there are 49 values. "Nonmovers" contains 1 value (99), and
   "movers" contains the other 48.
*/;
nlogitgen middle = bottom(nonmovers2:99,
    group2_1:<31 or 32 unique values>,
    group2_2:<17 or 16 unique values>)
/* Note the group2_1 and group2_2 values are mutually exclusive and
   exhaustive subsets of all the "movers" values listed in the "top" nest.
   The number of values summed across group2_1 and group2_2 is 48,
   with one choice from either nest categorized in the degenerate nest
   for each case.
*/;
gen var2=(middle==2)*black1;
gen var3=(top==2)*cpi1965;
nlogit chosen (bottom=var1) (middle=var2) (top=var3) [fw=count],
    group(group_id);

nlogitrum chosen var1 var2 var3 [fw=count], group(group_id)
    nests(bottom middle top);

nlogit appears to run the job fine. At least, it gives no errors;
whether the results are useful is another matter I'm trying to determine.
nlogitrum gives an "unbalanced data" error.
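
One plausible reading of "unbalanced data" is that nlogitrum expects every group to contain the same number of alternatives; that reading is an assumption here, not something confirmed in the thread. Under that assumption, a quick check is to count the rows per chooser. The sketch below uses a small hypothetical dataset so that it runs on its own: group_id, bottom and chosen follow the variable names above, while n_alt, one_per_group and n_chosen are names introduced only for illustration.

```stata
* Toy data (hypothetical): 3 choosers; chooser 3 lacks alternative 2.
clear
input group_id bottom chosen
1 1  1
1 2  0
1 99 0
2 1  0
2 2  0
2 99 1
3 1  0
3 99 1
end

* Count the alternatives (rows) available to each chooser.
bysort group_id: generate n_alt = _N

* Tag one row per chooser so the table counts choosers rather than rows.
egen byte one_per_group = tag(group_id)
tabulate n_alt if one_per_group

* Confirm that each chooser picks exactly one alternative.
bysort group_id: egen n_chosen = total(chosen)
assert n_chosen == 1
```

If the table shows more than one distinct value of n_alt, the groups differ in size, which would be consistent with the error. On the real data the same commands can be run after dropping the clear/input block.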

I tried to recreate the problem using a degenerate nest in the
restaurant data:

nlogitgen type = restaurant(fast: 1, family: 2 | 3 | 4 | 5, fancy: 6 | 7)
gen incFast=(type==1)*income
gen incFancy=(type==3)*income
gen kidFast=(type==1)*kids
gen kidFancy=(type==3)*kids
nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
    group(family_id) nests(rest type)

The result is not an "unbalanced data" error, but rather "not concave"
in the max likelihood function. I don't let it run very long, so I
don't know if it ever converges; nor do I monkey with the max options.
Nlogit runs the model fine.
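
To find out whether the "not concave" run ever settles without waiting indefinitely, one option is to cap the number of iterations. The sketch below assumes that nlogitrum passes the standard ml maximize options through to the optimizer; -difficult- is mentioned elsewhere in this thread, but -iterate()- is an assumption, and the cap of 50 is an arbitrary illustrative value.

```stata
* Sketch only: stop after at most 50 iterations so a non-converging run
* ends early. Assumes nlogitrum accepts standard ml maximize options.
nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, ///
    group(family_id) nests(rest type) iterate(50) difficult
```

If the log still reports "not concave" at the final iteration, the reported estimates should not be trusted.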

I didn't run nlogitdn because I was looking at the degenerate models in
table 5, for which Heiss doesn't use nlogitdn either (as far as I can tell).
In fact, unless I'm flying completely backwards here, nlogitdn is part
of the method one uses to "trick" nlogit (i.e. NNNL) into producing
rum-consistent estimations (i.e. RUMNL). At least that's what Heiss
(SJ'02) appears to say. It is not what one uses to structure data to
get nlogitrum to run. But I am not very familiar with the "ins and
outs" of this.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*