Mike Brewer <[email protected]> asks whether -ml- method -lf- is applicable in
a particular econometric choice modeling application. He knows that method
-d0- can be used, but prefers -lf- for speed.
Mike's data are organized in the typical way for fixed-effects logistic
regression (clogit) -- that is to say, in long form, where each possible choice
for each person is an "observation". Mike shows us data for three
individuals (id), each facing three options (choice), where each option has some
characteristic used as a covariate (attribute) and each individual chooses
only one of the options (flagged by a 1 in the variable choose):
    id   choice   attribute   choose
     1        1           5        0
     1        2          10        1
     1        3          15        0
     2        1           6        1
     2        2          12        0
     2        3          18        0
     3        1           7        0
     3        2          14        0
     3        3          21        1
Mike notes the -ml- manual suggests that such models (those similar to clogit)
are not amenable to method -lf- because they do not meet the linear form
restrictions. He goes on to ask,
> But is it not equally possible to arrange data in wide format, ie:
    id   choose   attribute1   attribute2   attribute3
     1        2            5           10           15
     2        1            6           12           18
     3        3            7           14           21
> and then code an lf evaluator?
>
> I'll answer my own question: I think it is logistically possible [...]
Yes, in this case, Mike can reorganize his data to wide form and optimize the
same likelihood using method -lf- as he would using -d0- with the data in long
form. If the evaluators are properly constructed, the estimates from the two
methods should agree to within machine precision. Without going over the
likelihood for Mike's model, note that even with method -lf- he will need
three equations, one for the attributes of each choice, because these terms
enter the likelihood nonlinearly.
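As a rough check that the two layouts carry the same likelihood, here is a
minimal sketch in Python (not Stata, and not an -ml- evaluator). It assumes
the model is a conditional logit with a single coefficient on attribute, which
the discussion above does not spell out. The function names and the trial
value beta = 0.1 are made up for illustration.

```python
import math

# Long-form data from the post: (id, choice, attribute, choose)
long_rows = [
    (1, 1, 5, 0), (1, 2, 10, 1), (1, 3, 15, 0),
    (2, 1, 6, 1), (2, 2, 12, 0), (2, 3, 18, 0),
    (3, 1, 7, 0), (3, 2, 14, 0), (3, 3, 21, 1),
]

def long_to_wide(rows):
    """Collapse long rows to one record per id: (id, chosen option, [attributes])."""
    wide = {}
    for pid, choice, attribute, choose in sorted(rows):
        rec = wide.setdefault(pid, {"choose": None, "attrs": []})
        rec["attrs"].append(attribute)   # sorted() keeps attributes in option order
        if choose == 1:
            rec["choose"] = choice
    return [(pid, rec["choose"], rec["attrs"]) for pid, rec in sorted(wide.items())]

def loglik_long(rows, beta):
    """Assumed conditional logit log likelihood, built group by group (as -d0- would)."""
    groups = {}
    for pid, _choice, attribute, choose in rows:
        groups.setdefault(pid, []).append((attribute, choose))
    total = 0.0
    for members in groups.values():
        denom = sum(math.exp(beta * a) for a, _ in members)
        chosen_xb = sum(beta * a for a, c in members if c == 1)
        total += chosen_xb - math.log(denom)
    return total

def loglik_wide(rows, beta):
    """Same log likelihood, one self-contained term per wide row (as -lf- needs)."""
    total = 0.0
    for _pid, chosen, attrs in rows:
        xb = [beta * a for a in attrs]   # three linear indexes, one per option
        # Assumes options are numbered 1, 2, 3 with no gaps.
        total += xb[chosen - 1] - math.log(sum(math.exp(v) for v in xb))
    return total

wide_rows = long_to_wide(long_rows)
beta = 0.1  # arbitrary trial value, not an estimate
print(wide_rows)
print(loglik_long(long_rows, beta), loglik_wide(wide_rows, beta))
```

The three entries of xb play the role of the three equations: each wide row
supplies everything needed for its own likelihood term, whereas the long-form
version has to pool rows within id before a term can be computed.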
Things become more complicated if individuals face different choice sets. It
can still be done with -lf- in wide form, but you need a set of columns for the
attributes of each choice, and individuals get columns full of 0's for any
choice they do not face. If the choice sets differ widely from individual to
individual, the dataset can get very wide and hard to manage. It is even more
complicated if individuals can make more than one choice. In such cases there
are m_i choose k_i contributions to the likelihood for each individual i
(where m_i is the number of choices faced by individual i and k_i the number
of choices selected), and that number of contributions grows large quickly.
In such cases, the "long" data organization is clearly preferred.
-- Vince
[email protected]