I am puzzled by what look to me like diverging results (opposite signs)
between the interaction terms in a multinomial logit model and the predicted
probabilities generated with prtab and plotted with prgen and graph.
I am doing research on the returns to human capital investment in terms of
occupational attainment. For theoretical reasons, my dependent variable
(occup_att_2, see below) is coded as follows:
1. Managers
2. Professionals
3. Associate Professionals
4. Clerks
5. Lower service and other occupations
'Clerks' is my reference category in the dependent variable.
I have fitted a multinomial logit model to the sample for one of my national
case studies. My data set merges cross-sectional labour force surveys from
up to eight different years.
Since I am especially interested in the TREND in the returns to human
capital investment, I have interacted the variable 'year' (year_3 in the
model, capturing the survey years included in the data) with the dummies
for educational attainment.
Here I present the results of one of my models. I have left out the
coefficients of the other independent variables, which I am less interested
in.
. xi: mlogit occup_att_2 i.tert_ed*year_3 sex_2 mstatus_2 age national_2_2
> national_2_3 tenure perm_2_2 perm_2_3, b(4) nolog
i.tert_ed         _Itert_ed_1-5       (naturally coded; _Itert_ed_3 omitted)
i.tert~d*year_3   _IterXyear__#       (coded as above)
Multinomial logistic regression                   Number of obs   =
                                                  LR chi2(68)     =
                                                  Prob > chi2     =
Log likelihood = -414328.03                       Pseudo R2       =

 occup_att_2 |      Coef.  Std. Err.        z   P>|z|  [95% Conf.
-------------+---------------------------------------------------
Managers     |
 _Itert_ed_1 |   .8369161   .1164738     7.19   0.000    .6086317
 _Itert_ed_2 |  -.5057224   .1448549    -3.49   0.000   -.7896329
 _Itert_ed_4 |  -.1405644   .1441132    -0.98   0.329   -.4230211
 _Itert_ed_5 |   .4043363   .1040787     3.88   0.000    .2003458
      year_3 |   .0006106    .009962     0.06   0.951   -.0189145
_IterXyear~1 |    .024218   .0122985     1.97   0.049    .0001133
_IterXyear~2 |    .033616   .0151505     2.22   0.026    .0039216
_IterXyear~4 |  -.0013059   .0143873    -0.09   0.928   -.0295046
_IterXyear~5 |    .000611   .0112359     0.05   0.957   -.0214109
       _cons |  -3.208653   .0965292   -33.24   0.000   -3.397847
-------------+---------------------------------------------------
Profession~s |
 _Itert_ed_1 |   3.870921   .1599636    24.20   0.000    3.557398
 _Itert_ed_2 |  -.5270488   .1966783    -2.68   0.007   -.9125312
 _Itert_ed_4 |  -.3029058   .2470132    -1.23   0.220   -.7870428
 _Itert_ed_5 |  -2.130236     .24443    -8.72   0.000    -2.60931
      year_3 |  -.0705025   .0171069    -4.12   0.000   -.1040313
_IterXyear~1 |    .058865   .0177997     3.31   0.001    .0239782
_IterXyear~2 |   .1749493    .021033     8.32   0.000    .1337253
_IterXyear~4 |   .0403727   .0249201     1.62   0.105   -.0084698
_IterXyear~5 |   .0838841   .0262562     3.19   0.001    .0324228
       _cons |  -3.485732   .1563276   -22.30   0.000   -3.792128
-------------+---------------------------------------------------
Associate ~s |
 _Itert_ed_1 |   .5250349    .088949     5.90   0.000    .3506981
 _Itert_ed_2 |   .2204853   .0970563     2.27   0.023    .0302584
 _Itert_ed_4 |    .219423   .1074958     2.04   0.041    .0087351
 _Itert_ed_5 |  -.2276642   .0876818    -2.60   0.009   -.3995174
      year_3 |   .0345487   .0073366     4.71   0.000    .0201691
_IterXyear~1 |   .0072084   .0093549     0.77   0.441   -.0111268
_IterXyear~2 |   .0260399   .0102047     2.55   0.011    .0060391
_IterXyear~4 |  -.0186982   .0106685    -1.75   0.080   -.0396081
_IterXyear~5 |  -.0133634   .0093698    -1.43   0.154   -.0317279
       _cons |  -.8378662   .0728109   -11.51   0.000    -.980573
-------------+---------------------------------------------------
Low servic~r |
 _Itert_ed_1 |  -.6625195   .0883424    -7.50   0.000   -.8356674
 _Itert_ed_2 |   .7419491   .0849377     8.74   0.000    .5754743
 _Itert_ed_4 |   2.149201   .0871306    24.67   0.000    1.978429
 _Itert_ed_5 |   2.418502   .0701088    34.50   0.000    2.281091
      year_3 |   .0643842   .0063551    10.13   0.000    .0519284
_IterXyear~1 |  -.0171698   .0091566    -1.88   0.061   -.0351165
_IterXyear~2 |  -.0339005   .0089674    -3.78   0.000   -.0514763
_IterXyear~4 |  -.1356851   .0087742   -15.46   0.000   -.1528822
_IterXyear~5 |  -.0473144   .0075397    -6.28   0.000   -.0620919
       _cons |   .2592752   .0623827     4.16   0.000    .1370073
-------------+---------------------------------------------------
(occup_att_2==Clerks is the base outcome)
As you can see, the coefficient on the interaction between time (year_3) and
the dummy for the highest educational attainment (university degree,
_Itert_ed_1) is positive for the category 'Professionals' in the dependent
variable. A university degree not only seems to increase the likelihood of
being in this category relative to the reference category ('Clerks'); time
also seems to increase this likelihood further, again relative to the
likelihood of being in the reference category.
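
One way to check this reading: for university graduates, the yearly change
in the log-odds of 'Professionals' versus 'Clerks' is the year_3 main effect
plus the interaction, not the interaction alone. A minimal sketch, assuming
the equation is stored under the name Professionals and the interaction
variable is called _IterXyear__1 (both names are inferred from the output
above and may differ in another setup):

```stata
* Hand arithmetic from the Professionals equation in the table above:
* year_3 main effect plus the _Itert_ed_1 x year_3 interaction.
display -.0705025 + .058865

* The same sum with a standard error and confidence interval.
* Requires the mlogit results above to be in memory.
* Assumption: equation name "Professionals", variable _IterXyear__1.
lincom [Professionals]year_3 + [Professionals]_IterXyear__1
```

With the coefficients reported above, the sum is -.0116375, which is
slightly negative.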
To present this trend graphically, (a) I ran another multinomial logit model
that excludes the interactions between time and the educational attainment
dummies. Please note that I excluded ONLY these interactions from the
previous model; apart from that, both models are identical.
(b) I used the prgen command to generate predicted probabilities over the
variable year_3 when the university-degree dummy (_Itert_ed_1) is 1, the
dummies for the other educational attainment levels are 0 and (by default)
the remaining independent variables are held at their means:
prgen year_3, x(_Itert_ed_1=1 _Itert_ed_2=0 _Itert_ed_4=0 _Itert_ed_5=0) ///
    f(6) t(13) gen(univ)
and (c) I produced the graph with:
graph twoway (scatter univp1 univp2 univp3 univp5 univp4 univx, ///
    connect(l l l l l) xtitle(University) ytitle(probability))
Now, the trend shown by the graph (not shown here) reveals a DECLINING
predicted probability of being a 'Professional' for those with a university
degree. It matches the decreasing predicted probabilities that appear when I
run the prtab command as follows:
prtab _Itert_ed_1 year_3, x(_Itert_ed_2=0 _Itert_ed_4=0 _Itert_ed_5=0)
I show only the predicted probabilities for the category 'Professionals' in
the dependent variable:
mlogit: Predicted probabilities for occup_att_2

Predicted probability of outcome 2 (Professionals)

tert_ed== |                             year_3
        1 |       6       7       8       9      10      11      12      13
----------+----------------------------------------------------------------
        0 |  0.0248  0.0240  0.0232  0.0225  0.0217  0.0210  0.0203  0.0197
        1 |  0.6741  0.6662  0.6580  0.6498  0.6414  0.6329  0.6242  0.6155
Now comes my question. I do not understand how such decreasing
probabilities can appear when the interaction of year_3 and _Itert_ed_1 was
positive in the initial model. How should I interpret this discrepancy? How
is it possible?
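
For reference, the predicted probability for a university graduate can also
be computed by hand from the coefficients of the interaction model reported
above. This is only a rough sketch: it assumes that all covariates whose
coefficients are not shown (sex_2, mstatus_2, age and so on) contribute zero
to every linear predictor, which is purely an illustrative simplification.
The levels will therefore not match prtab; only the direction over year_3 is
of interest.

```stata
* Linear predictors relative to Clerks (base outcome) for _Itert_ed_1 == 1.
* Each is _cons + _Itert_ed_1 + (year_3 + _IterXyear~1) * year, with the
* coefficients copied from the table above.
* Assumption: all omitted covariates are set to contribute zero.
forvalues y = 6/13 {
    local xb1 = -3.208653 + .8369161 + (.0006106 + .024218)  * `y'
    local xb2 = -3.485732 + 3.870921 + (-.0705025 + .058865) * `y'
    local xb3 = -.8378662 + .5250349 + (.0345487 + .0072084) * `y'
    local xb5 =  .2592752 - .6625195 + (.0643842 - .0171698) * `y'
    * Multinomial logit probability of outcome 2 (Professionals);
    * the 1 in the denominator is exp(0) for the base outcome.
    local p2 = exp(`xb2') / (1 + exp(`xb1') + exp(`xb2') + exp(`xb3') + exp(`xb5'))
    display "year_3 = `y'   Pr(Professionals) = " %6.4f `p2'
}
```

Under these assumptions the probability falls as year_3 rises, from about
0.32 at year_3 = 6 to about 0.26 at year_3 = 13. The probability of one
outcome depends on the linear predictors of all outcomes, so its trend over
year_3 need not share the sign of a single interaction coefficient.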
As suggested in the Statalist guidelines, I have looked for help on
Statalist itself, but I'm afraid I'm stuck with this problem.
I would very much appreciate your help on this.
In any case, my apologies if this query is too long, and my thanks for your
attention if you have read this far.
Luis Ortiz
Profesor Agregado
Departament de Ciencies Polítiques i Socials
Universitat Pompeu Fabra
Ramon Trias Fargas, 25-27
08005 Barcelona
Phone: +34-93-5422368
Fax: +34-93-5422372