Date: Fri, 23 Sep 2005 11:55:51 +0200
From: Gijs Dekkers <[email protected]>
Subject: st: unobserved heterogeneity and duration: interpreting pgmhaz8 and xtclog
Dear fellow Stata-users,
I am estimating a discrete-time duration model explaining the probability
that a cohabiting (unmarried) individual (cohab=1) separates, i.e. is no
longer in a consensual union and is not married, after a certain time
(the variable 'duration'). The data set is the European Community
Household Panel (ECHP).
The variables are
pid: unique person identifier
duration: time (years)
cosep: 0 if the individual lives in a consensual union, 1 if (s)he
does not live in a consensual union (and is not married)
The data is of the following form:
     +----------------------------+
     |     pid   duration   cosep |
     |----------------------------|
  1. | 1028101          1       0 |
  2. | 1028101          2       0 |
  3. | 1028105          1       0 |
  4. | 1028105          2       0 |
  5. | 2053101          1       0 |
     |----------------------------|
  6. | 2053102          1       0 |
  7. | 3023101          1       0 |
  8. | 3023101          2       0 |
  9. | 3023101          3       1 |
etc...
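The listing is in person-period form: one row per person and year at risk, and (judging from person 3023101) cosep switches to 1 in the year the separation is observed. A minimal Python sketch of how such rows arise from spell-level data, assuming each spell is summarised by its length in years and whether it ended in separation (expand_spell and that spell summary are hypothetical, not part of the ECHP or of Stata):

```python
def expand_spell(pid, spell_length, separated):
    """Return one (pid, duration, cosep) row per year the spell is at risk.

    cosep is 1 only in the last year, and only if the spell ended in a
    separation; a censored spell contributes rows with cosep = 0 throughout.
    """
    rows = []
    for t in range(1, spell_length + 1):
        event = 1 if (separated and t == spell_length) else 0
        rows.append((pid, t, event))
    return rows


if __name__ == "__main__":
    # Hypothetical spell summaries chosen to reproduce rows 1-2 and 7-9 above.
    for pid, length, sep in [(1028101, 2, False), (3023101, 3, True)]:
        for row in expand_spell(pid, length, sep):
            print(row)
```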
A first analysis (somewhat disappointingly) showed that the only
significant explanatory variables are functions of duration. In fact,
the best model explains 'cosep' using 'duration' and its square
'duration2':
. cloglog cosep duration duration2
Iteration 0:   log likelihood = -377.28398
Iteration 1:   log likelihood = -377.24866
Iteration 2:   log likelihood = -377.24865

Complementary log-log regression                  Number of obs     =      2410
                                                  Zero outcomes     =      2319
                                                  Nonzero outcomes  =        91

                                                  LR chi2(2)        =     20.35
Log likelihood = -377.24865                       Prob > chi2       =    0.0000

------------------------------------------------------------------------------
       cosep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    duration |   .8310887   .2263603     3.67   0.000     .3874307    1.274747
   duration2 |  -.0825692   .0272601    -3.03   0.002     -.135998   -.0291405
       _cons |  -4.782091   .4091147   -11.69   0.000    -5.583941   -3.980241
------------------------------------------------------------------------------
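The positive coefficient on duration and the negative one on duration2 imply a hump-shaped hazard. A minimal Python sketch that plugs the point estimates above into the cloglog hazard h(t) = 1 - exp(-exp(b0 + b1*t + b2*t^2)) and locates the turning point t* = -b1/(2*b2); the names are illustrative and sampling uncertainty is ignored:

```python
import math

# Point estimates copied from the cloglog output above.
B_CONS, B_DUR, B_DUR2 = -4.782091, 0.8310887, -0.0825692


def hazard(t):
    """Discrete-time hazard implied by the complementary log-log model:
    h(t) = 1 - exp(-exp(b0 + b1*t + b2*t^2))."""
    eta = B_CONS + B_DUR * t + B_DUR2 * t ** 2
    return 1.0 - math.exp(-math.exp(eta))


# The cloglog link is monotone, so the hazard peaks where the quadratic
# index peaks: d(eta)/dt = b1 + 2*b2*t = 0  =>  t* = -b1 / (2*b2).
t_star = -B_DUR / (2.0 * B_DUR2)

if __name__ == "__main__":
    print(f"turning point: {t_star:.2f} years")
    for t in range(1, 9):
        print(t, round(hazard(t), 4))
```

With these estimates t* is about 5.03, so the estimated hazard rises until roughly year 5 and declines after that.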
However, I want to test for various parametric forms of frailty, following
Jenkins' Lesson 7 on 'unobserved heterogeneity'
(http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/#_Toc520705914).
First, he suggests testing for heterogeneity assuming a normally
distributed frailty term (page 14):
. xtclog cosep duration duration2, nolog i(pid)
Random-effects complementary log-log model      Number of obs      =      2410
Group variable (i): pid                         Number of groups   =       739

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       3.3
                                                               max =         8

                                                Wald chi2(2)       =     18.20
Log likelihood  = -377.24865                    Prob > chi2        =    0.0001

------------------------------------------------------------------------------
       cosep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    duration |   .8310886   .2263602     3.67   0.000     .3874307    1.274747
   duration2 |  -.0825692   .0272601    -3.03   0.002     -.135998   -.0291404
       _cons |  -4.782091   .4091147   -11.69   0.000    -5.583941   -3.980241
-------------+----------------------------------------------------------------
    /lnsig2u |        -14          .                             .           .
-------------+----------------------------------------------------------------
     sigma_u |   .0009119          .                             .           .
         rho |   5.06e-07          .                             .           .
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =     0.00 Prob >= chibar2 = 1.000
Now this already looks pretty strange to me, or is it my suspicious
mind? Can I safely conclude that the hypothesis of normally distributed
unobserved heterogeneity should (very much) be rejected?
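The lnsig2u, sigma_u and rho lines are transformations of a single parameter, and the log likelihood (-377.24865) is the same as the pooled cloglog one. A minimal Python sketch of the conversion, assuming the usual random-effects cloglog parameterisation lnsig2u = ln(sigma_u^2) and rho = sigma_u^2 / (sigma_u^2 + pi^2/6), where pi^2/6 is the variance of the extreme-value error (the helper names are hypothetical):

```python
import math

# Value copied from the xtclog output above.
LNSIG2U = -14.0


def sigma_u_from_lnsig2u(lnsig2u):
    """Standard deviation of the Gaussian frailty u_i, given ln(sigma_u^2)."""
    return math.exp(lnsig2u / 2.0)


def rho_cloglog(sigma_u):
    """Share of the latent variance due to u_i in a random-effects cloglog
    model; the idiosyncratic extreme-value error has variance pi^2/6."""
    s2 = sigma_u ** 2
    return s2 / (s2 + math.pi ** 2 / 6.0)


if __name__ == "__main__":
    s = sigma_u_from_lnsig2u(LNSIG2U)
    print(f"sigma_u = {s:.7f}, rho = {rho_cloglog(s):.3g}")
```

Under these assumptions lnsig2u = -14 reproduces the printed sigma_u = .0009119 and rho = 5.06e-07, i.e. a frailty variance of essentially zero.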
Secondly, I used pgmhaz8 to test for gamma-distributed unobserved
heterogeneity. I found the pgmhaz8 manual at
http://ideas.repec.org/c/boc/bocode/s438501.html
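With a gamma-distributed frailty the survivor function has a closed form. A minimal Python sketch, assuming the standard grouped-time proportional-hazards (cloglog) model with a multiplicative Gamma frailty of mean 1 and variance var; the function names and linear indices are hypothetical, and this is not pgmhaz8's own code. It shows that as the variance goes to zero the model collapses to the ordinary no-frailty cloglog, which is the boundary a test of "Gamma var. = 0" probes:

```python
import math


def cumulative_hazard(etas):
    """H(t) = sum over years at risk of exp(eta_k), with eta_k = x_k'b."""
    return sum(math.exp(e) for e in etas)


def survivor_no_frailty(etas):
    """Grouped-time proportional hazards (cloglog) survivor: S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(etas))


def survivor_gamma_frailty(etas, var):
    """Same model with a multiplicative frailty v ~ Gamma(mean 1, variance var).

    Averaging exp(-v * H) over the gamma distribution gives
    S(t) = (1 + var * H) ** (-1 / var); var = 0 is the no-frailty case.
    """
    h = cumulative_hazard(etas)
    if var == 0:
        return math.exp(-h)
    # log1p keeps the result accurate when var * H is tiny.
    return math.exp(-math.log1p(var * h) / var)


if __name__ == "__main__":
    etas = [-4.0, -3.5, -3.2]  # hypothetical linear indices for years 1-3
    print("no frailty:", survivor_no_frailty(etas))
    for var in (1.0, 0.1, 1e-6):
        print(f"gamma var {var:g}:", survivor_gamma_frailty(etas, var))
```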
If I understand this manual correctly (but I am not quite sure), the
model should be
. pgmhaz8 duration2, id(pid) dead(cosep) seq(duration)
(anyway, the model pgmhaz8 duration duration2 etc. does not converge)
The results are:
PGM hazard model without gamma frailty

Generalized linear models                          No. of obs      =      2410
Optimization     : ML                              Residual df     =      2408
                                                   Scale parameter =         1
Deviance         =  769.387838                     (1/df) Deviance =  .3195132
Pearson          =  2400.608032                    (1/df) Pearson  =  .9969302

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(-ln(1-u))             [Complementary log-log]

                                                   AIC             =  .3209078
Log likelihood   = -384.693919                     BIC             = -17982.63
------------------------------------------------------------------------------
             |                 OIM
       cosep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   duration2 |   .0135626   .0055006     2.47   0.014     .0027817    .0243435
       _cons |  -3.456552   .1401122   -24.67   0.000    -3.731167   -3.181937
------------------------------------------------------------------------------
Iteration 0: log likelihood = -385.00279
Iteration 1: log likelihood = -384.79069
Iteration 2: log likelihood = -384.73062
Iteration 3: log likelihood = -384.70334
Iteration 4: log likelihood = -384.69612
Iteration 5: log likelihood = -384.6944
Iteration 6: log likelihood = -384.69403
Iteration 7: log likelihood = -384.69394
Iteration 8: log likelihood = -384.69392
Iteration 9: log likelihood = -384.69392
Iteration 10: log likelihood = -384.69392

PGM hazard model with gamma frailty                Number of obs   =      2410
                                                   LR chi2()       =         .
Log likelihood = -384.69392                        Prob > chi2     =         .

------------------------------------------------------------------------------
       cosep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hazard       |
   duration2 |   .0135593   .0055347     2.45   0.014     .0027114    .0244072
       _cons |  -3.456778    .141079   -24.50   0.000    -3.733287   -3.180268
-------------+----------------------------------------------------------------
ln_varg      |
       _cons |  -13.77345    952.569    -0.01   0.988    -1880.774    1853.228
-------------+----------------------------------------------------------------
  Gamma var. |   1.04e-06   .0009935     0.00   0.999            0           .
------------------------------------------------------------------------------
LR test of Gamma var. = 0: chibar2(01) = -8.9e-06   Prob.>=chibar2 = .5
And here it is again: analogous to the xtclog results, the
hypothesis of gamma-distributed unobserved heterogeneity should be
rejected. However, again like the xtclog results, the pgmhaz8 results
above look suspiciously like some sort of corner solution or an
artefact.
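The pgmhaz8 numbers can be cross-checked in the same way. A minimal Python sketch, assuming (i) ln_varg is the log of the gamma variance, (ii) the boundary test uses the usual chibar2(01) null, a 50:50 mixture of a point mass at zero and chi2(1), and (iii) the two no-frailty fits (cloglog with duration and duration2, pgmhaz8 with duration2 only) are nested models on the same 2410 observations, as both outputs report. Under (ii) a statistic of zero gives p = 1, as xtclog prints; the .5 printed by pgmhaz8 is what 0.5*P(chi2(1) >= stat) returns for a non-positive statistic. The helper names are hypothetical:

```python
import math


def chi2_1_tail(x):
    """P(chi2(1) >= x) for x >= 0, via P(chi2(1) >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def chibar2_01_pvalue(stat):
    """P-value for a boundary LR test whose null distribution is a 50:50
    mixture of a point mass at 0 and chi2(1)."""
    if stat <= 0:
        return 1.0  # every draw from the mixture is >= a statistic of 0
    return 0.5 * chi2_1_tail(stat)


# ln_varg from the gamma-frailty output, back-transformed to a variance.
gamma_var = math.exp(-13.77345)

# No-frailty log likelihoods reported above: cloglog with duration and
# duration2, and pgmhaz8 with duration2 only (linear term restricted to 0).
ll_cloglog, ll_pgm = -377.24865, -384.693919
lr = 2.0 * (ll_cloglog - ll_pgm)

if __name__ == "__main__":
    print(f"gamma variance = {gamma_var:.3g}")
    print("chibar2(01) p-value at 0:", chibar2_01_pvalue(0.0))
    print(f"LR (1 df) = {lr:.2f}, p = {chi2_1_tail(lr):.2g}")
```

The back-transformed ln_varg matches the printed Gamma var. of 1.04e-06, and the LR statistic of about 14.89 measures how much worse the duration2-only specification fits than the cloglog with both duration terms.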
And this (finally!) brings me to my question: can I trust these results
and safely conclude that the hypotheses of unobserved heterogeneity
(both normally and gamma-distributed) should be rejected? Or is there
something else going on? If so, any suggestions?
Any help would be appreciated!
Gijs