Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Binary panel data questions
From: Kim Peeters <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Binary panel data questions
Date: Thu, 9 Feb 2012 09:27:52 -0800 (PST)
Dear Maarten,
Thank you for your reply. Regarding your remark about data preparation and quality: it turns out that the data are correct. The ailment is not very common, and once you suffer from it, it is very unlikely ever to be cured.
In the meantime I fitted two different models.
First model: standard logistic regression including a time factor variable and clustered standard errors, allowing for intra-patient correlation
Logistic regression                               Number of obs   =       4526
                                                  Wald chi2(21)   =      62.63
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2889.4078                 Pseudo R2       =     0.0690
                                   (Std. Err. adjusted for 588 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
Profitstatus |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Year |
       1995  |          0  (empty)
       1996  |   .3095819   .6289854     0.49   0.623    -.9232068    1.542371
       1997  |   .2287845   .3932461     0.58   0.561    -.5419637    .9995326
       1998  |   .2779752   .2760959     1.01   0.314    -.2631629    .8191133
       1999  |   .2198992   .2423209     0.91   0.364    -.2550409    .6948394
       2000  |   .2776958   .1964845     1.41   0.158    -.1074067    .6627984
       2001  |    .173692   .1671147     1.04   0.299    -.1538467    .5012308
       2002  |  -.0233154   .1418964    -0.16   0.869    -.3014272    .2547964
       2003  |   .0028641   .1155645     0.02   0.980    -.2236381    .2293663
       2004  |   .0281098   .0998883     0.28   0.778    -.1676678    .2238873
       2005  |   .0220868   .0823186     0.27   0.788    -.1392547    .1834282
       2006  |   .0470962   .0740874     0.64   0.525    -.0981124    .1923049
       2007  |    .008058   .0702793     0.11   0.909    -.1296869    .1458029
       2008  |   .0484251   .0671299     0.72   0.471     -.083147    .1799971
       2009  |   .0380139   .0655851     0.58   0.562    -.0905306    .1665584
       2010  |          0  (omitted)
             |
           X |
          2  |    1.20977   .3477654     3.48   0.001     .5281622    1.891377
          3  |   .7152767    .287351     2.49   0.013      .152079    1.278474
          4  |   .1813765   .2763467     0.66   0.512    -.3602532    .7230061
          5  |          0  (empty)
          6  |    .750882   .3379602     2.22   0.026     .0884923    1.413272
             |
           Y |   .2927133   .0971447     3.01   0.003     .1023131    .4831135
           Z |  -.9795072   .3057005    -3.20   0.001    -1.578669   -.3803452
           A |  -1.525683   .3984367    -3.83   0.000    -2.306604   -.7447611
       _cons |   .5056934   .3791922     1.33   0.182    -.2375097    1.248896
------------------------------------------------------------------------------
Note: 1 failure and 3 successes completely determined.
note: 1995.Year != 0 predicts success perfectly
1995.Year dropped and 2 obs not used
note: 5.X != 0 predicts failure perfectly
5.X dropped and 322 obs not used
note: 2010.Year omitted because of collinearity
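A minimal sketch of how the first model might be specified, assuming the variable names shown in the output (Profitstatus, Year, X, Y, Z, A, ID). The exact command that was run is not given in this message, so the factor-variable notation below is an assumption. The two cross-tabulations are one way to look at the cells behind the perfect-prediction notes.

```stata
* Assumed specification: pooled logit with Year and X entered as factor
* variables and standard errors clustered on the patient identifier.
logit Profitstatus i.Year i.X Y Z A, vce(cluster ID)

* Inspect the outcome by Year and by X; a level with only one outcome value
* is what produces a "predicts success/failure perfectly" note.
tabulate Year Profitstatus
tabulate X Profitstatus
```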
Second model: -xtlogit- with random effects
Random-effects logistic regression              Number of obs      =      4850
Group variable: ID                              Number of groups   =       624
Random effects u_i ~ Gaussian                   Obs per group: min =         2
                                                               avg =       7.8
                                                               max =        16
                                                Wald chi2(8)       =     37.88
Log likelihood = -379.48407                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
Profitstatus |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |
          2  |   1.952878   1.480678     1.32   0.187    -.9491981    4.854953
          3  |   2.070424   1.322714     1.57   0.118    -.5220471    4.662896
          4  |   .2901392   1.324056     0.22   0.827    -2.304962    2.885241
          5  |   -55.3464   4613.941    -0.01   0.990    -9098.504    8987.812
          6  |   3.294871   2.974562     1.11   0.268    -2.535163    9.124905
             |
           Y |   .4409673   .1010158     4.37   0.000     .2429799    .6389547
           Z |  -1.308088   1.035164    -1.26   0.206    -3.336972    .7207964
           A |  -3.993116   2.400293    -1.66   0.096    -8.697604    .7113715
       _cons |  -.0651275   2.014896    -0.03   0.974     -4.01425    3.883995
-------------+----------------------------------------------------------------
    /lnsig2u |   4.505505    .116975                      4.276238    4.734772
-------------+----------------------------------------------------------------
     sigma_u |   9.513887   .5564434                      8.483466    10.66946
         rho |   .9649282   .0039586                      .9562861     .971912
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5026.14 Prob >= chibar2 = 0.000
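A sketch of how the second model might be specified under the same naming assumptions, followed by a check of how the reported sigma_u and rho relate to /lnsig2u. In -xtlogit, re- output, /lnsig2u is the log of the random-intercept variance, and rho = sigma_u^2 / (sigma_u^2 + pi^2/3). The Wald chi2(8) suggests that the year indicators were not included here, but that is inferred from the output rather than stated in the message.

```stata
* Assumed panel declaration and specification (not shown in the message)
xtset ID Year
xtlogit Profitstatus i.X Y Z A, re

* sigma_u = exp(lnsig2u / 2); should be close to the reported 9.513887
display exp(4.505505/2)

* rho = sigma_u^2 / (sigma_u^2 + pi^2/3); should be close to the reported .9649282
display exp(4.505505) / (exp(4.505505) + _pi^2/3)
```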
In the standard logistic regression, the variables Y, Z and A are significant. In the random-effects panel regression, however, only Y is significant, and the results for X also differ. I did not expect the models to differ this much. Why are they so different, or am I doing something wrong?
Thank you!
Kind regards,
Kim
----- Original Message -----
From: Maarten Buis <[email protected]>
To: [email protected]
Cc:
Sent: Wednesday, February 8, 2012 10:27 AM
Subject: Re: st: Binary panel data questions
On Wed, Feb 8, 2012 at 1:19 AM, Kim Peeters wrote:
> Somewhat remarkably, it turns out that none of the participants in the study experienced a transition from one state to the other state (e.g. transition from no ailment to ailment and vice versa). In other words, all patients that did not suffer from the illness at the onset of the study remained disease-free and all patients that did suffer from the illness at the onset of the study continued to be ill.
>
> Originally, I planned to use -xtlogit- with fixed effects to control for unobserved influences that differ between patients but remain constant in a given patient. However, since none of the patients experienced a transition, Stata correctly returns error code 2000: outcome does not vary in any group.
>
> At the moment, I do not know which statistical technique would be the most appropriate. Recall that I try to test for a relationship between the outcome (no illness vs. illness) and a group of independent variables. I thought about running a logistic regression with clustered standard errors (i.e. vce(cluster ID)). However, I do not want to discard the time dimension in the panel data and I would like to correct for potential omitted variable bias.
In essence you do not have panel data; you could just as well use the
first observation for each person and do a regular -logit-. I just
don't think there is any more information present in your data, and no
amount of fancy modeling can invent information that isn't present in
the data.
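The suggestion above can be sketched as follows, assuming the variable names from the output later posted in this thread (ID, Year, Profitstatus, X, Y, Z, A). The covariate list is an assumption, and the sketch keeps each person's earliest observed year.

```stata
* Keep one row per person (the earliest year), fit an ordinary logit,
* then restore the full panel.
preserve
bysort ID (Year): keep if _n == 1
logit Profitstatus i.X Y Z A
restore
```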
I would really check again whether that constant disease status isn't
some error during data preparation or some artifact of the way the
data was collected, as that a) sounds really suspicious and b) is
causing you this problem.
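One possible way to run that check, again assuming the variable names ID, Year and Profitstatus: tabulate the year-to-year transitions and flag any person whose outcome changes at all.

```stata
* Transition frequencies between consecutive observations; with no changes
* in status, all counts fall on the diagonal.
xtset ID Year
xttrans Profitstatus, freq

* Flag persons whose outcome is not constant. Missing values sort last,
* so a person with a missing outcome is also flagged.
bysort ID (Profitstatus): generate byte changes = Profitstatus[1] != Profitstatus[_N]
egen byte first = tag(ID)
tabulate changes if first
```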
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/