Shige Song <[email protected]> asks:
> Is there anyone here who use both Stata and TDA? I did a simple log-logistic
> hazard model in both packages, and I had two questions:
I've never used TDA, but am somewhat familiar with how Stata does survival
analysis.
> 1) The two packages give the same point estimatation with opposite direction
> (e.g. 3.13 in Stata but -3.13 in TDA), anyone can tell me why?
The log-logistic model is an accelerated failure time (AFT) model, and in
general there are two ways in which you can think of AFT models. Consider a
time-to-failure, tau_i, for a baseline individual (one with all covariates
equal to zero), and a time-to-failure, t_i, for an individual with covariate
x_i. For both formulations below, assume (without loss of generality) that
beta is univariate and estimated to be positive.
You can then think of an AFT model as either:
(A) A model in log-time:
    ln(t_i) = x_i*beta + ln(tau_i)
Here, the effect of x_i is interpreted as: every unit increase in x_i will
result in ln(t_i) increasing by beta, which means we would have to wait
longer for failure than at baseline. So, in effect, this is the reverse of
accelerated time. The effect of incrementing x_i is to decelerate time, i.e.
time passes more slowly.
(B) A more direct implementation of accelerated time:
    t_i = exp(-x_i*beta)*tau_i
In this scenario, a larger x_i means a smaller exp(-x_i*beta), which in
turn means that any given time in the baseline metric, tau_i, corresponds
to a smaller time t_i in the regular metric, i.e. time accelerates as
x_i increases.
If you take the log of both sides of (B), you'll see the sign flip. Stata
uses (A) (we like log-linear models); my guess is that TDA uses (B). It
really didn't matter which implementation we decided to use -- the important
thing is to be consistent across models, which Stata is.
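The sign flip can be checked numerically. The following is a minimal sketch
in Python (not Stata), under assumed values: the baseline time tau, the
covariate x, and both function names are hypothetical, and only the 3.13 is
taken from the question. It solves each of (A) and (B) for beta given the same
pair of failure times.

```python
import math

def beta_log_time(t, tau, x):
    """Solve form (A), ln(t) = x*beta + ln(tau), for beta."""
    return (math.log(t) - math.log(tau)) / x

def beta_accelerated(t, tau, x):
    """Solve form (B), t = exp(-x*beta)*tau, for beta."""
    return -(math.log(t) - math.log(tau)) / x

tau = 2.0                     # hypothetical baseline failure time
x = 1.0                       # hypothetical covariate value
t = tau * math.exp(3.13 * x)  # failure time implied by beta = 3.13 under (A)

b_a = beta_log_time(t, tau, x)
b_b = beta_accelerated(t, tau, x)

# Same data, same magnitude, opposite sign.
print("form (A) beta:", b_a)
print("form (B) beta:", b_b)
```

Both functions describe the same relationship between t and tau; only the
convention for the sign of beta differs.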
> 2) The log-likelihood reported by these two packages are dramatically
> different, although I was using the same model on the same data set (for
> example, Stata gave me 1350.692 but TDA gave me -15803.2146). Am I missing
> anything here?
Stata omits from its log-likelihood calculations any terms that do not affect
the optimization, i.e. additive terms that are a function only of the data and
not of the parameters to be estimated. Here the term in question is the sum
of ln(t) over the observed failures, which enters the true log-likelihood
with a negative sign. The advantage of leaving it out is that rescaling your
time measurements (say, from months to days) will not change the value of the
"log-likelihood."
If you want the true log-likelihood, you can always put this term back in.
Try the following just after fitting your model using -streg-:
. generate lnt = ln(_t)
. summarize lnt if _d==1, meanonly
. display e(ll) - r(sum)
In the unweighted case, this is the true log-likelihood.
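To see why that one term accounts for the difference, here is a minimal
sketch in Python (not Stata). It assumes uncensored data, made-up failure
times, and fixed rather than fitted parameters; the function name is
hypothetical. It evaluates the log-logistic log-likelihood once on the scale
of t and once on the scale of ln(t), whose density is t*f(t), and then
rescales time by a constant that is absorbed into lambda.

```python
import math

def loglogistic_logpdf(t, lam, gamma):
    """Log density of t when S(t) = 1 / {1 + (t/lam)^(1/gamma)}."""
    z = (t / lam) ** (1.0 / gamma)
    # f(t) = z / (gamma * t * (1 + z)^2)
    return math.log(z) - math.log(gamma) - math.log(t) - 2.0 * math.log1p(z)

def both_logliks(times, lam, gamma):
    """Return (ll on the t scale, ll on the ln(t) scale, sum of ln(t))."""
    ll_t = sum(loglogistic_logpdf(t, lam, gamma) for t in times)
    sum_lnt = sum(math.log(t) for t in times)
    # The density of ln(t) is t*f(t), which adds ln(t) per failure.
    return ll_t, ll_t + sum_lnt, sum_lnt

times = [0.5, 1.2, 3.0, 7.5]  # hypothetical failure times, no censoring
lam, gamma = 2.0, 0.5         # hypothetical parameter values

ll_t, ll_lnt, sum_lnt = both_logliks(times, lam, gamma)

# Rescale time (e.g. months to days); the shift is absorbed into lambda.
c = 30.0
ll_t_c, ll_lnt_c, _ = both_logliks([c * t for t in times], c * lam, gamma)

print("t scale:    ", ll_t, "->", ll_t_c)      # changes by -n*ln(c)
print("ln(t) scale:", ll_lnt, "->", ll_lnt_c)  # unchanged
print("recovered:  ", ll_lnt - sum_lnt)        # equals the t-scale value
```

Subtracting the sum of ln(t) from the ln(t)-scale value recovers the t-scale
value, which is the same arithmetic as e(ll) - r(sum) above.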
As I was getting ready to send this off, Jesper B. Sorensen <[email protected]>
responded to this question. My comments are in agreement with his, except
perhaps the comment:
> In stata, the log-logistic is implemented as an accelerated failure time
> model, while in TDA it is a hazard rate model; hence the opposite signs
When I think of "hazard rate model," I think of a proportional hazards
model in the spirit of Cox,
    h(t | x_i) = h_0(t)*exp(x_i*beta)     (*)
where h_0(t) is the baseline hazard and is free of x_i.
I don't exactly see how the log-logistic could be thought of as a "hazard
rate" model in any parameterization. The hazard function for an individual
with covariate x_i is, for lambda = exp(x_i*beta),
           (1/lambda)^(1/gamma) * t^(1/gamma - 1)
    h(t) = --------------------------------------
              gamma*{1 + (t/lambda)^(1/gamma)}
and there is really no way to factor out lambda so that we can express h(t) in
the form of (*), regardless of whether we flip the sign on beta.
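A quick numeric check of this, sketched in Python (not Stata) with
hypothetical values of lambda and gamma and a hypothetical function name:
if the hazard could be written in the form of (*), the ratio of hazards for
two values of lambda would be the same at every t.

```python
def loglogistic_hazard(t, lam, gamma):
    """Log-logistic hazard, written as in the formula above."""
    num = (1.0 / lam) ** (1.0 / gamma) * t ** (1.0 / gamma - 1.0)
    den = gamma * (1.0 + (t / lam) ** (1.0 / gamma))
    return num / den

gamma = 0.5            # hypothetical shape parameter
lam0, lam1 = 1.0, 2.0  # hypothetical lambda values for two covariate settings

# Under proportional hazards this ratio would not depend on t.
ratios = [
    loglogistic_hazard(t, lam1, gamma) / loglogistic_hazard(t, lam0, gamma)
    for t in (0.1, 1.0, 10.0)
]
print(ratios)
```

With these values the ratio is well below one at small t and approaches one
as t grows, so it is not constant in t.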
Again, I plead ignorance of what TDA does, but I think the flip in sign is
more a case of what you want to call "accelerated" than a case of AFT vs.
proportional hazards.
--Bobby
[email protected]