
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: R: st: Population attributable fractions (PAFs) in discrete-time survival analysis. -punaf-


From   "Roger B. Newson" <[email protected]>
To   [email protected]
Subject   Re: R: st: Population attributable fractions (PAFs) in discrete-time survival analysis. -punaf-
Date   Fri, 16 Aug 2013 16:15:00 +0100

The -punaf- package starts by calling -margins- (with the -post- option) to estimate the 2 scenario means. It then calls -nlcom- (again with the -post- option) to estimate the logs of these scenario means and of their ratio (the PUF), and uses the estimation results from -nlcom- to compute the confidence interval for the PAF using the formula

PAF=1-PUF

to do an end-point transformation. The output

expression (log(_b[_cons])) evaluates to missing

looks as if it comes from the -nlcom- step. So, something seems to be wrong with log(_b[_cons]), as computed by -nlcom- from the estimation results left by -margins- with the -post- option.
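The end-point transformation described above can be sketched numerically. The following Python fragment is only an illustration, not the -punaf- source code: the function name paf_ci, its arguments, and the use of a Normal quantile of about 1.96 are assumptions made for this example. It takes 2 scenario means with their variances and covariance, applies the delta method on the log scale, and transforms the confidence limits for the PUF into limits for the PAF. It also shows why a non-positive scenario mean stops the calculation at the log step.

```python
import math

def paf_ci(m0, m1, v0, v1, c01, z=1.959964):
    """Hypothetical helper: PAF and confidence limits from two scenario means.

    m0, m1 : estimated means under Scenario 0 and Scenario 1
    v0, v1 : their estimated variances
    c01    : their estimated covariance
    z      : Normal quantile for the interval (an assumption of this sketch)
    """
    if m0 <= 0 or m1 <= 0:
        # The log of a non-positive mean is undefined, so nothing can be computed.
        raise ValueError("scenario means must be positive to take logs")
    # log PUF = log(m1) - log(m0)
    log_puf = math.log(m1) - math.log(m0)
    # Delta-method variance of a difference of logs
    var_log_puf = v0 / m0**2 + v1 / m1**2 - 2.0 * c01 / (m0 * m1)
    se = math.sqrt(var_log_puf)
    puf_lower = math.exp(log_puf - z * se)
    puf_upper = math.exp(log_puf + z * se)
    # PAF = 1 - PUF, so the limits swap places
    return {
        "paf": 1.0 - math.exp(log_puf),
        "lower": 1.0 - puf_upper,
        "upper": 1.0 - puf_lower,
    }

result = paf_ci(m0=0.20, m1=0.15, v0=0.0004, v1=0.0003, c01=0.0002)
print(result)
```

Working on the log scale keeps the PUF limits positive, which is why the PAF limits can never exceed 1.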

The -margins- command (at least in Stata Version 12) sometimes produces parameters whose parameter name is _b[_cons], even when the user does not specify a -noconst- option for the original regression command. It might help if I knew which version of Stata you were using, and which version of -punaf- you were using. To find the -punaf- version, type in Stata

which punaf

And it might help even more if you could email to me, privately, a specimen dataset, and a specimen Stata do-file, which produces the quoted error. (Statalist rules do not allow attachments of any kind.)

Best wishes

Roger


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 16/08/2013 12:26, Angelo Belardi wrote:
Hello Roger,

Thanks for your interpretation of the problem. However, it seems not
to solve the issue.

Together with my colleagues I looked through the output of our main
analyses. None of our linear predictors are very large and negative,
which you described as a possible cause of this error message.

We still assume that the problem may be connected to the use of the
-noconstant- option in the -cloglog- functions, because -punaf- seems
to work fine if this option is not needed. We think that -noconstant-
may alter the output from -cloglog- in a way which makes it
incompatible with the -punaf- function.
However, I don't know how -punaf- reads in the output of -cloglog-
exactly or how the structure of the stored values get changed by
-noconstant-.

Best regards,
Angelo


Angelo Belardi
Ambizione research group (SNSF)
Department of Clinical Psychology and Psychiatry
University of Basel
Missionsstrasse 60/62
CH-4055 Basel, Switzerland
Email: [email protected]



2013/8/5 Roger B. Newson <[email protected]>

Hello Angelo

What appears to be happening here is that one or other of your scenario prevalences is being evaluated to a non-positive quantity (zero or even negative). The scenario prevalence for a -cloglog- model is a mean of predicted values of the form

1-exp(-exp(z))

where z is the linear predictor of a -cloglog- model (ie the sum of the beta*X terms). For this to be non-positive, exp(-exp(z)) must be at least 1, implying that -exp(z) must be non-negative, which in turn implies that exp(z) is evaluating to zero. This will happen if z (the linear predictor) is very large and negative. And, presumably, this can happen if one or more of your fitted betas "converges" to plus or minus infinity.
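The arithmetic in the paragraph above can be checked with a short Python sketch. It illustrates double-precision floating-point behaviour in general, and is not a statement about how Stata evaluates -cloglog- predictions internally. The function names are invented for this example.

```python
import math

def invcloglog_naive(z):
    """Inverse complementary log-log written literally as 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))

def invcloglog_stable(z):
    """Same quantity via expm1, which keeps precision when exp(z) is tiny."""
    return -math.expm1(-math.exp(z))

# Compare the two forms for a moderate and for two very negative linear predictors.
for z in (0.0, -10.0, -40.0):
    print(z, invcloglog_naive(z), invcloglog_stable(z))
```

With z = -40 the literal formula returns exactly zero in double precision, because exp(-exp(-40)) rounds to 1, whereas the expm1 form still returns a small positive number. A predicted value of exactly zero is the kind of quantity whose log cannot be taken.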

I do not know how you have estimated extremely large negative linear predictors. And I do not know what a "fully non-parametric baseline hazard function" is, in a -cloglog- model. However, that is what appears to be happening, for whatever reason.

If you are fitting a binomial model where fitted values (ie fitted binomial proportions) may sometimes be zero, then you should possibly be measuring scenario differences (using -regpar-) instead of scenario ratios (using -punaf-). However, I am not sure exactly what question you are trying to answer.


I hope this helps.

Best wishes

Roger


On 05/08/2013 10:33, Angelo Belardi wrote:

Thanks again for your precise answers.

I have now tried to run -punaf- after my -cloglog- analyses. However,
-punaf- encountered a problem.
The error message that comes up is: "expression (log(_b[_cons]))
evaluates to missing".

I assume that this might be connected to my use of the -noconstant-
option in the -cloglog- commands. For analyses where the calculations
are possible to run without the -nocons- option, -punaf- also gives me
reasonable results and no error message.
However, from what I know I have to use this option because of my
fully non-parametric baseline hazard function.

Is it possible that -punaf- has a problem with that or might the error
be due to something else? How could I solve this issue?

Best regards,
Angelo





2013/7/21 Roger B. Newson <[email protected]>

In reply to Angelo's queries:

A. You can indeed use -punaf- after -cloglog-. (Or you should be able to do so - let me know if you have any problems.) However, the interpretation of the attributable and unattributable fractions will then be similar to the interpretation of these parameters when you use -punaf- after -logit- or -logistic-. It is probably not a good idea to use -punafcc- after -cloglog-. And -punafcc- should probably not be used after -logit- or -logistic-, except if your data are from a case-control study (for which -punafcc- was written). After a Cox regression, you may use either -punaf- or -punafcc-, depending on what kind of population unattributable and attributable fractions you want to estimate (ie my kind or Samuelsen and Eide's kind).

B. If you are working with a dataset with 1 observation per person per period, and the outcome variable is binary, then you should use an estimation command that allows for the clustering of person-periods by persons. For instance, you might use -xtgee-, or you might use -logit-, -logistic-, or -cloglog- with an option like -vce(cluster person)-. The interpretation of the population unattributable and attributable fractions will then be the same as when -punaf- is used with binary data. That is to say, the PAF (or PUF) will be the fraction of the binary outcomes equal to 1 that is attributable (or unattributable) to living in Scenario 0 instead of Scenario 1.

C. The WHO definition of a PAF is an extremely simple special case of the -punaf- definition of a PAF, applying to a binary outcome variable, a discrete-valued exposure variable with n levels, and no concomitant (or confounder) variables. And the WHO also assumes that "Scenario 0" is the real world that we live in, and that "Scenario 1" is a user-specified ideal scenario (eg a dream scenario where the whole world stopped smoking, or a dream scenario where the current smokers become ex-smokers, or a more realistic dream scenario where only a proportion of the current smokers quit smoking). The P_i specified by the WHO are the proportions of the population at the i'th exposure level in the real world (Scenario 0). And the P'_i are the proportions of the population that would have the i'th exposure level in the dream scenario (Scenario 1). And the RR_i are the relative risks (ie rate ratios) associated with comparing the i'th exposure level to the lowest exposure level. So, the -punaf- definition is a generalization of the WHO definition. There seems to be some controversy about how best to generalize the concept of a PAF (or a PUF) to the case of a Cox regression. (At least, I had a different idea from Samuelsen and Eide.)
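As a worked illustration of the WHO special case in point C, the standard WHO formula is PAF = (sum of P_i*RR_i - sum of P'_i*RR_i) / (sum of P_i*RR_i). The Python sketch below evaluates it. The function name who_paf and all of the numbers are hypothetical, chosen only to make the arithmetic visible, and are not taken from any dataset.

```python
def who_paf(p_real, p_ideal, rr):
    """WHO-style PAF for a discrete exposure with n levels.

    p_real  : proportions P_i at each exposure level in Scenario 0 (real world)
    p_ideal : proportions P'_i at each level in Scenario 1 (dream scenario)
    rr      : relative risks RR_i versus the lowest exposure level
    """
    if not (len(p_real) == len(p_ideal) == len(rr)):
        raise ValueError("all three lists must have the same length")
    burden_real = sum(p * r for p, r in zip(p_real, rr))
    burden_ideal = sum(p * r for p, r in zip(p_ideal, rr))
    return (burden_real - burden_ideal) / burden_real

# Hypothetical exposure levels: never smokers, ex-smokers, current smokers
p_real = [0.5, 0.3, 0.2]
rr = [1.0, 1.5, 3.0]

# Dream scenario 1: the whole population at the lowest exposure level
print(who_paf(p_real, [1.0, 0.0, 0.0], rr))

# Dream scenario 2: the current smokers become ex-smokers
print(who_paf(p_real, [0.5, 0.5, 0.0], rr))
```

The second scenario gives a smaller PAF than the first, because ex-smokers in this made-up example still carry a relative risk above 1.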



I hope this helps.

Best wishes

Roger


On 16/07/2013 23:20, Angelo Belardi wrote:


Roger, thanks a lot for the detailed answers and all the effort.

After a discussion with my colleagues, I have a few follow-up
questions on the subject:

A:  In your last reply you spoke about Cox regression. Would these
statements also apply to hazard models with a
non-parametric baseline hazard function (using -cloglog-)?

B: We work with person-period formatted datasets we got from
reorganising our initial data. Does that have an influence on the
results we get out of -punaf- or can the results be interpreted
similarly?

C: How would the resulting AHFs have to be interpreted? Are they
time-independent as suggested by Samuelsen and Eide (2008) in their
Equation 4? And could these be interpreted in line with the WHO
definition of PAFs, as a "proportional reduction in the hazard ratio"?


Best regards and thanks already for any further help
Angelo


References:
- Sven Ove Samuelsen and Geir Egil Eide. 2008. Attributable fractions with
survival data. Statistics in Medicine 2008; 27:1447–1467.
http://onlinelibrary.wiley.com/doi/10.1002/sim.3022/abstract
- WHO definition of population attributable fraction,
http://www.who.int/healthinfo/global_burden_disease/metrics_paf/en/index.html








2013/7/1 Roger B. Newson <[email protected]>



PS I have had a look at the Samuelsen and Eide paper, and would like to make a minor correction. The AHF of Equation 4 looks like the PAF that you would get by using -punaf- after a Cox regression, and is equal (in their notation) to

AHF = 1 - E[exp(beta'Z*)]/E[exp(beta'Z)]

where Z is the covariate vector in the real-world scenario, and Z* is the covariate vector in the fantasy-intervention scenario. If you use -punafcc- after a Cox regression, then you should instead get

PAF = 1 - E[exp(beta'Z*)/exp(beta'Z)]

which is not exactly the same thing. However, whichever formula we use, we should probably use the option -vce(unconditional)- if we use it after a Cox regression, because the covariates at the time of each death are subject to sampling error.
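The difference between the 2 formulas above (a ratio of means versus a mean of ratios) can be seen in a toy calculation. The Python sketch below uses a hypothetical 4-person dataset with one binary exposure and one binary confounder, and made-up coefficients. It replaces the expectations by simple sample means, which is an assumption of this illustration and not a description of how either package computes its estimates.

```python
import math

# Hypothetical log hazard ratios for exposure x and confounder c
beta_x = math.log(2.0)
beta_c = math.log(3.0)

# Hypothetical real-world covariates Z = (x, c), one tuple per person
real_world = [(1, 1), (1, 0), (0, 1), (0, 0)]
# Fantasy-intervention scenario Z*: exposure removed, confounder unchanged
fantasy = [(0, c) for (_, c) in real_world]

def hazard_factor(x, c):
    """exp(beta'Z) for one person."""
    return math.exp(beta_x * x + beta_c * c)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Ratio of means: 1 - E[exp(beta'Z*)] / E[exp(beta'Z)]
ahf_ratio_of_means = 1.0 - (
    mean(hazard_factor(x, c) for x, c in fantasy)
    / mean(hazard_factor(x, c) for x, c in real_world)
)

# Mean of ratios: 1 - E[exp(beta'Z*) / exp(beta'Z)]
paf_mean_of_ratios = 1.0 - mean(
    hazard_factor(xs, cs) / hazard_factor(x, c)
    for (xs, cs), (x, c) in zip(fantasy, real_world)
)

print(ahf_ratio_of_means, paf_mean_of_ratios)
```

In this toy example the ratio-of-means version is about one third and the mean-of-ratios version is about one quarter, so the 2 definitions can give visibly different answers from the same coefficients.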


Best wishes

Roger


On 01/07/2013 13:09, Roger B. Newson wrote:



Thanks to Carlo for this reference. Yes, the attributable hazard
fraction (AHF) in Equation (4) of Samuelsen and Eide (2008) is the same
as the population attributable fraction (PAF) produced by -punafcc-
after using -stcox-. The confidence interval formulas are a little
different. Samuelsen and Eide use the percentile bootstrap, whereas the
online help for -punafcc- recommends that the user use Shah variances by
specifying the option -vce(unconditional)-. You could presumably write a
program to use the percentile bootstrap with -punafcc-, though.

Best wishes

Roger

References

Sven Ove Samuelsen and Geir Egil Eide. 2008. Attributable fractions with
survival data. Statistics in Medicine 2008; 27:1447–1467.


On 01/07/2013 12:21, Carlo Lazzaro wrote:



I suppose that Angelo refers to the following reference (access to the
full text conditional on subscription to Stat Med):

Samuelsen SO, Eide GE. Attributable fractions with survival data.
Stat Med. 2008 Apr 30;27(9):1447-67.

Kind regards,
Carlo
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Roger B. Newson
Sent: Monday, 1 July 2013 12:57
To: [email protected]
Subject: Re: st: Population attributable fractions (PAFs) in
discrete-time survival analysis. -punaf-

Yes, you can use -punaf- after a generalized linear model (GLM) with a complementary log-log link and a binomial error function. Or after any other GLM that gives positive-valued conditional expectations (which includes proportions and also Gamma and inverse-Gaussian means).

For proportional-hazard models (and also for case-control data), there is a package -punafcc-, which you can also download from SSC, and which estimates population attributable hazard fractions (after proportional-hazard regressions), or population attributable fractions (after logit regressions on case-control data).

Angelo has not given the Samuelsen & Eide (2008) reference on PAHFs in full. However, I would guess that the PAHFs of that reference would be either the same as, or similar to, those produced by -punafcc-. I would very much like to know the full reference, so I can read it and find out more.

I hope this helps.

Best wishes

Roger


On 01/07/2013 00:13, Angelo Belardi wrote:



Dear All,

I am working on discrete-time proportional hazard models with a
non-parametric baseline hazard function, using -cloglog- in
person-period formatted datasets.

I would like to additionally calculate population attributable
fractions (PAFs) in these models.
However, I have never worked with PAFs in survival analyses before and
therefore don't know which functions to use and how to correctly
interpret the results.

Previously, I calculated PAFs in Stata with the -punaf- package from
Roger Newson, e.g. for logistic regressions.

Can I use -punaf- here as well, just after calculating the estimates
with -cloglog-?

Or is there another function/package for this situation?

Or would it be better to calculate population attributable hazard
fractions (PAHFs) as described in Samuelsen & Eide (2008)?


Thanks for any help or advice on the subject.

Regards,
Angelo


Ref:
S. O. Samuelsen, G. E. Eide, Statist. Med. 27, 1447 (2008).
http://onlinelibrary.wiley.com/doi/10.1002/sim.3022/abstract



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


