Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: -cloglog- memory & -stcurve- median : was -svy stocx- attained age
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
st: -cloglog- memory & -stcurve- median : was -svy stocx- attained age
Date
Tue, 6 Aug 2013 17:25:42 -0400
Pradip:
Statalist is not a place for private conversations. Nobody will be able to
follow this thread unless you are clear about what you are doing.
In my response to your private email, I asked you to send your original
private post to Statalist. Please now describe the study, data, and
study questions so that others might help you.
For specific issues, show code and results as the Statalist FAQ requests,
detail the models you tried and the relevant output, and give essential
information, such as the N, number of deaths, descriptive stats on your
primary exposure variable, and a 2 x 2 table cross-tabulating death and
exposure.
To bring others up to date, the d* and dur* variables Pradip refers to
are period variables for -cloglog-, as detailed in Lesson 6 of
https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata
The "spell" dataset is the result of an -expand- operation and contains
one observation for each period in which a person is at risk.
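To make the person-period layout concrete, here is a minimal sketch in Python (not Stata) of what the -expand- step produces. It assumes each person is summarized by a hypothetical record of (person id, number of quarters at risk, died flag). Each output row is one person-period, and the event indicator is 1 only in the last period of a person who died. The function names and the toy data are illustrative only; in Stata the same layout comes from -expand- plus a period counter.

```python
def expand_to_periods(person_id, n_periods, died):
    """Return one (person_id, period, event) row per period at risk.

    The event indicator is 1 only in the final period of a person who died;
    survivors (censored cases) contribute 0 in every period.
    """
    if n_periods < 1:
        raise ValueError("a person must be at risk for at least one period")
    rows = []
    for period in range(1, n_periods + 1):
        event = 1 if died and period == n_periods else 0
        rows.append((person_id, period, event))
    return rows


def period_dummies(period, max_period):
    """Hypothetical d1..dK indicators: dk = 1 when period == k, else 0."""
    return [1 if period == k else 0 for k in range(1, max_period + 1)]


# Illustrative data: person 1 dies in quarter 3, person 2 is censored after 2.
people = [(1, 3, True), (2, 2, False)]
spells = [row for p in people for row in expand_to_periods(*p)]
for row in spells:
    print(row)
```

Each person contributes as many rows as periods at risk, which is why the spell file grows so large.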
About the memory problem, I have only generic advice:
• Upgrade your OS to one that can address more memory
• Add physical memory
As we know more details, perhaps other ideas may suggest themselves.
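A back-of-the-envelope calculation shows why the spell file strains a 725m limit. The sketch below, in Python, multiplies Pradip's figures (5,485,946 observations, 70 variables) by the width of Stata's byte (1 byte), float (4 bytes), and double (8 bytes) storage types. The actual storage type of each variable is not known from the thread, so these are assumptions. The result is only a lower bound: it ignores any per-observation overhead and all working memory that -cloglog- itself needs. If the variables were stored as floats, the data alone would need roughly twice 725m.

```python
def dataset_bytes(n_obs, n_vars, bytes_per_value):
    """Lower bound on raw data size: observations x variables x width.

    Ignores per-observation overhead and the memory needed by the
    estimation command, so the real requirement is larger.
    """
    return n_obs * n_vars * bytes_per_value


N_OBS, N_VARS = 5_485_946, 70  # figures from Pradip's description
MIB = 1024 ** 2

for label, width in [("all byte", 1), ("all float", 4), ("all double", 8)]:
    size = dataset_bytes(N_OBS, N_VARS, width)
    print(f"{label:>10}: {size:>13,} bytes = {size / MIB:8.1f} MiB")
```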
To estimate the median, compute it "by hand" for a
single curve; then try to write code that will automate what you did. If
you can't, "by hand" may be good enough. Just be aware that, because of
the grouping, you'll need to linearly interpolate between quarter end
points or to fit flexible parametric models. I pointed the way to
creating expected values in an earlier post. For a way of adding
points to a graph, see:
http://www.stata.com/statalist/archive/2008-02/msg01145.html
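The "by hand" median with linear interpolation can be sketched as follows, in Python rather than Stata. The sketch assumes the curve saved from -stcurve- (for example _t and surv2) has been exported as two lists sorted by time, and that S(0) = 1. It walks along the curve until survival first drops to 0.5 or below, then interpolates linearly between that point and the previous one. The function name and the example values are hypothetical, and the flexible parametric alternative mentioned above is not shown.

```python
def interpolated_median(times, surv):
    """Time at which a survival curve first drops to 0.5.

    `times` must be sorted ascending and `surv` holds S(t) at each time.
    S(0) = 1 is assumed, and the curve is treated as piecewise linear
    between the supplied points (e.g. quarter end points).
    Returns None if S(t) never reaches 0.5 during follow-up.
    """
    if len(times) != len(surv):
        raise ValueError("times and surv must have the same length")
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(times, surv):
        if s <= 0.5:
            # Here prev_s > 0.5 >= s, so the denominator is positive.
            return prev_t + (prev_s - 0.5) * (t - prev_t) / (prev_s - s)
        prev_t, prev_s = t, s
    return None


quarters = [0.25, 0.5, 0.75, 1.0]  # _t in years (illustrative)
surv2 = [0.9, 0.7, 0.55, 0.45]     # illustrative S(t) values
print(interpolated_median(quarters, surv2))
```

Running it once on surv2 and once on surv3 gives one median per covariate pattern; None signals that fewer than half died during follow-up.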
By the way: you are not correct in assuming that the date of interview can
be located only by year and quarter. The NHIS data sets contain an
"assignment week" of the quarter, and instructions are to finish the
survey by that week, or no more than a little over 14 days after it. Thus
you can easily locate the interview date to the most likely month, that of
the assignment week. I'm not sure that will do you any good unless you
can work with the restricted data that contain the date of death.
See:
http://www.amstat.org/sections/srms/proceedings/papers/1992_048.pdf
http://www.amstat.org/sections/srms/proceedings/y2001/Proceed/00040.pdf
You are welcome for the advice I've given so far, but to quote a recent post
from Nick Cox:
"Better to think that you are replying to the list. Anyone who replies
to you is not volunteering to provide dedicated support, just trying to
push a discussion forward. Nor will they necessarily write lots of code
for you...What you want requires custom programming..."
Regards,
Steve
On Aug 5, 2013, at 3:54 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
Steve, Thank you so much for your continued support, advice, and
excellent reference materials (Thanks to Professor Jenkins for making
his materials available online). My apologies for e-mail transmission
problems, the source of which is yet to be ascertained. Since my recent
e-mail also did not get posted, I am resending it, with no copying or
pasting this time.
I would like to run models with both -svy stcox- and -svy cloglog- for
comparison purposes, but I would like to report results from the -cloglog-
models.
Issues:
1) -cloglog-: I created the spell-quarter data file (compressed; 5,485,946
obs and 70 vars, including the d1-d39 dummies and the dur1-dur10
variables) with -set memory 725m-; I can't go beyond this limit. The
models failed to run because of memory problems. Any advice?
2) -svy: stcox-: I plotted the survival curves [...at1() ... at2()] and
then saved the results in a .dta file that gives me three variables
(_t, surv2, and surv3). I am looking for sample code to calculate
the following:
(a) the median survival time, that is, the time (_t) at which half of the
people have survived (based on surv2 and surv3); and
(b) the arithmetic mean of the survival time (_t) or life
expectancy.
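For question (b), note that the mean survival time equals the area under the survival curve. With at most 10 years of follow-up, however, the curve will typically not reach zero, so only a restricted mean up to a horizon tau can be estimated from these data. This is a substitute for, not the same thing as, the unrestricted life expectancy asked about. The Python sketch below computes that area for a right-continuous step curve such as the one saved from -stcurve-. It assumes sorted times and S(0) = 1, and the function name, tau, and data are illustrative.

```python
def restricted_mean(times, surv, tau):
    """Area under a step survival curve from 0 to tau (restricted mean).

    S(t) is taken to be 1 before the first time point and to stay at each
    listed value until the next time point (right-continuous steps).
    """
    if len(times) != len(surv):
        raise ValueError("times and surv must have the same length")
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle under the current step
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)    # last step, up to the horizon tau
    return area


# Illustrative curve: S drops to 0.8 at t = 1 and to 0.5 at t = 2.
print(restricted_mean([1.0, 2.0], [0.8, 0.5], tau=3.0))
```

Choosing tau equal to the maximum follow-up (here, 10 years) keeps the estimate within the range the data actually cover.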
Your advice toward resolving the issues will be highly appreciated.
Regards,
Pradip
Pradip K. Muhuri, PhD SAMHSA/CBHSQ 1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Steve Samuels
Sent: Wednesday, July 31, 2013 6:30 PM
To: [email protected]
Subject: Re: st: -svy stocx- attained age
Pradip:
You are encountering a version of a problem that I diagnosed earlier this
month at http://www.stata.com/statalist/archive/2013-07/msg00644.html,
where I referred to
http://www.stata.com/support/faqs/statistics/stcox-producing-missing-standard-errors/
"4) Covariate does not vary within death event risk sets."
In your case, people in the same age group at baseline will have the
same attained age at all points of followup. You can use either age at
baseline or time as attained age, but not both.
I'd recommend age at baseline, as you know this pretty precisely, a gain
that doesn't transfer to attained age. But the use of only five age
groups loses a lot of information; try fractional polynomial regression
(-fp-) on continuous age.
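The fractional-polynomial idea behind -fp- can be illustrated outside Stata. Each candidate term is x raised to a power from a small conventional set, with power 0 meaning ln(x). The Python sketch below only generates the first-degree (FP1) candidate terms for a strictly positive covariate such as age. It does not fit models or choose among powers, which is what -fp- automates, and it ignores any scaling or centring that -fp- may apply. The names and the example age are illustrative.

```python
import math

# Conventional set of fractional-polynomial powers; 0 denotes ln(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, power):
    """One FP1 transformation of a strictly positive covariate."""
    if x <= 0:
        raise ValueError("fractional polynomial terms require x > 0")
    return math.log(x) if power == 0 else x ** power


def fp1_candidates(x):
    """All eight FP1 candidate values of x, keyed by power."""
    return {p: fp_term(x, p) for p in FP_POWERS}


age = 50.0  # illustrative baseline age in years
for power, value in fp1_candidates(age).items():
    print(f"power {power:>4}: {value:.6g}")
```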
In a post to me, you stated that you have a maximum of 10 years of
follow-up and that, for dates of death and interview, you have only year
and quarter. Your solution is to assign the midpoint date of each
quarter. This could work, but it violates the assumption of -stcox- that
times are essentially continuous. The measurement error in follow-up time
could be as much as ±3 months and will probably bias the estimated
coefficients. Moreover, if someone died < 3 months after baseline,
you would be assigning start and death to the same date, and -stset-
will drop the observation.
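The midpoint convention and its pitfall can be sketched in Python with the standard datetime module. A hypothetical helper maps (year, quarter) to the middle day of that quarter; rounding the midpoint down to a whole day is an assumption of this sketch. A death recorded in the interview quarter then yields zero days of follow-up, which is the kind of observation -stset- drops.

```python
from datetime import date


def quarter_midpoint(year, quarter):
    """Middle day of a calendar quarter (rounded down to a whole day)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    start = date(year, 3 * quarter - 2, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
    return start + (end - start) // 2


def follow_up_days(int_year, int_quarter, end_year, end_quarter):
    """Days between the quarter midpoints of interview and death/censoring."""
    return (quarter_midpoint(end_year, end_quarter)
            - quarter_midpoint(int_year, int_quarter)).days


# Interview and death in the same quarter: zero follow-up time.
print(follow_up_days(2001, 3, 2001, 3))
```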
Therefore I suggest that you also do a grouped hazard analysis with
-cloglog-, which accepts a -svy- prefix (-stpm2-, as you reminded me,
does not). With -cloglog-, assign the person who died < 3 months after
baseline to the first period. For more details, see the Lesson 6
link on discrete-time analysis on Stephen Jenkins's fine web page
"Survival analysis with Stata"
(http://www.iser.essex.ac.uk/survival-analysis)
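For the -cloglog- approach, the follow-up period can be computed from year and quarter alone. The Python sketch below assumes quarterly periods counted from the interview quarter, so that a death k quarters later (k >= 1) falls in period k. As suggested above, a death in the interview quarter itself (k = 0, under 3 months) goes to period 1. The function name and this exact convention are assumptions for illustration; other period definitions are possible.

```python
def follow_up_period(int_year, int_quarter, end_year, end_quarter):
    """Quarterly period (1, 2, ...) in which death or censoring falls."""
    for q in (int_quarter, end_quarter):
        if q not in (1, 2, 3, 4):
            raise ValueError("quarters must be 1-4")
    elapsed = (end_year * 4 + end_quarter) - (int_year * 4 + int_quarter)
    if elapsed < 0:
        raise ValueError("end date precedes interview")
    # Deaths in the interview quarter (< 3 months) go to the first period.
    return max(1, elapsed)


print(follow_up_period(2001, 4, 2002, 3))
```

With at most 10 years of follow-up, this convention yields periods 1 through 40.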
Steve
On Jul 31, 2013, at 12:53 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
Hello,
I am new to Statalist. In response to my first e-mail posted to the
List, I got a reply from Steve, with insightful comments and advice.
At this point, I need your help with two issues.
1) All my subsequent e-mails (content copied from the Stata log file,
in plain text, not HTML) have bounced back to me. I don't understand
what I am doing wrong.
2) I am using -svy:stcox- models, with attained age as the time scale. I
have successfully run several models. However, adding a 5-category
factor (attained age) to the model gives me the following error
messages: "flat region resulting in a missing likelihood", "error
occurred when svy executed stcox", and "last estimates not found". Sorry
for not pasting the content from the log file; I fear that this e-mail
would also bounce back.
Thanks,
Pradip
This is not a private conversation between us. Nobody will be able to
follow this thread unless you are clear about what you are doing. I
asked you to send your original private post to Statalist. Please do so
now. Include the study description and goals.
For specific issues, please show code and results as the Statalist FAQ
requests, detail the models you tried, and give essential information,
such as the N, number of deaths, descriptive stats on your primary
exposure variable, and a 2 x 2 table cross-tabulating death and
exposure.
I already suggested that attained-age was
To bring others up to date, the d* and dur* variables Pradip refers to
are period variables for -cloglog-, as detailed in Lesson 6 of
https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata
The "spell" dataset is the result of an -expand- operation and contains
one observation for each time period in which a person is at risk.
As to your memory problem, I have only generic advice:
• Upgrade your OS to one that can address more memory
• Add physical memory
As we know more details, perhaps other ideas may suggest themselves.
Your other questions: To estimate the median, compute it "by hand" for a
single curve; then try to write code that will automate what you did. If
you can't, "by hand" may be good enough. Just be aware that, because of
the grouping, you'll need to linearly interpolate to locate the median to
the nearest month. I pointed the way to creating expected values in an
earlier post.
You are not correct in assuming that the date of interview can be located
only by year and quarter. The NHIS data sets contain an "assignment
week" of the quarter, and instructions are to finish the survey by that
week, or no more than a little over 14 days after it. Thus you can easily
locate the interview date to the most likely month, that of the assignment
week. I'm not sure that will do you any good.
See:
http://www.amstat.org/sections/srms/proceedings/papers/1992_048.pdf
http://www.amstat.org/sections/srms/proceedings/y2001/Proceed/00040.pdf
You are welcome for the advice I've given, but to quote a recent post
from Nick Cox:
"Better to think that you are replying to the list. Anyone who replies
to you is not volunteering to provide dedicated support, just trying to
push a discussion forward. Nor will they necessarily write lots of code
for you. What you want requires custom programming..."
Regards,
Steve