
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Artificial censoring in survival analysis

From	[email protected]
To	"[email protected]" <[email protected]>
Subject	Re: st: Artificial censoring in survival analysis
Date	Sat, 6 Aug 2011 08:03:43 +0300

Steve,

Sorry for CCing your email address again. I think I could not reply to Statalist because my iPhone has a problem with plain text.

Thanks. Here is an example of my data structure. I would be happy to hear your thoughts on it.

I have employment data for all individuals between 1995 and 2006. Let me give an example that shows my problem. Take two individuals. Individual 1 went to college in 1995 and completed it in 1998; he also worked sporadically between 1995 and 1998. I make January 1998 his date of entry into my analysis, on condition that he was unemployed in January 1998. I did not make January 1995 his entry date because I assume that he could not have invested much time in looking for a job while studying.
The second individual did not go to college. For him, I make January 1995 the date of entry into my analysis, on condition that he was unemployed at that time.
My data set includes these two types of individuals. The first individual has 9 years of observation while the second has 12 years. Let's say that both were still unemployed in 2006, the last year of observation. It could be that the first individual did not exit to employment because he has fewer years of observation (9 years) than the second individual, who has 12.
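
To make the two observation windows concrete, here is a minimal sketch in Python (used only for illustration, not Stata). It assumes that the last observed month is December 2006 and that months are counted inclusively; the helper name `months_at_risk` is hypothetical.

```python
from datetime import date

# Assumption: the last observed month is December 2006.
STUDY_END = date(2006, 12, 1)

def months_at_risk(entry: date, end: date = STUDY_END) -> int:
    """Calendar months from the entry month through the end month, inclusive."""
    return (end.year - entry.year) * 12 + (end.month - entry.month) + 1

# Individual 1 enters in January 1998 (after college),
# individual 2 enters in January 1995 (no college).
college = months_at_risk(date(1998, 1, 1))
no_college = months_at_risk(date(1995, 1, 1))
print(college, no_college)
```

Under these assumptions the first individual can contribute at most 108 months (9 years) and the second at most 144 months (12 years). If both are still unemployed at the end of 2006, both spells are right-censored, each at its own length.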

The disadvantage of my decision to censor everyone after, say, 24 or 48 months is, as you said, the loss of information on those individuals who have a longer period of risk exposure for exit to employment.
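
The loss of information can be sketched with a few made-up spells. The following Python snippet is only an illustration: the spell values are hypothetical, and `censor_at` is a helper name I introduce here, not part of any package.

```python
def censor_at(spells, cutoff):
    """Artificially censor (duration, event) spells at `cutoff` months."""
    out = []
    for duration, event in spells:
        if duration > cutoff:
            # Any exit observed after the cutoff is discarded.
            out.append((cutoff, 0))
        else:
            out.append((duration, event))
    return out

# Hypothetical spells: (months unemployed, 1 = exit to employment, 0 = censored)
spells = [(6, 1), (20, 1), (30, 1), (60, 1), (108, 0), (144, 0)]
truncated = censor_at(spells, 24)

events_before = sum(event for _, event in spells)
events_after = sum(event for _, event in truncated)
print(events_before, events_after)
```

With these hypothetical values, four observed exits become two once everyone is cut off at 24 months: the exits at months 30 and 60 are treated as censored, even though they were actually observed.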

Thanks 




On 6 Aug 2011, at 00:46, Steven Samuels <[email protected]> wrote:

> 
> --
> 
> Thanks for your detailed answer.  This looks like an excellent data set.  I suggest that you use all the observations that you have: end follow-up for each person at the point they first leave unemployment, and censor in 2006 those who never became employed.  There is no advantage that I can see in ending observation after 24 months for everybody. On the contrary, there is a loss of information.
> 
> 
> Steve
> 
> 
> On Aug 5, 2011, at 3:29 PM, Melaku Fekadu wrote:
> 
> Steven,
> 
> 
> Thanks for taking the time to answer me. Here are some details
> about my data.
> 
> 
> • What kind of study generated the data? A prospective cohort? A
> cross-section with retrospective recall?
> 
> 
> It is administrative panel data.
> 
> 
> • Was the study a complex sample, so that there are weights and
> clusters (PSUs)?
> 
> 
> It has no weights.
> 
> 
> • What is the purpose of YOUR analysis?
> 
> 
> I am analyzing determinants of reemployment of unemployed individuals.
> 
> 
> • What was the larger data set, if any, from which you took your
> specific data?  What criteria did you use for inclusion?
> 
> 
> I have data on one age cohort: those who were born in 1977 and were
> unemployed in 1995 (at age 18). I have their monthly employment data
> from 1995 to 2006, that is, from age 18 to 29. Some were studying in
> college during these years and some were not. For those who did not go
> to college, I use employment data from January 1995 onward for the
> analysis. For those who went to college, I use their employment
> data from the month of their college completion. The date of college
> completion can differ across individuals, so the date of entry into
> the analysis differs across individuals, and so does the length of
> the observation period: some have a longer period of observation and
> some a shorter one. Remember that my data are restricted to 1995 to
> 2006. To overcome this problem I decided to censor everyone at a given
> length of observation, say 24 months, because those who went to
> college are "too young" to experience entry to employment compared to
> those who went to look for work directly.
> 
> 
> • What is month "1"?   a calendar month, a month of an interview?  The
> first month of unemployment?
> 
> 
> Month "1" is the first month of unemployment. Month "1" could be
> different for each observation.
> 
> 
> • Did unemployment start before month "1" for everybody or some
> people?  After month 1?
> 
> 
> Month "1" is the start of unemployment for all. Some have just
> finished school; some have just left an earlier job and start
> looking for work in month "1".
> 
> 
> • For those who started before month "1", do you know how long they
> had been unemployed?
> 
> 
> No unemployment before month "1".
> 
> On Fri, Aug 5, 2011 at 12:22 AM, Steven Samuels <[email protected]> wrote:
>> 
>> -
>> I am answering your second question about -hshaz-.  There are examples of two and three mass points at the end of the -help-.  The mixture model for heterogeneity means that the unobserved log hazard is at one of those points, with locations and probabilities to be estimated.
>> 
>> 
>> 
>> For your earlier question.
>> 
>> I don't see a good reason for censoring individuals at 12 months because of problems in observing other individuals.  However, until you describe your data more fully, I really don't know.
>> 
>> 
>> • What kind of study generated the data? A prospective cohort? A cross-section with retrospective recall?
>> 
>> • Was the study a complex sample, so that there are weights and clusters (PSUs)?
>> 
>> • What is the purpose of YOUR analysis?
>> 
>> • What was the larger data set, if any, from which you took your specific data?  What criteria did you use for inclusion?
>> 
>> • What is month "1"?   a calendar month, a month of an interview?  The first month of unemployment?
>> 
>> • Did unemployment start before month "1" for everybody or some people?  After month 1?
>> 
>> • For those who started before month "1", do you know how long they had been unemployed?
>> 
>> • What do you mean people were "younger" to experience the event?  Did you mean "too young" to qualify as unemployed at the start?
>> 
>> • Why do you have information on some people for more than 12 months but not for others?  How did observation end?
>> 
>> • Have you information on people who were employed but became unemployed during the study period (perhaps not in the data set you describe below)?
>> 
>> 
>> In short, we need a complete description of the study design and of the beginning and ending of observation.
>> 
>> 
>> 
>> Dear statalisters,
>> 
>> I am doing a project on the duration of unemployment. I want to compare models with and without unobserved heterogeneity. I want to use the -hshaz- module to estimate a mixture model, but I couldn't find an example of how to do that. I would appreciate any pointers to examples.
>> 
>> Thanks,
>> Melaku
>> 
>> 
>> On Aug 2, 2011, at 3:25 AM, [email protected] wrote:
>> 
>> Hello statalisters,
>> 
>> I analyze employment data using survival methods over a period of 12 months. I decided to do so because some of my observations are younger to experience the event (in this case, exiting unemployment) for more than 12 months; that is, I observe them for only 12 months. To overcome this problem I imposed a 12-month period of analysis on all of my observations, so that all observations have an equal length of 12 months in which to experience the event. I did so by artificially censoring those observations for which I have more than 12 months of data and which did not experience the event within 12 months. These are older individuals. I censored them even though I see that some of them experience the event later, after the 12-month period.
>> 
>> My questions:
>> 1. Should I include in the analysis those observations that I censored?
>> 2. Is the sample data presented below appropriate for survival analysis? Note that all observations experience the event except those I censored at month 12.
>> 
>> Below is a small representation of my data. The failure variable 'Failure' is cross-tabulated with the variable 'studytime', which is the number of months until the event is experienced.
>> 
>> studytime | Failure=0 | Failure=1
>> ----------+-----------+----------
>>         1 |         0 |       200
>>         2 |         0 |        89
>>         3 |         0 |        70
>>         5 |         0 |        68
>>         6 |         0 |        58
>>         7 |         0 |        50
>>         8 |         0 |        51
>>        10 |         0 |        45
>>        11 |         0 |        30
>>        12 |       150 |         0
>> 
>> Thanks,
>> Melaku
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> 
