Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set
From: Kathleen Bui <[email protected]>
To: [email protected]
Subject: Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set
Date: Tue, 22 Mar 2011 20:01:48 -0700 (PDT)
Thank you Steve and Nick,
Yes, I will mention the large bias and measurement error present, and I have
stset the data so that the first year an individual states he was in
self-employment is recorded as starting at "analysis time" 0.
But I am running into a problem with svy-setting my data.
I was going to proceed with the method in 3.2.4 of
http://www.stata.com/support/faqs/stat/stmfail.html
For my survey, I have strata, clusters, and weights, so I svyset my data
accordingly: svyset PSU [pw=Weight], stra(strata)
However, as seen in 3.2.4, I also need to cluster on my Person ID variable
since, with the multiple failures and resetting my time to zero, I have made it
seem as though each spell of self-employment came from a different
individual (when in reality it did not).
However, I am unable to use the cluster option with the svy prefix.
I am not sure how to solve this issue. Any suggestions?
Thank you for all the help!
Kathleen--
----- Original Message ----
From: Steven Samuels <[email protected]>
To: [email protected]
Sent: Sat, March 19, 2011 5:02:47 PM
Subject: Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in
and out of risk set
Kathleen--
With your data, you are obligated to report that measurement error of *at least*
±1 years is possible in recorded "times" of employment because dates that
self-employment started or stopped in a year are unknown. Also, report that
there is a positive bias in estimates of probabilities that a person stayed
self-employed for at least k years. The bias arises because the data don't
record instances where people left and returned to self-employment between
interviews. So, for example, four consecutive "years" (i.e. interviews) of
reported self-employment could be made up of a number of shorter spells.
Status at interview apparently was the only observation actually made, so I
suggest that you model that status directly instead of a questionable time
variable. Such an analysis would be based on the same data as you'd feed into
-stset-. Model the probability that if a person was self-employed at the year K
interview, they were also self-employed at the year K+1 interview. In this
analysis the zero is the first interview in a spell of self-employment, and
you index all the subsequent interviews as Nick suggested.
If your data are based on a complex survey sample, -svyset- your data and use
-svy: logistic-. Failure to do so would invalidate your standard errors and
hypothesis tests.
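A minimal sketch of the status-based approach described above, using toy data patterned on Kathleen's listing. The names StayNext, NewSpell, SpellNo, and K are illustrative assumptions, not variables from her data, and the commented-out survey lines assume the design variables PSU, Weight, and strata that she mentions elsewhere in this thread.

```stata
clear
input PersonID Year SelfEmploy
1 1990 0
1 1991 1
1 1992 1
1 1993 1
1 1994 1
1 1995 0
1 1996 0
1 1997 1
1 1998 1
1 1999 1
1 2000 0
2 1993 0
2 1994 1
2 1995 1
2 1996 1
2 1997 1
2 1998 0
end

* Outcome: still self-employed at the next annual interview (K+1),
* defined only for interviews at which the person is self-employed
bysort PersonID (Year): gen byte StayNext = SelfEmploy[_n+1] ///
    if SelfEmploy == 1 & Year[_n+1] == Year + 1

* Flag the first interview of each spell, then number spells within person
by PersonID: gen byte NewSpell = SelfEmploy == 1 & SelfEmploy[_n-1] != 1
by PersonID: gen SpellNo = sum(NewSpell) if SelfEmploy == 1

* K = interview index within the spell, starting at 0
bysort PersonID SpellNo (Year): gen K = _n - 1 if SelfEmploy == 1

list PersonID Year SelfEmploy SpellNo K StayNext, sepby(PersonID)

* With the real survey data (design variables assumed, not in the toy data):
* svyset PSU [pw=Weight], strata(strata)
* svy: logistic StayNext K
```

The last interview of each person has no following interview, so StayNext is missing there and that record drops out of the model.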
Steve
Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783
[email protected]
On Mar 19, 2011, at 5:46 AM, Nick Cox wrote:
I don't understand what you are trying to do, but given a
classification of spells by a variable -_spell- then time in each
spell has a minimum
egen Start = min(Year) if _spell, by(PersonID _spell)
so that you just need to subtract that from Year to get a time
variable that starts at 0 in each spell.
Another way to do it is
bysort PersonID _spell (Year) : gen Time = Year - Year[1] if _spell
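As a rough sketch of how the reset could feed into -stset-, the code below enters the example listing from the original question, measures each record's start and end from the start of its own spell, and treats each person-spell as a separate -stset- subject. The names t0, t, and SpellID are hypothetical, and the final -stcox- line (with a placeholder covariate x) is only one possible gap-time specification, left commented out.

```stata
clear
input PersonID Year0 Year Failed SelfEmploy _spell
1    . 1990 0 0 0
1 1990 1991 0 1 1
1 1991 1992 0 1 1
1 1992 1993 0 1 1
1 1993 1994 1 1 1
1 1994 1995 0 0 0
1 1995 1996 0 0 0
1 1996 1997 0 1 2
1 1997 1998 0 1 2
1 1998 1999 1 1 2
1 1999 2000 0 0 0
2    . 1993 0 0 0
2 1993 1994 0 1 1
2 1994 1995 0 1 1
2 1995 1996 0 1 1
2 1996 1997 1 1 1
2 1997 1998 0 0 0
end

* Interval start and end, measured from the first Year0 of the spell
bysort PersonID _spell (Year): gen t0 = Year0 - Year0[1] if _spell > 0
bysort PersonID _spell (Year): gen t  = Year  - Year0[1] if _spell > 0

* One identifier per person-spell, so analysis time restarts in every spell
egen SpellID = group(PersonID _spell) if _spell > 0

* Records outside a spell are excluded, so they never count as time at risk
stset t if _spell > 0, id(SpellID) failure(Failed == 1) time0(t0)

list PersonID Year0 Year _spell _t0 _t _d if _st, sepby(PersonID _spell)

* Possible gap-time model (x is a placeholder covariate, not in the toy data):
* stcox x, strata(_spell) vce(cluster PersonID)
```

Because the spell rather than the person is the -stset- id here, any adjustment for repeated spells by the same person has to be made at the estimation stage.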
Nick
On Sat, Mar 19, 2011 at 12:13 AM, Kathleen Bui <[email protected]> wrote:
> Thanks for all the help!
>
> I do understand that smaller time intervals would be much better, but I
> don't have access to any time frame smaller than a year.
>
> On another note, I was wondering how to go about "resetting" the time to zero
> for each spell of self-employment, since I have multiple observations for each
> spell of self-employment (if I wanted to employ the PWP gap-time model
> approach).
>
>
>
> For example, following my earlier example, suppose I had data like the listing
> below, where _spell indicates the spell of self-employment (first, second,
> etc.).
>
> How can I stset the data so the time is "reset" to zero for each new spell?
>
>
>      +--------------------------------------------------------+
>      | PersonID   Year0   Year   Failed   SelfEmploy   _spell |
>      |--------------------------------------------------------|
>   1. |        1       .   1990        0            0        0 |
>   2. |        1    1990   1991        0            1        1 |
>   3. |        1    1991   1992        0            1        1 |
>   4. |        1    1992   1993        0            1        1 |
>   5. |        1    1993   1994        1            1        1 |
>      |--------------------------------------------------------|
>   6. |        1    1994   1995        0            0        0 |
>   7. |        1    1995   1996        0            0        0 |
>   8. |        1    1996   1997        0            1        2 |
>   9. |        1    1997   1998        0            1        2 |
>  10. |        1    1998   1999        1            1        2 |
>      |--------------------------------------------------------|
>  11. |        1    1999   2000        0            0        0 |
>  12. |        2       .   1993        0            0        0 |
>  13. |        2    1993   1994        0            1        1 |
>  14. |        2    1994   1995        0            1        1 |
>  15. |        2    1995   1996        0            1        1 |
>      |--------------------------------------------------------|
>  16. |        2    1996   1997        1            1        1 |
>  17. |        2    1997   1998        0            0        0 |
>      +--------------------------------------------------------+
>
> If I do:
>
> stset Year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID) ///
>     exit(time .) if(_spell!=0)
>
> this doesn't reset the time at the beginning of each spell; rather, it
> continues (with time gaps) from the time of the first spell.
>
> Thanks again! Appreciate the help!
> -Kathleen
>
>
> The following example (performed in Stata 9.2/SE) considers this issue:
> --------------- example begins ------------------------------------
> set obs 6
> g id = 1 in 1/2
> replace id=2 in 3/4
> replace id=3 in 5/6
> g In=0
> replace In=6 in 2
> replace In=3 in 4
> replace In=4 in 6
> g Out=1
> replace Out=7 in 2
> replace Out=8 in 4
> replace Out=5 in 6
> g No_Self_Employed=1
> replace No_Self_Employed=0 in 4
> stset Out, id(id) failure(No_Self_Employed==1) time0(In) ///
>     exit(No_Self_Employed==2) origin(time In)
> stdes
> --------------- example ends ------------------------------------
>
> In the previous code, subjects do not leave the survival analysis at the first
> failure (i.e., No_Self_Employed==1), since that would conflict with the
> assumption of multiple failures, but only when the event No_Self_Employed==2
> occurs (and this event will never occur).
>
> As I can see from your thread and previous replies, your subjects do show
> gaps. You can check whether gaps are consistent with your methodological
> expectations using - stdes -.
>
> For more on this topic, I would refer you to:
> MA Cleves, WW Gould, RG Gutierrez. An Introduction to Survival Analysis Using
> Stata. Revised edition. College Station: Stata Press, 2004: 59-62. The same
> textbook (147-156) also offers interesting insights on the Cox model with
> shared frailty, which may fit your data;
> the already referenced http://www.stata.com/support/faqs/stat/stmfail.html.
>
> HTH and Kind Regards,
> Carlo
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Kathleen Bui
> Sent: Sunday, 13 March 2011 16:31
> To: [email protected]
> Subject: st: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and
> out of risk set
>
> My question is how to stset a multiple-failure data set when an individual
> can move in and out of the risk set.
>
> I have read Cleves’s An Introduction to Survival Analysis Using Stata,
> Cleves’s STB-49, and all previous posts concerning st-setting multiple
> failures. Others have asked questions similar to mine, but I have yet to find
> a solution that works.
>
> I am analyzing the duration of an individual’s stay in self-employment.
> Failure will be exit from self-employment. My question is how I can stset the
> data so that Stata recognizes that an individual can move into and out of the
> risk set (which is being self-employed).
>
> To be more explicit, for each individual in my data set, I have information
> as to whether or not they are self-employed. The issue arises when an
> individual has a self-employment history as follows:
>
> The individual is self-employed and therefore at risk of failure. Then they
> fail (leave self-employment) and enter waged employment. By entering waged
> employment, they are no longer at risk of failing, since they are no longer
> self-employed. However, after a period of time, they once again become
> self-employed (thus re-enter the risk set) and fail once again (their second
> failure).
>
> As a result, multiple failures are possible as individuals move in and out of
> different employment states. However, although I understand that Stata can
> recognize multiple failures, I am unsure how stset can be used to recognize
> the multiple spells of self-employment, particularly the period of time
> between spells when the individual is no longer at risk.
>
> Specifically, I am unable to set the analysis time back to 0 for when the
> individual begins a second period at risk after being not at risk.
>
> For example, one individual in my data set of multiple individuals can look
> like:
>
>      +------------------------------------------+
>      | ID   Year0   Year   SelfEmploy   Failure |
>      |------------------------------------------|
>   1. |  1    1989   1990            0         0 |
>   2. |  1    1990   1991            1         0 |
>   3. |  1    1991   1992            1         0 |
>   4. |  1    1992   1993            1         0 |
>   5. |  1    1993   1994            1         0 |
>   6. |  1    1994   1995            0         1 |
>   7. |  1    1995   1996            0         0 |
>   8. |  1    1996   1997            1         0 |
>   9. |  1    1997   1998            1         0 |
>  10. |  1    1998   1999            1         0 |
>  11. |  1    1999   2000            0         1 |
>      +------------------------------------------+
>
> where “SelfEmploy” is the indicator variable denoting whether or not the
> individual is self-employed, “Failed” is an indicator variable denoting
> whether the individual has left self-employment, and Year0 and Year are the
> corresponding beginning and end of the time period.
>
> So between 1990 and 1994, the individual is at risk of failing, and fails
> between 1994 and 1995. But between 1995 and 1996, they are no longer at risk
> of failing (say they are employed in the waged sector). But then they enter
> self-employment in 1996 and thus experience another failure between 1999 and
> 2000.
>
> Is there an option in stset that allows Stata to “ignore” the periods when
> they are no longer at risk?
>
> For example, when I stset my data as follows:
>
> stset year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID) exit(time .)
>
> the period when they are no longer at risk of failing is treated as if they
> were in self-employment, as the output I receive is:
>
>
>      +----------------------------------------------------------------+
>      | ID   Year0   Year   SelfEmploy   Failure   _st   _d   _t0   _t |
>      |----------------------------------------------------------------|
>   1. |  1    1989   1990            0         0     0    0     .    . |
>   2. |  1    1990   1991            1         0     0    0     .    . |
>   3. |  1    1991   1992            1         0     1    0     0    1 |
>   4. |  1    1992   1993            1         0     1    0     1    2 |
>   5. |  1    1993   1994            1         0     1    0     2    3 |
>   6. |  1    1994   1995            0         1     1    1     3    4 |
>   7. |  1    1995   1996            0         0     1    0     4    5 |
>   8. |  1    1996   1997            1         0     1    0     5    6 |
>   9. |  1    1997   1998            1         0     1    0     6    7 |
>  10. |  1    1998   1999            1         0     1    0     7    8 |
>  11. |  1    1999   2000            0         1     1    1     8    9 |
>      +----------------------------------------------------------------+
>
>
> Stata seems to count the period from 1995 to 1996 as a time when the
> individual is at risk of failing, when he is not.
>
> Therefore, I am unsure how to stset the data so that, from 1995 to 1996,
> Stata recognizes that the individual is no longer at risk of failing, and so
> that my analysis time can be “reset” to 0 when the individual begins a second
> period at risk after being not at risk.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/