Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set

From	Kathleen Bui <[email protected]>
To	[email protected]
Subject	Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and out of risk set
Date	Tue, 22 Mar 2011 20:01:48 -0700 (PDT)
Thank you Steve and Nick,
 
 
Yes, I will mention the large bias and measurement error present, and I have 
stset the data so that the first year an individual states he was in 
self-employment is recorded as starting at "analysis time" 0. 

But I am running into a problem with svy-setting my data. 

I was going to proceed with the method in 3.2.4 of 
http://www.stata.com/support/faqs/stat/stmfail.html

My survey has strata, clusters, and weights, so I svyset my data 
accordingly: svyset PSU [pw=Weight], strata(strata)

However, as seen in 3.2.4, I also need to cluster on my Person ID variable, 
since, with the multiple failures and resetting my time to zero, I have made it 
seem as though each spell of self-employment came from a different 
individual (when in reality it did not).

However, I am unable to use the cluster option with the svy option. 

I am not sure how to solve this issue. Any suggestions?
 
Thank you for all the help!
 
 
Kathleen--




----- Original Message ----
From: Steven Samuels <[email protected]>
To: [email protected]
Sent: Sat, March 19, 2011 5:02:47 PM
Subject: Re: st: R: Stset-ing Multiple Failure/Multiple Spell Data : Moving in 
and out of risk set

Kathleen--

With your data, you are obligated to report that measurement error of *at least* 
±1 year is possible in recorded "times" of employment  because dates that 
self-employment started or stopped in a year are unknown.  Also, report that 
there is a positive bias in estimates of probabilities that a person stayed 
self-employed for at least k years. The bias arises because the data don't 
record instances where people left and returned to self-employment between 
interviews. So, for example, four consecutive "years" (i.e. interviews) of 
reported self-employment could be made up of a number of shorter spells.

Status at interview apparently was the only observation actually made, so I 
suggest that you model that status directly instead of a questionable time 
variable. Such an analysis would be based on the same data as you'd feed into 
-stset-.  Model the probability that if a person was self-employed at the year K 
interview, they were also self-employed  at the year K+1 interview.  In this 
analysis  the zero is the first interview in a spell of self-employment, and 
you index all the subsequent interviews as Nick suggested.  


If your data are based on a complex survey sample, -svyset- your data and use 
-svy: logistic-.  Failure to do so would invalidate your standard errors and 
hypothesis tests.

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:  206-202-4783 
[email protected]
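
The outcome Steve describes can be sketched as follows. This is only an illustration, assuming one row per person-interview and the variable names from Kathleen's example (PersonID, Year, SelfEmploy); the name StillSE is hypothetical, and the toy rows copy person 1 from her listing. The two commented lines at the end assume the design variables PSU, strata, and Weight from her -svyset- command and a spell-time variable Time, none of which exist in the toy data.

```stata
* Toy data: one row per interview for one person (values from the thread's example)
clear
input PersonID Year SelfEmploy
1 1990 0
1 1991 1
1 1992 1
1 1993 1
1 1994 1
1 1995 0
1 1996 0
1 1997 1
1 1998 1
1 1999 1
1 2000 0
end

* Among interviews where the person is self-employed, record whether they are
* still self-employed at the next year's interview; missing otherwise
bysort PersonID (Year): gen byte StillSE = SelfEmploy[_n+1] ///
    if SelfEmploy == 1 & Year[_n+1] == Year + 1

list PersonID Year SelfEmploy StillSE, sepby(PersonID)

* With the real survey data (assumed names, left commented out here):
* svyset PSU [pw=Weight], strata(strata)
* svy: logistic StillSE Time
```

Rows where the person is not self-employed, or where no interview follows one year later, get a missing outcome and so drop out of the model.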




On Mar 19, 2011, at 5:46 AM, Nick Cox wrote:

I don't understand what you are trying to do, but given a
classification of spells by a variable -_spell- then time in each
spell has a minimum

egen Start = min(Year) if _spell, by(PersonId _spell)

so that you just need to subtract that from Year to get a time
variable that starts at 0 in each spell.

Another way to do it is

bysort PersonId _spell (Year) : gen Time = Year - Year[1] if _spell

Nick
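
As a worked illustration of the two commands above, here is a sketch on toy data that copies person 1 from Kathleen's listing. It assumes her spelling of the identifier (PersonID); TimeA and TimeB are hypothetical names used only to compare the two approaches.

```stata
* Toy data: person 1 from the thread's example, two spells of self-employment
clear
input PersonID Year0 Year Failed SelfEmploy _spell
1    . 1990 0 0 0
1 1990 1991 0 1 1
1 1991 1992 0 1 1
1 1992 1993 0 1 1
1 1993 1994 1 1 1
1 1994 1995 0 0 0
1 1995 1996 0 0 0
1 1996 1997 0 1 2
1 1997 1998 0 1 2
1 1998 1999 1 1 2
1 1999 2000 0 0 0
end

* Approach 1: earliest Year within each person-spell, then subtract it
egen Start = min(Year) if _spell, by(PersonID _spell)
gen TimeA = Year - Start

* Approach 2: sort within person-spell and subtract the first Year
bysort PersonID _spell (Year): gen TimeB = Year - Year[1] if _spell

* Both clocks should agree and restart at 0 in each spell
assert TimeA == TimeB
list PersonID Year _spell TimeA TimeB, sepby(_spell)
```

Both variables are missing outside spells (_spell == 0), and time 0 falls on the first interview at which self-employment is reported in each spell.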

On Sat, Mar 19, 2011 at 12:13 AM, Kathleen Bui <[email protected]> wrote:

> Thanks for all the help!
> 
> I do understand that smaller time intervals would be much better, but I
> don't have access to any time frame smaller than a year.
> 
> On another note, I was wondering how to go about "resetting" the time to zero
> for each spell of self-employment, since I have multiple observations for each
> spell of self-employment (if I wanted to employ the PWP gap-time model
> approach).
> 
> 
> 
> For example, following my example before, if I had something that looked like:
> 
> (where the _spell, just indicates what spell of self-employment (first second
> etc)),
> 
> 
> How can I stset the data so the time is "reset" to zero for each new spell?
> 
> 
>      +----------------------------------------------------------+
>      | PersonID   Year0   Year   Failed   SelfEmploy   _spell   |
>      |----------------------------------------------------------|
>   1. |        1       .   1990        0            0        0   |
>   2. |        1    1990   1991        0            1        1   |
>   3. |        1    1991   1992        0            1        1   |
>   4. |        1    1992   1993        0            1        1   |
>   5. |        1    1993   1994        1            1        1   |
>   6. |        1    1994   1995        0            0        0   |
>   7. |        1    1995   1996        0            0        0   |
>   8. |        1    1996   1997        0            1        2   |
>   9. |        1    1997   1998        0            1        2   |
>  10. |        1    1998   1999        1            1        2   |
>  11. |        1    1999   2000        0            0        0   |
>  12. |        2       .   1993        0            0        0   |
>  13. |        2    1993   1994        0            1        1   |
>  14. |        2    1994   1995        0            1        1   |
>  15. |        2    1995   1996        0            1        1   |
>  16. |        2    1996   1997        1            1        1   |
>  17. |        2    1997   1998        0            0        0   |
>      +----------------------------------------------------------+
> 
> If I do:
> 
> stset Year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID) ///
>     exit(time .) if(_spell!=0)
> 
> this doesn't reset the time at the beginning of each spell; rather, it continues
> (with time gaps) from the time of the first spell.
> 
> Thanks again! Appreciate the help!
> -Kathleen
> 
> 
> The following example (performed in Stata 9.2/SE) considers this issue:
> --------------- example begins ------------------------------------
> clear
> set obs 6
> g id = 1 in 1/2
> replace id=2 in 3/4
> replace id=3 in 5/6
> g In=0
> replace In=6 in 2
> replace In=3 in 4
> replace In=4 in 6
> g Out=1
> replace Out=7 in 2
> replace Out=8 in 4
> replace Out=5 in 6
> g No_Self_Employed=1
> replace No_Self_Employed=0 in 4
> stset Out, id(id) failure(No_Self_Employed==1) time0(In) ///
>     exit(No_Self_Employed==2) origin(time In)
> stdes
> --------------- example ends ------------------------------------
> 
> In the previous code, subjects do not leave the survival analysis at the
> first failure (i.e., No_Self_Employed==1), since that would conflict with the
> assumption of multiple failures, but only when the event No_Self_Employed==2
> occurs (and this event will never occur).
> 
> As I can see from your thread and previous replies, your subjects do show
> gaps. You can check whether gaps are consistent with your methodological
> expectations using - stdes -.
> 
> For more on this topic, I would refer you to:
> MA Cleves, WW Gould, RG Gutierrez. An Introduction to Survival Analysis Using
> Stata. Revised edition. College Station: Stata Press, 2004: 59-62. The same
> textbook (147-156) also offers interesting insights on the Cox model with
> shared frailty, which may fit your data;
> the already referenced http://www.stata.com/support/faqs/stat/stmfail.html.
> 
> HTH and Kind Regards,
> Carlo
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Kathleen Bui
> Sent: Sunday, 13 March 2011 16:31
> To: [email protected]
> Subject: st: Stset-ing Multiple Failure/Multiple Spell Data : Moving in and
> out of risk set
> 
> My question is how to stset a multiple failure data set when an individual
> can move in and out of the risk set.
> 
> I have read Cleves’s An Introduction to Survival Analysis Using Stata,
> Cleves’s STB-49, and all previous posts concerning st-setting multiple
> failures. Others have asked questions similar to mine, but I have yet to find
> a solution that works.
> 
> I am analyzing the duration of an individual’s stay in Self-Employment.
> Failure will be exit from self-employment. My question is how I can stset the
> data so that Stata recognizes that an individual can move into and out of the
> risk set (which is being Self-Employed).
> 
> To be more explicit, for each individual in my data set, I have information
> as to whether or not they are Self-Employed. The issue arises when an
> individual has a self-employment history as follows:
> 
> The individual is self-employed and therefore at risk of failure. Then they
> fail (leave self-employment) and enter waged employment. By entering waged
> employment, they are no longer at risk of failing, since they are no longer
> Self-Employed. However, after a period of time, they once again become
> Self-Employed (thus re-enter the risk set) and fail once again (their second
> failure).
> 
> As a result, multiple failures are possible as individuals move in and out
> of different employment states. However, although I understand that Stata
> can recognize multiple failures, I am unsure how stset can be used to
> recognize the multiple spells of Self-Employment, particularly the period of
> time between spells when the individual is no longer at risk.
> 
> Specifically, I am unable to set the analysis time back to 0 for when the
> individual begins a second period at risk after being not at risk.
> 
> For example, one individual in my data set of multiple individuals can look
> like:
> 
>      +--------------------------------------------+
>      | ID   Year0   Year   SelfEmploy   Failure   |
>      |--------------------------------------------|
>   1. |  1    1989   1990            0         0   |
>   2. |  1    1990   1991            1         0   |
>   3. |  1    1991   1992            1         0   |
>   4. |  1    1992   1993            1         0   |
>   5. |  1    1993   1994            1         0   |
>   6. |  1    1994   1995            0         1   |
>   7. |  1    1995   1996            0         0   |
>   8. |  1    1996   1997            1         0   |
>   9. |  1    1997   1998            1         0   |
>  10. |  1    1998   1999            1         0   |
>  11. |  1    1999   2000            0         1   |
>      +--------------------------------------------+
> 
> where “SelfEmploy” is the indicator variable denoting whether or not the
> individual is self-employed, “Failed” is an indicator variable denoting
> whether the individual has left self-employment, and Year0 and Year are the
> corresponding beginning and end of the time period.
> 
> So between 1990 and 1994, the individual is at risk of failing, and fails
> between 1994 and 1995. But between 1995 and 1996, they are no longer at risk
> of failing (say they are employed in the waged sector). But then they enter
> self-employment in 1996 and thus experience another failure in 1999-2000.
> 
> Is there a command in stset that allows Stata to “ignore” the periods when
> they are no longer at risk?
> 
> For example, when I stset my data as follows:
> 
> stset year, origin(SelfEmploy==1) failure(Failed) time0(Year0) id(PersonID) exit(time .)
> 
> the period when they are no longer at risk of failing is treated as if they
> are in self-employment, as the output I receive is:
> 
> 
>      +------------------------------------------------------------------+
>      | ID   Year0   Year   SelfEmploy   Failure   _s   _d   _t0   _t    |
>      |------------------------------------------------------------------|
>   1. |  1    1989   1990            0         0    0    0     .    .    |
>   2. |  1    1990   1991            1         0    0    0     .    .    |
>   3. |  1    1991   1992            1         0    1    0     0    1    |
>   4. |  1    1992   1993            1         0    1    0     1    2    |
>   5. |  1    1993   1994            1         0    1    0     2    3    |
>   6. |  1    1994   1995            0         1    1    1     3    4    |
>   7. |  1    1995   1996            0         0    1    0     4    5    |
>   8. |  1    1996   1997            1         0    1    0     5    6    |
>   9. |  1    1997   1998            1         0    1    0     6    7    |
>  10. |  1    1998   1999            1         0    1    0     7    8    |
>  11. |  1    1999   2000            0         1    1    1     8    9    |
>      +------------------------------------------------------------------+
> 
> 
> Stata seems to count the period from 1995-1996 as a time when the
> individual is at risk of failing, when he is not.
> 
> Therefore, I am unsure how to stset the data so that from 1995-1996
> Stata recognizes that the individual is no longer at risk of failing, and so
> that my analysis time can be “reset” to 0 when the individual begins a second
> period at risk after being not at risk.

*
*  For searches and help try:
*  http://www.stata.com/help.cgi?search
*  http://www.stata.com/support/statalist/faq
*  http://www.ats.ucla.edu/stat/stata/

