Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Kristian Thor Jakobsen" <KRJ@dm.dk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Analysis of event history data |
Date | Wed, 21 Mar 2012 13:30:13 +0100 |
That's so true, because I have one more... My data are now sorted in the following way, where the variable status is a dummy indicating whether a person has exited from a specific programme during a spell of unemployment. Now I need to calculate the average number of weeks that the person has been employed during, e.g., the two years before exiting this programme (for example, employment[_n-1] to employment[_n-104] if status[_n]==1), but I can't make Stata do the correct calculation (it is taking the mean total employment of each person).

Id   Status   Employment   Time
1    0        1            1
1    1        0            2
1    0        0            3
2    0        1            1
2    0        1            2
2    0        0            3
3    0        0            1
3    0        1            2
3    1        1            3

Any idea of how to get around this? I really appreciate the help.

-----Original message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
Sent: 20 March 2012 15:14
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Analysis of event history data

Never say "one final question"!

-help egen- shows that there are -egen- functions -anycount()-, -anymatch()-, -anyvalue()-. So

    egen ones = anycount(y_*), values(1)
    keep if ones

Even if those functions did not exist, you could do this

    gen ones = 0
    quietly foreach v of var y_* {
        replace ones = ones + (`v' == 1)
    }
    keep if ones

Nick

On Tue, Mar 20, 2012 at 1:28 PM, Kristian Thor Jakobsen <KRJ@dm.dk> wrote:
> Thanks again, Nick. I figured it out with your help. But I have one final question. Given that my dataset consists of several million observations, I would like to trim the dataset down before I do the -reshape- command in order to avoid wasting time on observations that I would subsequently throw out. Say that I want to keep those observations where y_* is equal to 1 in one or more cases:
>
> Id   y_1001   y_1002   y_1003   ...   y_1101   area_10   area_11
> 1    1        1        0              1        10        5
>
> I guess I could do the following:
>
> keep if y_1001==1 | y_1002==1 etc.
>
> But given that I have around 1000 variables or so for which I would need to check the condition, that would be quite a tedious command. Is there a smart way to get around this?
>
> Nick Cox
> Do spend some time studying the resources for -reshape- including FAQs.
>
> First off, your -y_- cannot be an identifier! It doesn't identify observations.
>
> Second off, you can include -area- in the -reshape- but I guess you will need some extra surgery before and after. I would try a -rename- of the -area*- such as
>
> foreach v of var area* {
>     rename `v' `v'01
> }
>
> and then there will be some fill-in afterwards.
>
> Nick
>
> On Mon, Mar 19, 2012 at 12:30 PM, Kristian Thor Jakobsen <KRJ@dm.dk> wrote:
>> Thanks, Nick. -reshape- is a big help. But what if I have time-varying variables that I would like to carry over as well, but not with the same intervals? For example:
>>
>> Id   y_1001   y_1002   y_1003   ...   y_1101   area_10   area_11
>> 1    1        1        0              0        10        5
>>
>> If I do -reshape- using y_ as the identifier I would get something like:
>>
>> Id   j      y_   area_10   area_11
>> 1    1001   1    10        5
>> 1    1002   1    10        5
>> 1    1003   0    10        5
>> .
>> .
>> .
>> 1    1101   0    10        5
>>
>> But I would like to have something like:
>>
>> Id   j      y_   area
>> 1    1001   1    10
>> 1    1002   1    10
>> 1    1003   0    10
>> .
>> .
>> .
>> 1    1101   0    5
>>
>> Is that possible with -reshape-? Or would I have to convert the yearly time-varying variables into weekly ones first?
>>
>> Thanks again,
>> Kristian
>>
>> -----Original message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
>> Sent: 19 March 2012 12:43
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Analysis of event history data
>>
>> For most Stata purposes your data would indeed be better reshaped to a long data structure or shape or form (some people do say "format", but in a Stata context format implies -format-, etc.).
>>
>> reshape long y_ , i(id) j(time)
>> rename y_ status
>>
>> should do it.
>> See also -tsspell- (SSC) and
>>
>> SJ-7-2  dm0029 . . . . . . . . . . Speaking Stata: Identifying spells
>>         . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>         Q2/07   SJ 7(2):249--265                    (no commands)
>>         shows how to handle spells with complete control over
>>         spell specification
>>
>> as well as the literature on survival analysis with which you are evidently familiar.
>>
>> Nick
>>
>> On Mon, Mar 19, 2012 at 11:32 AM, Kristian Thor Jakobsen <KRJ@dm.dk> wrote:
>>
>>> I am trying to do an analysis of transitions in and out of public income transfers. My data are organized roughly in the following way:
>>>
>>> Id   y_1001   y_1002   y_1003
>>> 1    0        1        0
>>> 2    0        0        0
>>> 3    1        1        0
>>>
>>> This means that I have the weekly status of each individual from 1991 to 2011. But in order to do any sort of analysis (for example, survival analysis), I would guess that I have to convert the data into the following form instead:
>>>
>>> Id   Status   Time
>>> 1    0        1
>>> 1    1        2
>>> 1    0        3
>>> 2    0        1
>>> 2    0        2
>>> 2    0        3
>>> 3    1        1
>>> 3    1        2
>>> 3    0        3
>>>
>>> Is that correct, and if so, is there a smart way to convert the data from one format into the other? Or can I perhaps use the data as given?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
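
On the question at the top of this thread (mean employment over the weeks before a programme exit), one possible approach is to difference a running sum within each person. The following is only a sketch, not an answer given in the thread. It assumes lowercase variable names (id, status, employment, time), exactly one observation per person per week with no gaps, and a window that excludes the exit week itself. The local macro w and the variables cumemp, prevsum, prevn and meanemp are names made up for illustration; w is set to 2 so that the toy data from the post show something, and would be 104 for two years of weekly data.

```stata
* Toy data from the post (variable names lowercased by assumption)
clear
input id status employment time
1 0 1 1
1 1 0 2
1 0 0 3
2 0 1 1
2 0 1 2
2 0 0 3
3 0 0 1
3 0 1 2
3 1 1 3
end

* Window length in weeks: 2 for the toy data, 104 for two years (assumption)
local w = 2

* Running total of employment within each person, in time order
bysort id (time): gen cumemp = sum(employment)

* Employment summed over the (up to) w weeks before the current week:
* running total one week back, minus the running total w+1 weeks back
by id: gen prevsum = cond(_n > 1, cumemp[_n-1], 0) - cond(_n > `w' + 1, cumemp[_n-`w'-1], 0)

* Number of earlier weeks actually available inside the window
by id: gen prevn = min(_n - 1, `w')

* Mean employment over the window, filled in only on exit weeks
gen meanemp = prevsum / prevn if status == 1 & prevn > 0

list, sepby(id)
```

With the toy data and w equal to 2, meanemp is 1 for person 1 at time 2 (only one earlier week is available) and 0.5 for person 3 at time 3. If the weekly series has gaps, counting observations with _n no longer counts weeks, and the window would have to be defined on time instead.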