Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Coding overlap events in sequence data |
Date | Wed, 13 Nov 2013 10:01:01 +0000 |
There is an easy thing you can do. With your sample data

reshape long sex, i(id) string
rename sex age
drop if age == .
bysort id (age) : gen number = sum((substr(_j, 1, 1) == "B") - (substr(_j, 1, 1) == "E"))
list

So, you can keep a running tally by adding 1 every time someone starts
a relationship (_j starts with "B") and subtracting 1 every time
one stops (_j starts with "E").

You can summarize by using -egen-, e.g.

egen max = max(number), by(id)

Nick
njcoxstata@gmail.com

On 13 November 2013 08:56, Nick Cox <njcoxstata@gmail.com> wrote:
> This problem would be easier after a -reshape long-. Your present data
> structure makes it really difficult.
>
> After that, check out
>
> -spellutil- (SSC)
>
> -disjoint- (SSC)
>
> which may help.
> Nick
> njcoxstata@gmail.com
>
>
> On 12 November 2013 22:26, Cheng, Hsu-Chih <simon.cheng@uconn.edu> wrote:
>> Dear All:
>>
>> I am coding sequence data for 5013 respondents’ sexual relationship
>> histories. For each respondent, I have 16 time positions (ages
>> 18~18.25 [sext1], 18.25~18.50 [sext2],…, 21.75~22 [sext16]) and the
>> respondent’s beginning and end ages of up to 48 relationships (most
>> respondents have fewer than 5 relationships; so the beginning and end
>> ages of the other 40+ relationships have missing values). Right now,
>> respondents with multiple relationships in a given time position are
>> coded as 4 (so, for example, sext5 = 4). I can manually go through all
>> respondents to determine whether the multiple relationships in a given
>> time overlap or not and recode them into different categories, but
>> this is very tedious and time-consuming. Is there a faster way to do
>> this?
>>
>> Here are four examples with multiple relationships in Time 2 (ages
>> 18.25~18.50). sexBag1 and sexEag1 indicate the beginning and end ages
>> of relationship 1; sexBag2 and sexEag2 indicate the beginning and end
>> ages of relationship 2;…, and so on.
>> I want to recode [sext2] for Cases 1 and 2 as 1 to indicate that
>> their relationships in Time 2 do not overlap, and Cases 3 and 4 as 2
>> to indicate that their relationships in Time 2 overlap.
>>
>> id  sext2  sexBag1  sexEag1  sexBag2  sexEag2  sexBag3  sexEag3  sexBag4  sexEag4  sexBag5  sexEag5
>> 1   4      18.5     18.75    18.333   18.416   21.416   21.5     19.25    24.083   .        .
>> 2   4      .        .        18.250   18.333   18.416   21.666   21.583   22.833   22.5     22.50004
>> 3   4      17.249   18.999   18.499   21.999   22.166   23.249   (missing values after this)
>> 4   4      16.750   22.750   18.416   18.666   (missing values after this)
>>
>> I would really appreciate it if anyone could give me some suggestions.
>> I can provide more information about the data if needed. Thanks again.
>>
>> Best,
>>
>> Simon
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/