RE: st: Coding overlap events in sequence data
From: "Cheng, Hsu-Chih" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Coding overlap events in sequence data
Date: Thu, 14 Nov 2013 18:52:14 +0000
Nick, Thanks for your suggestion. I will explore possibilities in this direction. Best, Simon
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: Wednesday, November 13, 2013 5:01 AM
To: [email protected]
Subject: Re: st: Coding overlap events in sequence data
There is an easy thing you can do. With your sample data
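* one row per id and begin/end event; _j holds the suffix, e.g. "Bag1" or "Eag1"
reshape long sex, i(id) string
rename sex age
drop if missing(age)
* +1 when a relationship begins ("B..."), -1 when it ends ("E...")
bysort id (age) : gen number = sum((substr(_j, 1, 1) == "B") - ///
    (substr(_j, 1, 1) == "E"))
list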
So, you can keep a running tally by adding 1 every time someone starts
a relationship (value starts with "B") and subtracting 1 every time it
stops (value starts with "E"). You can summarize by using -egen-, e.g.
egen max = max(number), by(id)
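To tie the running tally back to the original question (overlap within Time 2, ages 18.25 to 18.50), here is a minimal sketch on the four sample cases. It is not part of the suggestion above. It rests on several assumptions: sext2 is left out so that -reshape- picks up only the begin/end variables; the names event, step, next, inwin, maxwin and overlap2 are made up for illustration; an end and a beginning at the same age are treated as not overlapping; and the window bounds are compared strictly. Adjust any of these to match your own coding rules.

```stata
* Sample cases from the original post (sext2 omitted; assumption, see above)
clear
input id sexBag1 sexEag1 sexBag2 sexEag2 sexBag3 sexEag3 sexBag4 sexEag4 sexBag5 sexEag5
1 18.5   18.75  18.333 18.416 21.416 21.5   19.25  24.083 .    .
2 .      .      18.250 18.333 18.416 21.666 21.583 22.833 22.5 22.50004
3 17.249 18.999 18.499 21.999 22.166 23.249 .      .      .    .
4 16.750 22.750 18.416 18.666 .      .      .      .      .    .
end

* One row per id and event; event holds the suffix ("Bag1", "Eag1", ...)
reshape long sex, i(id) j(event) string
rename sex age
drop if missing(age)

* +1 for a beginning, -1 for an end
gen byte step = (substr(event, 1, 1) == "B") - (substr(event, 1, 1) == "E")

* Sorting on step within age puts ends before beginnings at tied ages
* (assumption: back-to-back relationships do not count as overlapping)
bysort id (age step) : gen number = sum(step)

* Each tally value holds from this event until the next one;
* next is missing after the last event, and missing compares as larger
* than any number, so the last interval is treated as open-ended
by id : gen next = age[_n + 1]
gen byte inwin = age < 18.5 & next > 18.25

* Largest number of concurrent relationships at any point in Time 2
egen maxwin = max(cond(inwin, number, .)), by(id)
gen byte overlap2 = maxwin >= 2 if !missing(maxwin)

list id event age number inwin maxwin overlap2, sepby(id)
```

Worked through by hand on these four cases, ids 1 and 2 never reach a tally of 2 inside the window, while ids 3 and 4 do, which matches the recoding asked for in the original post. Looping the window bounds over the 16 time positions would give one overlap flag per position.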
Nick
[email protected]
On 13 November 2013 08:56, Nick Cox <[email protected]> wrote:
> This problem would be easier after a -reshape long-. Your present data
> structure makes it really difficult.
>
> After that, check out
>
> -spellutil- (SSC)
>
> -disjoint- (SSC)
>
> which may help.
> Nick
> [email protected]
>
>
> On 12 November 2013 22:26, Cheng, Hsu-Chih <[email protected]> wrote:
>> Dear All:
>>
>> I am coding sequence data for 5013 respondents’ sexual relationship histories. For each respondent, I have 16 time positions (ages 18~18.25 [sext1], 18.25~18.50 [sext2],…, 21.75~22 [sext16]) and the respondent’s beginning and end ages of up to 48 relationships (most respondents have fewer than 5 relationships; so the beginning and end ages of the other 40+ relationships have missing values). Right now, respondents with multiple relationships in a given time position are coded as 4 (so, for example, sext5 = 4). I can manually go through all respondents to determine whether the multiple relationships in a given time overlap or not and recode them into different categories, but this is very tedious and time consuming. Is there a faster way to do this?
>>
>> Here are four examples with multiple relationships in Time 2 (ages 18.25~18.50). sexBag1 and sexEag1 indicate the beginning and end ages of relationship 1; sexBag2 and sexEag2 indicate the beginning and end ages of relationship 2;…, and so on. I want to recode [sext2] for Cases 1 and 2 as 1 to indicate that their relationships in Time 2 do not overlap, and Cases 3 and 4 as 2 to indicate their relationships in Time 2 overlap.
>>
>> id  sext2  sexBag1  sexEag1  sexBag2  sexEag2  sexBag3  sexEag3  sexBag4  sexEag4  sexBag5  sexEag5
>> 1   4      18.5     18.75    18.333   18.416   21.416   21.5     19.25    24.083   .        .
>> 2   4      .        .        18.250   18.333   18.416   21.666   21.583   22.833   22.5     22.50004
>> 3   4      17.249   18.999   18.499   21.999   22.166   23.249   (missing values after this)
>> 4   4      16.750   22.750   18.416   18.666   (missing values after this)
>>
>> I would really appreciate it if anyone could give me some suggestions. I can provide more information about the data if needed. Thanks again.
>>
>> Best,
>>
>> Simon
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/