Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: generating variables based on the co-occurrence of ids in groups over time
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: generating variables based on the co-occurrence of ids in groups over time
Date: Wed, 7 Mar 2012 12:17:15 +0000
Here are some doodlings:
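* One 0/1 indicator per individual: ind_id1, ind_id2, ...
* (suffixes number the distinct ids in sorted order; they equal the ids here only because the ids run 1-6)
tab ind_id, gen(ind_id)
drop ind_id
* Build the collapse call: (sum) ind_id1 (sum) ind_id2 ...
local call
foreach v of var ind_id* {
local call `call' (sum) `v'
}
* One row per project, with a 1 in the indicator of each member
collapse `call', by(yearmonth project_id)
list
* Project size
egen count_id = rowtotal(ind_id*)
* Strip the prefix to get the list of indicator suffixes
unab ind_id : ind_id*
local ind_id : subinstr local ind_id "ind_id" "", all
* collabN = number of other members of the project, defined only for members
foreach id of local ind_id {
gen collab`id' = count_id - ind_id`id' if ind_id`id' == 1
}
edit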
Not a complete solution, but may help.
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Erik Aadland
Sent: 07 March 2012 11:15
To: [email protected]
Subject: st: generating variables based on the co-occurrence of ids in groups over time
Dear Statalist,
I am struggling to generate two variables based on the co-occurrence of ind_ids in project_ids over time (yearmonth).
The structure of my data is as follows:
yearmonth project_id ind_id
5 1 1
5 1 2
5 1 3
5 2 1
5 2 4
5 2 5
6 3 1
6 3 2
6 3 5
6 4 4
6 4 5
6 4 6
7 5 1
7 5 4
7 5 5
7 5 2
The two variables I need to generate are:
X (number of prior collaborators in the project, for each ind_id): how many of the other individuals in the same project_id the focal ind_id has collaborated with before, i.e. how many of the other ind_ids in the current project the focal ind_id has co-occurred with in any project in an earlier yearmonth.
Z (total prior collaborations in the project, for each ind_id): the total number of times the focal ind_id has previously collaborated with those other individuals, i.e. the number of co-occurrences in earlier yearmonths between the focal ind_id and each of the other ind_ids in the current project, summed over those ind_ids (so an earlier project shared with two of them counts twice).
I have added the values of X and Z to the example data below:
yearmonth project_id ind_id X Z
5 1 1
5 1 2
5 1 3
5 2 1
5 2 4
5 2 5
6 3 1 2 2
6 3 2 1 1
6 3 5 1 1
6 4 4 1 1
6 4 5 1 1
6 4 6 0 0
7 5 1 3 5
7 5 4 2 3
7 5 5 3 5
7 5 2 2 3
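As a hand check of the two definitions, take the row for ind_id 1 in project 5 (yearmonth 7), whose co-members are 2, 4 and 5. Writing n(i, j), a notation introduced only for this check, for the number of projects in earlier yearmonths that contain both i and j, the table's values X = 3 and Z = 5 follow:

```latex
\begin{aligned}
n(1,4) &= 1 \quad (\text{project } 2)\\
n(1,5) &= 2 \quad (\text{projects } 2, 3)\\
n(1,2) &= 2 \quad (\text{projects } 1, 3)\\
X &= \#\{\, j \in \{2,4,5\} : n(1,j) > 0 \,\} = 3\\
Z &= n(1,2) + n(1,4) + n(1,5) = 2 + 1 + 2 = 5
\end{aligned}
```

Only three earlier projects are involved, but Z is 5 because project 3 is shared with both 2 and 5 and counts once per partner.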
Any and all input to these problems would be greatly appreciated.
I use Stata 10 and the panel data is unbalanced.
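One way to get both X and Z is to work with pairs of individuals rather than one indicator per individual. The do-file below is a minimal sketch, not a tested solution, assuming: yearmonth is numeric with earlier months having smaller values; each ind_id appears at most once per project; "previously" means a strictly earlier yearmonth; and joinby plus the old-style "merge varlist using" syntax behave as documented for Stata 10. The variable names other_id, prior_ym, prior_proj, prior, n_pair and has_prior are hypothetical names introduced here.

```stata
* Sketch: X and Z via pairwise self-joins (assumptions stated above)
clear
input yearmonth project_id ind_id
5 1 1
5 1 2
5 1 3
5 2 1
5 2 4
5 2 5
6 3 1
6 3 2
6 3 5
6 4 4
6 4 5
6 4 6
7 5 1
7 5 4
7 5 5
7 5 2
end
sort yearmonth project_id ind_id
tempfile master others pairs history
save `master'

* One row per ordered pair (ind_id, other_id) of co-members of a project
rename ind_id other_id
save `others'
use `master', clear
joinby yearmonth project_id using `others'
drop if ind_id == other_id
save `pairs'

* The same pairs, relabelled, serve as the collaboration history
rename yearmonth prior_ym
rename project_id prior_proj
save `history'

* Attach every co-occurrence of each current pair; keep pairs with none
use `pairs', clear
joinby ind_id other_id using `history', unmatched(master)
* prior_ym is missing when unmatched, and missing < yearmonth is false
gen byte prior = prior_ym < yearmonth

* n_pair = number of earlier co-occurrences of this particular pair
collapse (sum) n_pair = prior, by(yearmonth project_id ind_id other_id)
gen byte has_prior = n_pair > 0
* X = partners with at least one earlier co-occurrence, Z = total of them
collapse (sum) X = has_prior Z = n_pair, by(yearmonth project_id ind_id)

* Bring back individuals who were alone in a project (they have no pairs)
sort yearmonth project_id ind_id
merge yearmonth project_id ind_id using `master'
drop _merge
replace X = 0 if missing(X)
replace Z = 0 if missing(Z)
sort yearmonth project_id ind_id
list, sepby(project_id)
```

On the example data this should reproduce the X and Z columns above, except that the yearmonth 5 rows get 0 rather than blanks. The design choice is that both variables come from one table of pair counts: X counts partners with a positive count, Z sums the counts. The second joinby pairs every current pair with its whole history, so on large data it can become big; keeping only pairs from earlier yearmonths before collapsing is where memory goes.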
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/