Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date	Tue, 19 Feb 2013 19:34:59 +0000
I don't know about smart, but this seems to be the same kind of problem.
Change the order of the loops, so that the loop over actors is outside
the loop over years, and let the list of colleagues accumulate from
year to year for each actor.

Have a look at this code:

clear
input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
2001 5 1 1
2001 5 2 0
2001 6 2 0
2001 5 3 0
2001 6 3 0
2001 5 4 1
2001 6 4 1
2001 7 5 0
2001 7 6 0
2001 7 7 0
2001 8 8 0

end

egen proj = group(project_id year), label
su proj, meanonly
local nproj = r(max)

egen act = group(actor_id), label
su act, meanonly
local nact = r(max)

egen yr = group(year), label
su yr, meanonly
local nyr = r(max)

gen mywanted2 = 0

* lists of those in each project and year and condition == 0
qui forval p = 1/`nproj' {
    levelsof act if proj == `p' & condition == 0, local(who`p')
}

macro list

* now cycle over actors
qui forval a = 1/`nact' {

    * blank out workspace
    local work

    * cycle over years
    qui forval y = 1/`nyr' {

        * if actor was included, we want to add that list to workspace
        forval p = 1/`nproj' {
            count if act == `a' & proj == `p' & yr == `y'
            if r(N) local work `work' `who`p''
        }

        * remove duplicates
        local work : list uniq work
        * remove this actor
        local work : list work - a
        * see what we got for debugging
        noi di "`a' `work'"

        replace mywanted2 = `: list sizeof work' if act == `a' & yr == `y'
    }
}
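
In case the list manipulations inside the loop are unfamiliar, here is a
minimal sketch of what they do. The contents of work and a below are
made-up values for illustration only and are not taken from the data
above; the operators are those documented at -help macrolists-.

```stata
* illustrative values only (assumptions, not from the example data)
local work 2 3 2 1 3
local a 1

* drop repeated elements, keeping the first occurrence of each
local work : list uniq work
di "`work'"                  // shows 2 3 1

* remove the element(s) held in local a, i.e. the actor itself
local work : list work - a
di "`work'"                  // shows 2 3

* count what is left
di `: list sizeof work'      // shows 2
```

The same three steps run once for each actor and year in the loop. Because
work is blanked only once per actor, colleagues carried over from earlier
years are dropped by -list uniq- rather than counted again.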



On Tue, Feb 19, 2013 at 4:35 PM, Erik Aadland <[email protected]> wrote:

> A while back I got assistance from the list in counting, separately for each actor_id and year, the number of distinct other actors meeting a certain condition with which the actor_id had occurred together in projects.
> Nick Cox suggested the code below that worked wonderfully.
> This code generates a separate count for each actor_id and year.
> I now face a new challenge. I would like to generate a similar measure that is a cumulative count over the years (rather than a separate count for each year). So, if actor_id == 1 collaborated with 2 other distinct actors in 2000, the score for actor_id == 1 would be 2 in 2000. If actor_id == 1 collaborated with one additional distinct actor that met the condition in 2001, the score would increase to 3 in 2001. (If the distinct actors already counted in the 2000 score were present in projects together with the actor_id in 2001 as well, they would not be counted again in 2001.)
> Is there a smart way to change the code below to generate this new measure?
> Sincerely and kind regards,
> Erik.
>
>
> ----------------------------------------
>> Date: Tue, 16 Oct 2012 14:48:36 +0100
>> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> From: [email protected]
>> To: [email protected]
>>
>> It seems that you want a separate count for each year.
>>
>> If that's so, the code looks more like
>>
>> clear
>>
>> input year project_id actor_id condition
>> 2000 1 1 1
>> 2000 2 1 1
>> 2000 1 2 0
>> 2000 2 2 0
>> 2000 1 3 0
>> 2000 2 3 0
>> 2000 3 4 1
>> 2000 3 5 0
>> 2000 3 6 0
>> 2000 4 7 0
>> 2001 5 1 1
>> 2001 5 2 0
>> 2001 6 2 0
>> 2001 5 3 0
>> 2001 6 3 0
>> 2001 5 4 1
>> 2001 6 4 1
>> 2001 7 5 0
>> 2001 7 6 0
>> 2001 7 7 0
>> 2001 8 8 0
>>
>> end
>>
>> egen proj = group(project_id year), label
>> su proj, meanonly
>> local nproj = r(max)
>>
>> egen act = group(actor_id), label
>> su act, meanonly
>> local nact = r(max)
>>
>> egen yr = group(year), label
>> su yr, meanonly
>> local nyr = r(max)
>>
>> gen mywanted = .
>>
>> * lists of those in each project and year and condition == 0
>> qui forval p = 1/`nproj' {
>> levelsof act if proj == `p' & condition == 0, local(who`p')
>> }
>>
>> macro list
>>
>> * cycle over years
>>
>> qui forval y = 1/`nyr' {
>>
>> * now cycle over actors
>> qui forval a = 1/`nact' {
>>
>> * blank out workspace
>> local work
>>
>> * if actor was included, we want to add that list to workspace
>> forval p = 1/`nproj' {
>> count if act == `a' & proj == `p' & yr == `y'
>> if r(N) local work `work' `who`p''
>> }
>>
>> * remove duplicates
>> local work : list uniq work
>> * remove this actor
>> local work : list work - a
>> * see what we got for debugging
>> noi di "`a' `work'"
>>
>> replace mywanted = `: list sizeof work' if act == `a' & yr == `y'
>> }
>> }
>>
>>
>>
>>
>> On Tue, Oct 16, 2012 at 1:00 PM, Erik Aadland <[email protected]> wrote:
>> > This is correct.
>> > So, referring to the results in my previous post.
>> > In year==2000, actor_id == 4|5|6 occur only in project_id==3, and for actor_id == 5 and 6 condition==0. Actor_id==4 should have a mywanted score == 2, while actor_id==5 and 6 should each have a mywanted score == 1. Actor_id == 7 occurs only in project_id==4 in this year and has shared projects with no other actor in this year (and therefore shares no project_id with any actor_id with condition==0), so it should have a mywanted score == 0.
>> > It puzzles me why the suggested code generates correct mywanted scores for the actor_ids in project_id==1 and 2, but not in project_id== 3 and 4.
>> > Kind regards,
>> > Erik.
>> >
>> >
>> > ----------------------------------------
>> >> Date: Tue, 16 Oct 2012 12:44:11 +0100
>> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> What you asked for, as I understood it, was the total number of distinct actors
>> >>
>> >> 1. that meet a specified condition (condition == 0)
>> >>
>> >> and
>> >>
>> >> 2. with which any actor has shared one or more projects in the same year.
>> >>
>> >> (I am ignoring the word "focal", which you haven't defined and I don't
>> >> understand.)
>> >>
>> >> If you want something else, please give the definition. It's not
>> >> enough (for me) to say that the code gives the wrong answer in some
>> >> cases. Note that my code gives the same answer for each actor, as it
>> >> is a total over all (project, year) possibilities. If you want a
>> >> different count for each (project, year) you'll need to modify the
>> >> code accordingly.
>> >>
>> >> Nick
>> >>
>> >> On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <[email protected]> wrote:
>> >> > The code works. Thank you Nick.
>> >> > However, I am experiencing a few problems that I suspect stem from details of my data structure that depart from the structure I previously specified in this post.
>> >> > In particular, I might have some projects in which only one actor is present.
>> >> > Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
>> >> > year  project_id  actor_id  condition  proj    act  mywanted
>> >> > 2000  1           1         1          1 2000  1    2
>> >> > 2000  2           1         1          2 2000  1    2
>> >> > 2000  1           2         0          1 2000  2    1
>> >> > 2000  2           2         0          2 2000  2    1
>> >> > 2000  1           3         0          1 2000  3    1
>> >> > 2000  2           3         0          2 2000  3    1
>> >> > 2000  3           4         1          3 2000  4    4
>> >> > 2000  3           5         0          3 2000  5    2
>> >> > 2000  3           6         0          3 2000  6    2
>> >> > 2000  4           7         0          4 2000  7    2
>> >> > 2001  5           1         1          5 2001  1    2
>> >> > 2001  5           2         0          5 2001  2    1
>> >> > 2001  6           2         0          6 2001  2    1
>> >> > 2001  5           3         0          5 2001  3    1
>> >> > 2001  6           3         0          6 2001  3    1
>> >> > 2001  5           4         1          5 2001  4    4
>> >> > 2001  6           4         1          6 2001  4    4
>> >> > 2001  7           5         0          7 2001  5    2
>> >> > 2001  7           6         0          7 2001  6    2
>> >> > 2001  7           7         0          7 2001  7    2
>> >> > 2001  8           8         0          8 2001  8    0
>> >> >
>> >> > In this result (focusing on year==2000 only now), the mywanted score for actor_id==7 in project_id==4 is incorrect (the correct value is mywanted==0). The mywanted scores for the actor_ids in project_id==3 are also incorrect.
>> >> >
>> >> > In year==2001, on the other hand, the mywanted score==0 for actor_id==8 in project_id==8 is correct.
>> >> > How can I get around this? I am sorry that I did not include these structural details in my initial post.
>> >> > Sincerely,
>> >> > Erik.
>> >> >
>> >> > ----------------------------------------
>> >> >> Date: Tue, 16 Oct 2012 11:49:22 +0100
>> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> >> From: [email protected]
>> >> >> To: [email protected]
>> >> >>
>> >> >> I suspect that you didn't copy all the code. The last line of code
>> >> >> has a brace (curly bracket }) by itself.
>> >> >>
>> >> >> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <[email protected]> wrote:
>> >> >> > Thank you Nick!
>> >> >> > You are quite right. I was imprecise; it is distinct actors I want to capture.
>> >> >> > When I run your suggested code, I get this error message after the following line of code:
>> >> >> >
>> >> >> > qui forval a = 1/`nact' {
>> >> >> > unexpected end of file
>> >> >> > r(612);
>> >> >> >
>> >> >> > What could possibly cause this error message? I am using Stata 10.
>> >> >> > Thanks again and kind regards,
>> >> >> > Erik.
>> >> >> >
>> >> >> >
>> >> >> > ----------------------------------------
>> >> >> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
>> >> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> >> >> From: [email protected]
>> >> >> >> To: [email protected]
>> >> >> >>
>> >> >> >> First off, on my list of hobby-horses is a prejudice that the word
>> >> >> >> "unique" is misused here, although you are in very good company:
>> >> >> >> StataCorp itself does it in various places, e.g. -codebook-, although
>> >> >> >> I am working on changing their habits if I can. The word "unique"
>> >> >> >> strictly means occurring once only; I recommend the word "distinct"
>> >> >> >> for what you want. There is a longer discussion of terminology in
>> >> >> >>
>> >> >> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
>> >> >> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
>> >> >> >> Q4/08 SJ 8(4):557--568
>> >> >> >> shows how to answer questions about distinct observations
>> >> >> >> from first principles; provides a convenience command
>> >> >> >>
>> >> >> >> That said, when faced with a problem like yours, vague ideas of
>> >> >> >> possible solutions rise up. Is this a case for associative arrays as
>> >> >> >> implemented in Mata? Is there a cunning restructuring of the data from
>> >> >> >> which the answer would fall out easily? Precise inspiration was
>> >> >> >> lacking and what seemed crucial was that you need to consider each
>> >> >> >> actor in each combination of project and year. That pointed to
>> >> >> >> loops over actors _and_ over project-years. Once that idea was taken
>> >> >> >> up, life is usually easier if all identifiers run over the integers
>> >> >> >> from 1 up. Also, the flavour of compiling a list and eventually
>> >> >> >> counting distinct members of other actors suggested -levelsof- and the
>> >> >> >> list manipulation tools documented at -help macrolists-.
>> >> >> >>
>> >> >> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
>> >> >> >>
>> >> >> >> input year project_id actor_id condition wanted
>> >> >> >> 2000 1 1 1 2
>> >> >> >> 2000 1 2 0 1
>> >> >> >> 2000 1 3 0 1
>> >> >> >> 2000 1 7 1 2
>> >> >> >> 2000 2 1 1 2
>> >> >> >> 2000 2 2 0 1
>> >> >> >> 2000 2 3 0 1
>> >> >> >> 2000 3 4 1 2
>> >> >> >> 2000 3 5 0 1
>> >> >> >> 2000 3 6 0 1
>> >> >> >> 2000 3 . . .
>> >> >> >> 2001 4 1 1 2
>> >> >> >> 2001 4 2 0 1
>> >> >> >> 2001 4 3 0 1
>> >> >> >> end
>> >> >> >>
>> >> >> >> * identifiers guaranteed to run 1 up if the real ones don't!
>> >> >> >> * note that "same project, same year" defines a group
>> >> >> >> egen proj = group(project_id year), label
>> >> >> >> su proj, meanonly
>> >> >> >> local nproj = r(max)
>> >> >> >>
>> >> >> >> egen act = group(actor_id), label
>> >> >> >> su act, meanonly
>> >> >> >> local nact = r(max)
>> >> >> >>
>> >> >> >> gen mywanted = .
>> >> >> >>
>> >> >> >> * lists of those in each project and year and condition == 0
>> >> >> >> qui forval p = 1/`nproj' {
>> >> >> >> levelsof act if proj == `p' & condition == 0, local(who`p')
>> >> >> >> }
>> >> >> >>
>> >> >> >> macro list
>> >> >> >>
>> >> >> >> * now cycle over actors
>> >> >> >> qui forval a = 1/`nact' {
>> >> >> >>
>> >> >> >> * blank out workspace
>> >> >> >> local work
>> >> >> >>
>> >> >> >> * if actor was included, we want to add that list to workspace
>> >> >> >> * in practice -if r(N)- will be true if and only if -r(N)- is positive
>> >> >> >> forval p = 1/`nproj' {
>> >> >> >> count if act == `a' & proj == `p'
>> >> >> >> if r(N) local work `work' `who`p''
>> >> >> >> }
>> >> >> >>
>> >> >> >> * remove duplicates
>> >> >> >> local work : list uniq work
>> >> >> >> * remove this actor
>> >> >> >> local work : list work - a
>> >> >> >> * see what we got for debugging
>> >> >> >> noi di "`a' `work'"
>> >> >> >>
>> >> >> >> replace mywanted = `: list sizeof work' if act == `a'
>> >> >> >> }
>> >> >> >>
>> >> >> >> Nick
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <[email protected]> wrote:
>> >> >> >>
>> >> >> >> > I am trying to generate a variable "wanted" that, for each focal actor and year, captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
>> >> >> >> > This is my data structure:
>> >> >> >> > year project_id actor_id condition wanted
>> >> >> >> > 2000 1 1 1 2
>> >> >> >> > 2000 1 2 0 1
>> >> >> >> > 2000 1 3 0 1
>> >> >> >> > 2000 1 7 1 2
>> >> >> >> > 2000 2 1 1 2
>> >> >> >> > 2000 2 2 0 1
>> >> >> >> > 2000 2 3 0 1
>> >> >> >> > 2000 3 4 1 2
>> >> >> >> > 2000 3 5 0 1
>> >> >> >> > 2000 3 6 0 1
>> >> >> >> > 2000 3 . . .
>> >> >> >> > 2001 4 1 1 2
>> >> >> >> > 2001 4 2 0 1
>> >> >> >> > 2001 4 3 0 1
>> >> >> >> > .....and so on
>> >> >> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
>> >> >> >> > My attempted code (which is quite wrong):
>> >> >> >> > sort actor_id year projects ;
>> >> >> >> > by actor_id year: gen nvals = _n == 1 ;
>> >> >> >> > sort actor_id year project_id ;
>> >> >> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
>> >> >> >> > replace wanted = wanted - (nvals & condition == 0) ;
