Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
From: Erik Aadland <[email protected]>
To: <[email protected]>
Subject: RE: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date: Tue, 19 Feb 2013 16:35:36 +0000
Dear Statalist.
A while back I got assistance from the list in counting, separately for each actor_id and year, the number of distinct other actors meeting a certain condition with which the actor_id had occurred together in projects.
Nick Cox suggested the code below that worked wonderfully.
This code generates a separate count for each actor_id and year.
I now face a new challenge. I would like to generate a similar measure that is a cumulative count over the years (rather than a separate count for each year). So, if actor_id == 1 collaborated with 2 other distinct actors that met the condition in 2000, the score for actor_id == 1 would be 2 in 2000. If actor_id == 1 collaborated with one additional distinct actor that met the condition in 2001, the score would increase to 3 in 2001. (If the distinct actors already counted in the 2000 score were present in projects together with the actor_id in 2001 as well, they would not be counted again in 2001.)
Is there a smart way to change the code below to generate this new measure?
Sincerely and kind regards,
Erik.
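
One possible direction, offered as a minimal sketch and not as an answer given in this thread: keep the loop structure of the code quoted below, but let the inner -count- look at the actor's projects in the current year and all earlier years (yr <= `y') instead of the current year only (yr == `y'). The variable name cumwanted is hypothetical. The sketch assumes that a collaborator counts if it met condition == 0 in the shared project at the time of that project, and that a cumulative score is wanted only in years in which the actor actually appears.

```stata
* Sketch only: cumulative variant of the per-year loop quoted below.
* Assumption: cumwanted is a hypothetical name, not from the thread.
clear

input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
2001 5 1 1
2001 5 2 0
2001 6 2 0
2001 5 3 0
2001 6 3 0
2001 5 4 1
2001 6 4 1
2001 7 5 0
2001 7 6 0
2001 7 7 0
2001 8 8 0
end

* identifiers running from 1 up, as in the quoted code
egen proj = group(project_id year)
su proj, meanonly
local nproj = r(max)

egen act = group(actor_id)
su act, meanonly
local nact = r(max)

egen yr = group(year)
su yr, meanonly
local nyr = r(max)

gen cumwanted = .

* lists of those in each project-year with condition == 0
qui forval p = 1/`nproj' {
    levelsof act if proj == `p' & condition == 0, local(who`p')
}

* cycle over years, then actors
qui forval y = 1/`nyr' {
    forval a = 1/`nact' {
        * blank out workspace
        local work

        * the only substantive change: yr <= `y' instead of yr == `y',
        * so projects from earlier years feed into the list as well
        forval p = 1/`nproj' {
            count if act == `a' & proj == `p' & yr <= `y'
            if r(N) local work `work' `who`p''
        }

        * remove duplicates, then remove this actor
        local work : list uniq work
        local work : list work - a

        replace cumwanted = `: list sizeof work' if act == `a' & yr == `y'
    }
}

list year project_id actor_id condition cumwanted, sepby(year)
```

Traced by hand on this example data, the 2000 values are the same as in the per-year version, while actor_id == 4 rises from 2 in 2000 (actors 5 and 6) to 4 in 2001 (actors 5, 6, 2 and 3). The triple loop runs a -count- for every combination of year, actor and project-year, so it may be slow on a large dataset.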
----------------------------------------
> Date: Tue, 16 Oct 2012 14:48:36 +0100
> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> From: [email protected]
> To: [email protected]
>
> It seems that you want a separate count for each year.
>
> If that's so, the code looks more like
>
> clear
>
> input year project_id actor_id condition
> 2000 1 1 1
> 2000 2 1 1
> 2000 1 2 0
> 2000 2 2 0
> 2000 1 3 0
> 2000 2 3 0
> 2000 3 4 1
> 2000 3 5 0
> 2000 3 6 0
> 2000 4 7 0
> 2001 5 1 1
> 2001 5 2 0
> 2001 6 2 0
> 2001 5 3 0
> 2001 6 3 0
> 2001 5 4 1
> 2001 6 4 1
> 2001 7 5 0
> 2001 7 6 0
> 2001 7 7 0
> 2001 8 8 0
>
> end
>
> egen proj = group(project_id year), label
> su proj, meanonly
> local nproj = r(max)
>
> egen act = group(actor_id), label
> su act, meanonly
> local nact = r(max)
>
> egen yr = group(year), label
> su yr, meanonly
> local nyr = r(max)
>
> gen mywanted = .
>
> * lists of those in each project and year and condition == 0
> qui forval p = 1/`nproj' {
>     levelsof act if proj == `p' & condition == 0, local(who`p')
> }
>
> macro list
>
> * cycle over years
>
> qui forval y = 1/`nyr' {
>
>     * now cycle over actors
>     qui forval a = 1/`nact' {
>
>         * blank out workspace
>         local work
>
>         * if actor was included, we want to add that list to workspace
>         forval p = 1/`nproj' {
>             count if act == `a' & proj == `p' & yr == `y'
>             if r(N) local work `work' `who`p''
>         }
>
>         * remove duplicates
>         local work : list uniq work
>         * remove this actor
>         local work : list work - a
>         * see what we got for debugging
>         noi di "`a' `work'"
>
>         replace mywanted = `: list sizeof work' if act == `a' & yr == `y'
>     }
> }
>
>
>
>
> On Tue, Oct 16, 2012 at 1:00 PM, Erik Aadland <[email protected]> wrote:
> > This is correct.
> > So, referring to the results in my previous post.
> > In year==2000, actor_id == 4|5|6 occur only in project_id==3, and for actor_id== 5 and 6 condition==0. Actor_id==4 should have a mywanted score == 2, while actor_id==5 and 6 should each have a mywanted score == 1. Actor_id == 7 occurs only in project_id==4 this year and has shared projects with no other actor in this year (and therefore shares no project_id with any actor_id with condition==0) and should have a mywanted score == 0.
> > It puzzles me why the suggested code generates correct mywanted scores for the actor_ids in project_id==1 and 2, but not in project_id== 3 and 4.
> > Kind regards,
> > Erik.
> >
> >
> > ----------------------------------------
> >> Date: Tue, 16 Oct 2012 12:44:11 +0100
> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> What you asked for, as I understood it, was the total number of distinct actors
> >>
> >> 1. that meet a specified condition (condition == 0)
> >>
> >> and
> >>
> >> 2. with which any actor has shared one or more projects in the same year.
> >>
> >> (I am ignoring the word "focal", which you haven't defined and I don't
> >> understand.)
> >>
> >> If you want something else, please give the definition. It's not
> >> enough (for me) to say that the code gives the wrong answer in some
> >> cases. Note that my code gives the same answer for each actor, as it
> >> is a total over all (project, year) possibilities. If you want a
> >> different count for each (project, year) you'll need to modify the
> >> code accordingly.
> >>
> >> Nick
> >>
> >> On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <[email protected]> wrote:
> >> > The code works. Thank you Nick.
> >> > However, I am experiencing a few problems that I suspect stem from more detailed differences in my data structure. Detailed differences that depart from the structure I previously specified in this post.
> >> > In particular, I might have some projects in which only one actor is present.
> >> > Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
> >> > year  project_id  actor_id  condition  proj    act  mywanted
> >> > 2000  1           1         1          1 2000  1    2
> >> > 2000  2           1         1          2 2000  1    2
> >> > 2000  1           2         0          1 2000  2    1
> >> > 2000  2           2         0          2 2000  2    1
> >> > 2000  1           3         0          1 2000  3    1
> >> > 2000  2           3         0          2 2000  3    1
> >> > 2000  3           4         1          3 2000  4    4
> >> > 2000  3           5         0          3 2000  5    2
> >> > 2000  3           6         0          3 2000  6    2
> >> > 2000  4           7         0          4 2000  7    2
> >> > 2001  5           1         1          5 2001  1    2
> >> > 2001  5           2         0          5 2001  2    1
> >> > 2001  6           2         0          6 2001  2    1
> >> > 2001  5           3         0          5 2001  3    1
> >> > 2001  6           3         0          6 2001  3    1
> >> > 2001  5           4         1          5 2001  4    4
> >> > 2001  6           4         1          6 2001  4    4
> >> > 2001  7           5         0          7 2001  5    2
> >> > 2001  7           6         0          7 2001  6    2
> >> > 2001  7           7         0          7 2001  7    2
> >> > 2001  8           8         0          8 2001  8    0
> >> >
> >> > In this result (focusing on year==2000 only now), the mywanted score for actor_id==7 in project_id==4 is incorrect (correct mywanted==0). The mywanted scores for actor_ids in project_id==3 are also incorrect.
> >> >
> >> > In year==2001, the mywanted score==0 for actor_id==8 in project_id==8 is on the other hand correct.
> >> > How can I get around this? I am sorry that I did not include these structural details in my initial post.
> >> > Sincerely,
> >> > Erik.
> >> >
> >> > ----------------------------------------
> >> >> Date: Tue, 16 Oct 2012 11:49:22 +0100
> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> >> From: [email protected]
> >> >> To: [email protected]
> >> >>
> >> >> I suspect that you didn't copy all the code. The last line of code
> >> >> has a brace (curly bracket }) by itself.
> >> >>
> >> >> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <[email protected]> wrote:
> >> >> > Thank you Nick!
> >> >> > You are quite right. I was imprecise; it is distinct actors I want to capture.
> >> >> > When I run your suggested code, I get this error message after the following line of code:
> >> >> >
> >> >> > qui forval a = 1/`nact' {
> >> >> > unexpected end of file
> >> >> > r(612);
> >> >> >
> >> >> > What could possibly cause this error message? I am using Stata 10.
> >> >> > Thanks again and kind regards,
> >> >> > Erik.
> >> >> >
> >> >> >
> >> >> > ----------------------------------------
> >> >> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
> >> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> >> >> From: [email protected]
> >> >> >> To: [email protected]
> >> >> >>
> >> >> >> First off, on my list of hobby-horses is a prejudice that the word
> >> >> >> "unique" is misused here, although you are in very good company:
> >> >> >> StataCorp itself does it in various places, e.g. -codebook-, although
> >> >> >> I am working on changing their habits if I can. The word "unique"
> >> >> >> strictly means occurring once only; I recommend the word "distinct"
> >> >> >> for what you want. There is a longer discussion of terminology in
> >> >> >>
> >> >> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
> >> >> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
> >> >> >> Q4/08 SJ 8(4):557--568
> >> >> >> shows how to answer questions about distinct observations
> >> >> >> from first principles; provides a convenience command
> >> >> >>
> >> >> >> That said, when faced with a problem like yours, vague ideas of
> >> >> >> possible solutions rise up. Is this a case for associative arrays as
> >> >> >> implemented in Mata? Is there a cunning restructuring of the data from
> >> >> >> which the answer would fall out easily? Precise inspiration was
> >> >> >> lacking and what seemed crucial was that you need to consider each
> >> >> >> actor in each combination of project and year. That pointed out to
> >> >> >> loops over actors _and_ over project-years. Once that idea was taken
> >> >> >> up, life is usually easier if all identifiers run over the integers
> >> >> >> from 1 up. Also, the flavour of compiling a list and eventually
> >> >> >> counting distinct members of other actors suggested -levelsof- and the
> >> >> >> list manipulation tools documented at -help macrolists-.
> >> >> >>
> >> >> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
> >> >> >>
> >> >> >> input year project_id actor_id condition wanted
> >> >> >> 2000 1 1 1 2
> >> >> >> 2000 1 2 0 1
> >> >> >> 2000 1 3 0 1
> >> >> >> 2000 1 7 1 2
> >> >> >> 2000 2 1 1 2
> >> >> >> 2000 2 2 0 1
> >> >> >> 2000 2 3 0 1
> >> >> >> 2000 3 4 1 2
> >> >> >> 2000 3 5 0 1
> >> >> >> 2000 3 6 0 1
> >> >> >> 2000 3 . . .
> >> >> >> 2001 4 1 1 2
> >> >> >> 2001 4 2 0 1
> >> >> >> 2001 4 3 0 1
> >> >> >> end
> >> >> >>
> >> >> >> * identifiers guaranteed to run 1 up if the real ones don't!
> >> >> >> * note that "same project, same year" defines a group
> >> >> >> egen proj = group(project_id year), label
> >> >> >> su proj, meanonly
> >> >> >> local nproj = r(max)
> >> >> >>
> >> >> >> egen act = group(actor_id), label
> >> >> >> su act, meanonly
> >> >> >> local nact = r(max)
> >> >> >>
> >> >> >> gen mywanted = .
> >> >> >>
> >> >> >> * lists of those in each project and year and condition == 0
> >> >> >> qui forval p = 1/`nproj' {
> >> >> >> levelsof act if proj == `p' & condition == 0, local(who`p')
> >> >> >> }
> >> >> >>
> >> >> >> macro list
> >> >> >>
> >> >> >> * now cycle over actors
> >> >> >> qui forval a = 1/`nact' {
> >> >> >>
> >> >> >> * blank out workspace
> >> >> >> local work
> >> >> >>
> >> >> >> * if actor was included, we want to add that list to workspace
> >> >> >> * in practice -if r(N)- will be true if and only if -r(N)- is positive
> >> >> >> forval p = 1/`nproj' {
> >> >> >> count if act == `a' & proj == `p'
> >> >> >> if r(N) local work `work' `who`p''
> >> >> >> }
> >> >> >>
> >> >> >> * remove duplicates
> >> >> >> local work : list uniq work
> >> >> >> * remove this actor
> >> >> >> local work : list work - a
> >> >> >> * see what we got for debugging
> >> >> >> noi di "`a' `work'"
> >> >> >>
> >> >> >> replace mywanted = `: list sizeof work' if act == `a'
> >> >> >> }
> >> >> >>
> >> >> >> Nick
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <[email protected]> wrote:
> >> >> >>
> >> >> >> > I am trying to generate a variable "wanted" that by each focal actor and year captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> >> >> >> > This is my data structure:
> >> >> >> > year project_id actor_id condition wanted
> >> >> >> > 2000 1 1 1 2
> >> >> >> > 2000 1 2 0 1
> >> >> >> > 2000 1 3 0 1
> >> >> >> > 2000 1 7 1 2
> >> >> >> > 2000 2 1 1 2
> >> >> >> > 2000 2 2 0 1
> >> >> >> > 2000 2 3 0 1
> >> >> >> > 2000 3 4 1 2
> >> >> >> > 2000 3 5 0 1
> >> >> >> > 2000 3 6 0 1
> >> >> >> > 2000 3 . . .
> >> >> >> > 2001 4 1 1 2
> >> >> >> > 2001 4 2 0 1
> >> >> >> > 2001 4 3 0 1
> >> >> >> > .....and so on
> >> >> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
> >> >> >> > My attempted code (which is quite wrong):
> >> >> >> > sort actor_id year projects ;
> >> >> >> > by actor_id year: gen nvals = _n == 1 ;
> >> >> >> > sort actor_id year project_id ;
> >> >> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> >> >> >> > replace wanted = wanted - (nvals & condition == 0) ;
> >> >> >>
> >>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >> * http://www.ats.ucla.edu/stat/stata/