Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
From: Erik Aadland <[email protected]>
To: <[email protected]>
Subject: RE: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date: Tue, 16 Oct 2012 12:00:40 +0000
This is correct.
So, referring to the results in my previous post:
In year==2000, actor_id 4, 5 and 6 occur only in project_id==3, and condition==0 for actor_id 5 and 6. Actor_id==4 should therefore have a mywanted score == 2, while actor_id 5 and 6 should each have a mywanted score == 1. Actor_id==7 occurs only in project_id==4 in this year and shares that project with no other actor (and therefore shares no project_id with any actor_id with condition==0), so it should have a mywanted score == 0.
It puzzles me why the suggested code generates correct mywanted scores for the actor_ids in project_id==1 and 2, but not in project_id== 3 and 4.
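A possible reading of the pattern, offered as a sketch rather than a definitive answer: the suggested code builds one list per actor over all (project, year) groups, so partners met in 2001 are also counted in the 2000 rows. Actor_id 1 to 3 have the same condition==0 partners in both years, so their pooled counts happen to match the wanted ones; actor_id 4 to 7 do not. Assuming the count is wanted per actor and per year, the loop below cycles over actor-year groups instead of actors. The names actyear, nay and g are introduced here for illustration and are not in the original code.

```stata
clear
input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
2001 5 1 1
2001 5 2 0
2001 6 2 0
2001 5 3 0
2001 6 3 0
2001 5 4 1
2001 6 4 1
2001 7 5 0
2001 7 6 0
2001 7 7 0
2001 8 8 0
end

* identifiers running from 1 up, as in the suggested code
egen proj = group(project_id year)
su proj, meanonly
local nproj = r(max)

egen act = group(actor_id)

* assumption: one group per actor AND year, not one per actor
egen actyear = group(act year)
su actyear, meanonly
local nay = r(max)

gen mywanted = .

* lists of those in each project-year with condition == 0
qui forval p = 1/`nproj' {
    levelsof act if proj == `p' & condition == 0, local(who`p')
}

* cycle over actor-years rather than actors
qui forval g = 1/`nay' {
    * recover which actor this actor-year refers to
    su act if actyear == `g', meanonly
    local a = r(min)

    * collect the condition == 0 lists of this actor's projects in this year
    local work
    forval p = 1/`nproj' {
        count if actyear == `g' & proj == `p'
        if r(N) local work `work' `who`p''
    }

    * remove duplicates, then remove the actor itself
    local work : list uniq work
    local work : list work - a

    replace mywanted = `: list sizeof work' if actyear == `g'
}

list year project_id actor_id condition mywanted, sepby(year)
```

On the expanded dataset this should give mywanted == 2 for actor_id==4, 1 for actor_id 5 and 6, and 0 for actor_id==7 in year==2000, because each proj group belongs to exactly one year and so only same-year projects contribute to an actor-year's list.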
Kind regards,
Erik.
----------------------------------------
> Date: Tue, 16 Oct 2012 12:44:11 +0100
> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> From: [email protected]
> To: [email protected]
>
> What you asked for, as I understood it, was the total number of distinct actors
>
> 1. that meet a specified condition (condition == 0)
>
> and
>
> 2. with which any actor has shared one or more projects in the same year.
>
> (I am ignoring the word "focal", which you haven't defined and I don't
> understand.)
>
> If you want something else, please give the definition. It's not
> enough (for me) to say that the code gives the wrong answer in some
> cases. Note that my code gives the same answer for each actor, as it
> is a total over all (project, year) possibilities. If you want a
> different count for each (project, year) you'll need to modify the
> code accordingly.
>
> Nick
>
> On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <[email protected]> wrote:
> > The code works. Thank you Nick.
> > However, I am experiencing a few problems that I suspect stem from details of my data structure that depart from the structure I previously specified in this thread.
> > In particular, I might have some projects in which only one actor is present.
> > Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
> > year  project_id  actor_id  condition  proj    act  mywanted
> > 2000  1           1         1          1 2000  1    2
> > 2000  2           1         1          2 2000  1    2
> > 2000  1           2         0          1 2000  2    1
> > 2000  2           2         0          2 2000  2    1
> > 2000  1           3         0          1 2000  3    1
> > 2000  2           3         0          2 2000  3    1
> > 2000  3           4         1          3 2000  4    4
> > 2000  3           5         0          3 2000  5    2
> > 2000  3           6         0          3 2000  6    2
> > 2000  4           7         0          4 2000  7    2
> > 2001  5           1         1          5 2001  1    2
> > 2001  5           2         0          5 2001  2    1
> > 2001  6           2         0          6 2001  2    1
> > 2001  5           3         0          5 2001  3    1
> > 2001  6           3         0          6 2001  3    1
> > 2001  5           4         1          5 2001  4    4
> > 2001  6           4         1          6 2001  4    4
> > 2001  7           5         0          7 2001  5    2
> > 2001  7           6         0          7 2001  6    2
> > 2001  7           7         0          7 2001  7    2
> > 2001  8           8         0          8 2001  8    0
> >
> > In this result (focusing on year==2000 only for now), the mywanted score for actor_id==7 in project_id==4 is incorrect (the correct value is mywanted==0). The mywanted scores for the actor_ids in project_id==3 are also incorrect.
> >
> > In year==2001, on the other hand, the mywanted score==0 for actor_id==8 in project_id==8 is correct.
> > How can I get around this? I am sorry that I did not include these structural details in my initial post.
> > Sincerely,
> > Erik.
> >
> > ----------------------------------------
> >> Date: Tue, 16 Oct 2012 11:49:22 +0100
> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> I suspect that you didn't copy all the code. The last line of code
> >> has a brace (curly bracket }) by itself.
> >>
> >> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <[email protected]> wrote:
> >> > Thank you Nick!
> >> > You are quite right. I was imprecise; it is distinct actors I want to capture.
> >> > When I run your suggested code, I get this error message after the following line of code:
> >> >
> >> > qui forval a = 1/`nact' {
> >> > unexpected end of file
> >> > r(612);
> >> >
> >> > What could possibly cause this error message? I am using Stata 10.
> >> > Thanks again and kind regards,
> >> > Erik.
> >> >
> >> >
> >> > ----------------------------------------
> >> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> >> From: [email protected]
> >> >> To: [email protected]
> >> >>
> >> >> First off, on my list of hobby-horses is a prejudice that the word
> >> >> "unique" is misused here, although you are in very good company:
> >> >> StataCorp itself does it in various places, e.g. -codebook-, although
> >> >> I am working on changing their habits if I can. The word "unique"
> >> >> strictly means occurring once only; I recommend the word "distinct"
> >> >> for what you want. There is a longer discussion of terminology in
> >> >>
> >> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
> >> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
> >> >> Q4/08 SJ 8(4):557--568
> >> >> shows how to answer questions about distinct observations
> >> >> from first principles; provides a convenience command
> >> >>
> >> >> That said, when faced with a problem like yours, vague ideas of
> >> >> possible solutions rise up. Is this a case for associative arrays as
> >> >> implemented in Mata? Is there a cunning restructuring of the data from
> >> >> which the answer would fall out easily? Precise inspiration was
> >> >> lacking and what seemed crucial was that you need to consider each
> >> >> actor in each combination of project and year. That pointed to
> >> >> loops over actors _and_ over project-years. Once that idea is taken
> >> >> up, life is usually easier if all identifiers run over the integers
> >> >> from 1 up. Also, the flavour of compiling a list and eventually
> >> >> counting distinct members of other actors suggested -levelsof- and the
> >> >> list manipulation tools documented at -help macrolists-.
> >> >>
> >> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
> >> >>
> >> >> input year project_id actor_id condition wanted
> >> >> 2000 1 1 1 2
> >> >> 2000 1 2 0 1
> >> >> 2000 1 3 0 1
> >> >> 2000 1 7 1 2
> >> >> 2000 2 1 1 2
> >> >> 2000 2 2 0 1
> >> >> 2000 2 3 0 1
> >> >> 2000 3 4 1 2
> >> >> 2000 3 5 0 1
> >> >> 2000 3 6 0 1
> >> >> 2000 3 . . .
> >> >> 2001 4 1 1 2
> >> >> 2001 4 2 0 1
> >> >> 2001 4 3 0 1
> >> >> end
> >> >>
> >> >> * identifiers guaranteed to run 1 up if the real ones don't!
> >> >> * note that "same project, same year" defines a group
> >> >> egen proj = group(project_id year), label
> >> >> su proj, meanonly
> >> >> local nproj = r(max)
> >> >>
> >> >> egen act = group(actor_id), label
> >> >> su act, meanonly
> >> >> local nact = r(max)
> >> >>
> >> >> gen mywanted = .
> >> >>
> >> >> * lists of those in each project and year and condition == 0
> >> >> qui forval p = 1/`nproj' {
> >> >>     levelsof act if proj == `p' & condition == 0, local(who`p')
> >> >> }
> >> >>
> >> >> macro list
> >> >>
> >> >> * now cycle over actors
> >> >> qui forval a = 1/`nact' {
> >> >>
> >> >>     * blank out workspace
> >> >>     local work
> >> >>
> >> >>     * if actor was included, we want to add that list to workspace
> >> >>     * in practice -if r(N)- will be true if and only if -r(N)- is positive
> >> >>     forval p = 1/`nproj' {
> >> >>         count if act == `a' & proj == `p'
> >> >>         if r(N) local work `work' `who`p''
> >> >>     }
> >> >>
> >> >>     * remove duplicates
> >> >>     local work : list uniq work
> >> >>     * remove this actor
> >> >>     local work : list work - a
> >> >>     * see what we got for debugging
> >> >>     noi di "`a' `work'"
> >> >>
> >> >>     replace mywanted = `: list sizeof work' if act == `a'
> >> >> }
> >> >>
> >> >> Nick
> >> >>
> >> >>
> >> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <[email protected]> wrote:
> >> >>
> >> >> > I am trying to generate a variable "wanted" that, for each focal actor and year, captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> >> >> > This is my data structure:
> >> >> > year project_id actor_id condition wanted
> >> >> > 2000 1 1 1 2
> >> >> > 2000 1 2 0 1
> >> >> > 2000 1 3 0 1
> >> >> > 2000 1 7 1 2
> >> >> > 2000 2 1 1 2
> >> >> > 2000 2 2 0 1
> >> >> > 2000 2 3 0 1
> >> >> > 2000 3 4 1 2
> >> >> > 2000 3 5 0 1
> >> >> > 2000 3 6 0 1
> >> >> > 2000 3 . . .
> >> >> > 2001 4 1 1 2
> >> >> > 2001 4 2 0 1
> >> >> > 2001 4 3 0 1
> >> >> > .....and so on
> >> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
> >> >> > My attempted code (which is quite wrong):
> >> >> > sort actor_id year projects ;
> >> >> > by actor_id year: gen nvals = _n == 1 ;
> >> >> > sort actor_id year project_id ;
> >> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> >> >> > replace wanted = wanted - (nvals & condition == 0) ;
> >> >>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/