Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date: Tue, 16 Oct 2012 11:19:41 +0100
First off, on my list of hobby-horses is a prejudice that the word "unique" is misused here, although you are in very good company: StataCorp itself does it in various places, e.g. -codebook-, although I am working on changing their habits if I can. The word "unique" strictly means occurring once only; I recommend the word "distinct" for what you want. There is a longer discussion of terminology in

SJ-8-4 dm0042: Speaking Stata: Distinct observations
    N. J. Cox and G. M. Longton, Stata Journal 8(4): 557-568, Q4 2008
    (help distinct if installed)
    Shows how to answer questions about distinct observations from
    first principles; provides a convenience command.

That said, when faced with a problem like yours, vague ideas of possible solutions rise up. Is this a case for associative arrays as implemented in Mata? Is there a cunning restructuring of the data from which the answer would fall out easily? Precise inspiration was lacking, and what seemed crucial was that you need to consider each actor in each combination of project and year. That pointed to loops over actors _and_ over project-years. Once that idea is taken up, life is usually easier if all identifiers run over the integers from 1 up. Also, the flavour of compiling a list and eventually counting the distinct other actors in it suggested -levelsof- and the list manipulation tools documented at -help macrolists-.

So, here is my code. Absolutely nothing rules out other kinds of solutions.

clear
input year project_id actor_id condition wanted
2000 1 1 1 2
2000 1 2 0 1
2000 1 3 0 1
2000 1 7 1 2
2000 2 1 1 2
2000 2 2 0 1
2000 2 3 0 1
2000 3 4 1 2
2000 3 5 0 1
2000 3 6 0 1
2000 3 . . .
2001 4 1 1 2
2001 4 2 0 1
2001 4 3 0 1
end

* identifiers guaranteed to run 1 up if the real ones don't!
* note that "same project, same year" defines a group
egen proj = group(project_id year), label
su proj, meanonly
local nproj = r(max)
egen act = group(actor_id), label
su act, meanonly
local nact = r(max)
gen mywanted = .

* lists of those in each project and year and condition == 0
qui forval p = 1/`nproj' {
    levelsof act if proj == `p' & condition == 0, local(who`p')
}
macro list

* now cycle over actors
qui forval a = 1/`nact' {
    * blank out workspace
    local work
    * if actor was included, we want to add that list to workspace
    * in practice -if r(N)- will be true if and only if -r(N)- is positive
    forval p = 1/`nproj' {
        count if act == `a' & proj == `p'
        if r(N) local work `work' `who`p''
    }
    * remove duplicates
    local work : list uniq work
    * remove this actor
    local work : list work - a
    * see what we got for debugging
    noi di "`a' `work'"
    replace mywanted = `: list sizeof work' if act == `a'
}

Nick

On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <erikaadland@hotmail.com> wrote:

> I am trying to generate a variable "wanted" that, by each focal actor and year, captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> This is my data structure:
> year project_id actor_id condition wanted
> 2000 1 1 1 2
> 2000 1 2 0 1
> 2000 1 3 0 1
> 2000 1 7 1 2
> 2000 2 1 1 2
> 2000 2 2 0 1
> 2000 2 3 0 1
> 2000 3 4 1 2
> 2000 3 5 0 1
> 2000 3 6 0 1
> 2000 3 . . .
> 2001 4 1 1 2
> 2001 4 2 0 1
> 2001 4 3 0 1
> .....and so on
> So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
>
> My attempted code (which is quite wrong):
> sort actor_id year projects ;
> by actor_id year: gen nvals = _n == 1 ;
> sort actor_id year project_id ;
> egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> replace wanted = wanted - (nvals & condition == 0) ;
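Nick's point about "unique" versus "distinct" can be made concrete with a small sketch. It is written in Python rather than Stata purely for illustration, and the function names are hypothetical. "Distinct" values count each value once however often it occurs; "unique" values in the strict sense are those that occur exactly once. The list used is the actor_id values on the condition == 0 rows for 2000 in the example data.

```python
from collections import Counter

def distinct_values(xs):
    """Each value counted once, however many times it occurs."""
    return sorted(set(xs))

def unique_values(xs):
    """Strict sense of 'unique': values that occur exactly once."""
    counts = Counter(xs)
    return sorted(v for v, n in counts.items() if n == 1)

# actor_id on the condition == 0 rows for year 2000 in the example data
ids = [2, 3, 2, 3, 5, 6]
print(distinct_values(ids))  # [2, 3, 5, 6]
print(unique_values(ids))    # [5, 6]
```

Actors 2 and 3 each appear in two projects, so they are distinct but not unique in the strict sense; that gap is the whole terminological point.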
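For readers who do not use Stata, the loop logic in the reply can be restated as a short Python sketch. It is an illustration, not the code posted to the list. It assumes the data arrive as (year, project_id, actor_id, condition) tuples, with None standing in for Stata's missing value, and it skips rows whose actor_id is missing. As in the Stata code, it builds one set per project-year of actors with condition == 0, unions those sets over each focal actor's project-years, discards the focal actor, and counts what is left. One deliberate difference: the Stata loop keys on the actor alone (-act- is built from actor_id only), so an actor appearing in several years gets one count pooled across them, whereas this sketch keys on (actor, year), as the question is worded. With the example data both give the same answers.

```python
from collections import defaultdict

# Example data from the thread: (year, project_id, actor_id, condition).
# None stands in for Stata's missing value "." in the fourth project-year row.
ROWS = [
    (2000, 1, 1, 1), (2000, 1, 2, 0), (2000, 1, 3, 0), (2000, 1, 7, 1),
    (2000, 2, 1, 1), (2000, 2, 2, 0), (2000, 2, 3, 0),
    (2000, 3, 4, 1), (2000, 3, 5, 0), (2000, 3, 6, 0), (2000, 3, None, None),
    (2001, 4, 1, 1), (2001, 4, 2, 0), (2001, 4, 3, 0),
]

def count_coactors(rows):
    """Return {(actor_id, year): number of distinct other actors with
    condition == 0 sharing at least one project-year with the actor}."""
    # Step 1: actors with condition == 0 in each project-year
    # (playing the role of the who`p' macros in the Stata code).
    who = defaultdict(set)
    # Project-years in which each (actor, year) appears.
    member_of = defaultdict(set)
    for year, project, actor, cond in rows:
        if actor is None:
            continue  # skip rows with a missing actor_id
        member_of[(actor, year)].add((project, year))
        if cond == 0:
            who[(project, year)].add(actor)
    # Step 2: union the sets, drop the focal actor, count the rest.
    result = {}
    for (actor, year), projects in member_of.items():
        work = set()
        for p in projects:
            work |= who[p]
        work.discard(actor)
        result[(actor, year)] = len(work)
    return result

wanted = count_coactors(ROWS)
for key in sorted(wanted):
    print(key, wanted[key])
```

The set union plays the part of -list uniq- (duplicates cannot survive in a set), and discard() plays the part of -list work - a-.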
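The reply also wonders whether a cunning restructuring of the data would make the answer fall out easily. One candidate is to form every ordered pair of actors sharing a project-year, keep pairs whose partner has condition == 0, drop self-pairs, and count distinct partners per focal actor and year. The Python sketch below assumes the same (year, project_id, actor_id, condition) tuples as before, with None for a missing actor_id; the function name is hypothetical. In Stata the pairing step corresponds roughly to joining the data to a renamed copy of itself with -joinby- on project and year, but that correspondence is offered only as a pointer, not as tested Stata code.

```python
from collections import defaultdict
from itertools import product

# Example data from the thread: (year, project_id, actor_id, condition).
ROWS = [
    (2000, 1, 1, 1), (2000, 1, 2, 0), (2000, 1, 3, 0), (2000, 1, 7, 1),
    (2000, 2, 1, 1), (2000, 2, 2, 0), (2000, 2, 3, 0),
    (2000, 3, 4, 1), (2000, 3, 5, 0), (2000, 3, 6, 0), (2000, 3, None, None),
    (2001, 4, 1, 1), (2001, 4, 2, 0), (2001, 4, 3, 0),
]

def count_coactors_by_pairs(rows):
    """Count distinct condition == 0 partners per (actor_id, year) via pairs."""
    groups = defaultdict(list)  # project-year -> [(actor, condition), ...]
    focal_keys = set()          # every (actor, year) that needs a count
    for year, project, actor, cond in rows:
        if actor is not None:
            groups[(project, year)].append((actor, cond))
            focal_keys.add((actor, year))
    # All ordered (focal, partner) pairs within each project-year; keeping
    # them in a set means each partner is counted once per focal actor-year.
    pairs = set()
    for (project, year), members in groups.items():
        for (focal, _), (partner, partner_cond) in product(members, repeat=2):
            if focal != partner and partner_cond == 0:
                pairs.add((focal, year, partner))
    counts = {key: 0 for key in focal_keys}
    for focal, year, _partner in pairs:
        counts[(focal, year)] += 1
    return counts

for key, n in sorted(count_coactors_by_pairs(ROWS).items()):
    print(key, n)
```

The price of this restructuring is memory: a project-year with n actors yields n squared ordered pairs before filtering, whereas the loop approach only ever holds one list per project-year.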