Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date: Tue, 16 Oct 2012 11:19:41 +0100
First off, on my list of hobby-horses is a prejudice that the word
"unique" is misused here, although you are in very good company:
StataCorp itself does it in various places, e.g. -codebook-, and
I am working on changing their habits if I can. The word "unique"
strictly means occurring once only; I recommend the word "distinct"
for what you want. There is a longer discussion of terminology in
SJ-8-4 dm0042: Speaking Stata: Distinct observations
  N. J. Cox and G. M. Longton, Stata Journal 8(4): 557-568 (Q4/08)
  (help distinct if installed)
  Shows how to answer questions about distinct observations from
  first principles; provides a convenience command.
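As a quick illustration of the difference between the two words, here is a small Python sketch (Python only so that it runs without Stata; the values are made up). "Distinct" values are those that occur at least once; "unique" values in the strict sense occur exactly once.

```python
from collections import Counter

# Made-up values, chosen only to illustrate the two terms.
values = [2, 3, 2, 3, 5]

counts = Counter(values)
# "distinct": every value that occurs at least once
distinct = sorted(counts)
# "unique" in the strict sense: values that occur exactly once
unique = sorted(v for v, n in counts.items() if n == 1)

print(distinct)  # [2, 3, 5]
print(unique)    # [5]
```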
That said, when faced with a problem like yours, vague ideas of
possible solutions rise up. Is this a case for associative arrays as
implemented in Mata? Is there a cunning restructuring of the data from
which the answer would fall out easily? Precise inspiration was
lacking, but what seemed crucial was that you need to consider each
actor in each combination of project and year. That pointed to
loops over actors _and_ over project-years. Once that idea is taken
up, life is usually easier if all identifiers run over the integers
from 1 up. Also, the flavour of compiling a list and eventually
counting the distinct other actors in it suggested -levelsof- and the
list manipulation tools documented at -help macrolists-.
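For readers who want to see those two building blocks outside Stata, here is a rough Python analogue (Python only so that it runs without Stata). The function names are hypothetical and the keys are made up; the behaviour is my reading of -egen, group()- and of -list uniq-, -list A - B- and -list sizeof- in -help macrolists-, not a reimplementation of them. In particular, list_minus is only meant for lists that have already had duplicates removed, as in the code below.

```python
def group_codes(keys):
    """Number distinct keys 1, 2, ... in sorted order; keys with a missing
    (None) component get None, loosely like -egen group()- without -missing-."""
    levels = sorted({k for k in keys if None not in k})
    code = {k: i for i, k in enumerate(levels, start=1)}
    return [code.get(k) for k in keys]


def list_uniq(tokens):
    """Drop repeated tokens, keeping the first occurrence (like -list uniq-)."""
    seen = set()
    out = []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def list_minus(a, b):
    """Drop from a every token that also appears in b (like -list A - B-)."""
    drop = set(b)
    return [t for t in a if t not in drop]


# made-up (project_id, year) keys with a gap in project_id and one missing value
print(group_codes([(3, 2000), (1, 2000), (3, 2000), (4, 2001), (None, 2000)]))
# [2, 1, 2, 3, None]

# pooled lists for focal actor "1", made distinct, then without the focal actor
work = list_minus(list_uniq("2 3 2 3".split()), ["1"])
print(work, len(work))  # ['2', '3'] 2  (len() plays the role of -list sizeof-)
```

With integer codes like these, a loop can run from 1 to the maximum code without worrying about gaps in the original identifiers.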
So, here is my code. Absolutely nothing rules out other kinds of solutions.
clear
input year project_id actor_id condition wanted
2000 1 1 1 2
2000 1 2 0 1
2000 1 3 0 1
2000 1 7 1 2
2000 2 1 1 2
2000 2 2 0 1
2000 2 3 0 1
2000 3 4 1 2
2000 3 5 0 1
2000 3 6 0 1
2000 3 . . .
2001 4 1 1 2
2001 4 2 0 1
2001 4 3 0 1
end
* identifiers guaranteed to run 1 up if the real ones don't!
* note that "same project, same year" defines a group
egen proj = group(project_id year), label
su proj, meanonly
local nproj = r(max)
egen act = group(actor_id), label
su act, meanonly
local nact = r(max)
gen mywanted = .
* lists of those in each project and year and condition == 0
qui forval p = 1/`nproj' {
levelsof act if proj == `p' & condition == 0, local(who`p')
}
* show all macros, including the lists, for checking
macro list
* now cycle over years and, within each year, over actors,
* so that co-actors are counted separately for each focal actor and year
levelsof year, local(years)
qui foreach y of local years {
    forval a = 1/`nact' {
        * blank out workspace
        local work
        * if actor was included in project-year p, add that list to workspace
        * in practice -if r(N)- will be true if and only if -r(N)- is positive
        forval p = 1/`nproj' {
            count if act == `a' & proj == `p' & year == `y'
            if r(N) local work `work' `who`p''
        }
        * remove duplicates
        local work : list uniq work
        * remove this actor
        local work : list work - a
        * see what we got for debugging
        noi di "`y' `a' `work'"
        replace mywanted = `: list sizeof work' if act == `a' & year == `y'
    }
}
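As a cross-check of the logic, rather than of the Stata code itself, here is a Python sketch on the example data. For each actor and year it counts the distinct condition == 0 co-actors across that actor's projects in that year, excluding the actor, as the question asks. Sets stand in for the macro lists, None stands in for Stata's missing value, and the names who, projects_of and count_co_actors are made up for this illustration.

```python
from collections import defaultdict

# (year, project_id, actor_id, condition, wanted) from the example; None marks missing
rows = [
    (2000, 1, 1, 1, 2), (2000, 1, 2, 0, 1), (2000, 1, 3, 0, 1), (2000, 1, 7, 1, 2),
    (2000, 2, 1, 1, 2), (2000, 2, 2, 0, 1), (2000, 2, 3, 0, 1),
    (2000, 3, 4, 1, 2), (2000, 3, 5, 0, 1), (2000, 3, 6, 0, 1),
    (2000, 3, None, None, None),
    (2001, 4, 1, 1, 2), (2001, 4, 2, 0, 1), (2001, 4, 3, 0, 1),
]

# condition == 0 actors in each project-year (the analogue of the who`p' lists)
who = defaultdict(set)
# project-years in which each actor appears, keyed by (actor, year)
projects_of = defaultdict(set)
for year, project, actor, condition, _ in rows:
    if actor is None:
        continue
    if condition == 0:
        who[(project, year)].add(actor)
    projects_of[(actor, year)].add((project, year))


def count_co_actors(actor, year):
    """Distinct condition == 0 co-actors of `actor` in `year`, excluding the actor."""
    pooled = set()
    for key in projects_of[(actor, year)]:
        pooled |= who[key]
    pooled.discard(actor)
    return len(pooled)


mywanted = [
    None if actor is None else count_co_actors(actor, year)
    for year, _, actor, _, _ in rows
]
print(mywanted == [w for *_, w in rows])  # True
```

Reproducing the wanted column on this small example is a sanity check, not proof that the approach handles every data set.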
Nick
On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <[email protected]> wrote:
> I am trying to generate a variable "wanted" that by each focal actor and year captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> This is my data structure:
> year project_id actor_id condition wanted
> 2000 1 1 1 2
> 2000 1 2 0 1
> 2000 1 3 0 1
> 2000 1 7 1 2
> 2000 2 1 1 2
> 2000 2 2 0 1
> 2000 2 3 0 1
> 2000 3 4 1 2
> 2000 3 5 0 1
> 2000 3 6 0 1
> 2000 3 . . .
> 2001 4 1 1 2
> 2001 4 2 0 1
> 2001 4 3 0 1
> .....and so on
> So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
> My attempted code (which is quite wrong):
> sort actor_id year projects ;
> by actor_id year: gen nvals = _n == 1 ;
> sort actor_id year project_id ;
> egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> replace wanted = wanted - (nvals & condition == 0) ;