Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Erik Aadland <erikaadland@hotmail.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor) |
Date | Wed, 20 Feb 2013 12:52:03 +0000 |
Thanks again, Nick. This is very helpful.

Kind regards,
Erik.

----------------------------------------
> Date: Wed, 20 Feb 2013 11:03:43 +0000
> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> From: njcoxstata@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> I added some commentary below.
>
> On Wed, Feb 20, 2013 at 8:20 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
> > Thank you so much, Nick.
> > The code appears to work perfectly.
> > I will compare this code to the previous code for the related measure and do my best to absorb what is going on.
> > Kind regards,
> > Erik.
> >
> > ----------------------------------------
> >> Date: Tue, 19 Feb 2013 19:34:59 +0000
> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> From: njcoxstata@gmail.com
> >> To: statalist@hsphsun2.harvard.edu
> >>
> >> I don't know about smart, but this seems the same kind of problem.
> >> Change the order of the loops and let the list of colleagues
> >> accumulate from year to year for each actor.
> >>
> >> Have a look at this code
>
> Comment #1.
>
> The first step is just to create a small toy or sandpit dataset for
> which results are easy to derive. Erik provided this dataset himself
> and it's always a good idea. Naturally, the full dataset might expose
> problems not in the toy dataset, but one problem at a time....
>
> >> clear
> >> input year project_id actor_id condition
> >> 2000 1 1 1
> >> 2000 2 1 1
> >> 2000 1 2 0
> >> 2000 2 2 0
> >> 2000 1 3 0
> >> 2000 2 3 0
> >> 2000 3 4 1
> >> 2000 3 5 0
> >> 2000 3 6 0
> >> 2000 4 7 0
> >> 2001 5 1 1
> >> 2001 5 2 0
> >> 2001 6 2 0
> >> 2001 5 3 0
> >> 2001 6 3 0
> >> 2001 5 4 1
> >> 2001 6 4 1
> >> 2001 7 5 0
> >> 2001 7 6 0
> >> 2001 7 7 0
> >> 2001 8 8 0
> >>
> >> end
>
> Comment #2.
>
> I create variables using -egen-'s -group()- function that by
> construction run 1 ... # of distinct values.
> Then I can pick up the
> number of distinct values from -summarize, meanonly-. The maximum
> group identifier is the number required. This is no more than
> convenience, to make the loops to come very easy, but convenience
> beats its opposite.
>
> >> egen proj = group(project_id year), label
> >> su proj, meanonly
> >> local nproj = r(max)
> >>
> >> egen act = group(actor_id), label
> >> su act, meanonly
> >> local nact = r(max)
> >>
> >> egen yr = group(year), label
> >> su yr, meanonly
> >> local nyr = r(max)
>
> Comment #3.
>
> I initialise a counter variable. In essence we assume no co-actors,
> unless we find some, in which case we will change the counter. Often
> this command is inserted once you realise that the strategy is going
> to be
>
> Loop:
>     Look at each possibility and work out the result.
>     Put the result for that possibility in an existing variable.
>
> The second implies a -replace-, but that in turn requires a previous
> -generate- ahead of the loops.
>
> >> gen mywanted2 = 0
>
> Comment #4.
>
> Now the slope gets steeper! The main difficulty of the problem is the
> need to look in a group of other observations for co-actors. I went
> for list manipulation. -levelsof- gives you overall lists and the rest
> is looping over possibilities. Stuff discussed at -help macrolists- is
> invaluable.
>
> There are yet other possibilities, e.g. it is an open question whether
> you would be better off with a different data structure. If the number
> of actors on a project is small and their identifiers are of simple
> form, then all the identifiers could be stored as values of a string
> variable such as "1 3 5 8" and you could then treat the identifiers
> using -word()- and -wordcount()-. A wild guess is that this makes some
> things easier and some more difficult.
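For readers new to -help macrolists-, the three list operations that the loop code in this thread relies on (uniq, list subtraction, sizeof) can be tried in isolation. This is only a minimal sketch: the values in the local macro work are made up for illustration and are not taken from the toy dataset.

```stata
* Toy list with duplicates (hypothetical values, for illustration only)
local work 2 3 2 5 3
* Identifier of the focal actor, playing the role of the loop index a
local a 3

* Keep one copy of each element, in order of first appearance
local work : list uniq work
* Remove the focal actor; both operands are macro names, not contents
local work : list work - a

* Show the remaining list and how many elements it has
display "`work'"
display `: list sizeof work'
```

With these made-up values the list should shrink from "2 3 2 5 3" to "2 3 5" after uniq and to "2 5" after the subtraction, so the size reported is 2. Note that the macro list operators take macro names, which is why the thread's code writes work - a rather than quoting the contents.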
>
> >> * lists of those in each project and year and condition == 0
> >> qui forval p = 1/`nproj' {
> >>     levelsof act if proj == `p' & condition == 0, local(who`p')
> >> }
> >>
> >> macro list
> >>
> >> * now cycle over actors
> >> qui forval a = 1/`nact' {
> >>
> >>     * blank out workspace
> >>     local work
> >>
> >>     * cycle over years
> >>     qui forval y = 1/`nyr' {
> >>
> >>         * if actor was included, we want to add that list to workspace
> >>         forval p = 1/`nproj' {
> >>             count if act == `a' & proj == `p' & yr == `y'
> >>             if r(N) local work `work' `who`p''
> >>         }
> >>
> >>         * remove duplicates
> >>         local work : list uniq work
> >>         * remove this actor
> >>         local work : list work - a
> >>         * see what we got for debugging
> >>         noi di "`a' `work'"
> >>
> >>         replace mywanted2 = `: list sizeof work' if act == `a' & yr == `y'
> >>     }
> >> }
>
> On Tue, Feb 19, 2013 at 4:35 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
>
> >> > A while back I got assistance from the list for counting, separately for each actor_id and year, the number of distinct other actors meeting a certain condition that the actor_id had occurred together with in projects.
> >> > Nick Cox suggested the code below, which worked wonderfully.
> >> > This code generates a separate count for each actor_id and year.
> >> > I now face a new challenge. I would like to generate a similar measure that makes a cumulative count over the years (rather than a separate count for each year). So, if actor_id == 1 collaborated with 2 other distinct actors in 2000, the score for actor_id == 1 would be 2 in 2000. If actor_id == 1 collaborated with one additional distinct actor that met the condition in 2001, the score would increase to 3 in 2001 (if the distinct actors already counted in the 2000 score were present in projects together with the actor_id in 2001 as well, they would not be counted again in 2001).
> >> > Is there a smart way to change the code below to generate this new measure?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/