Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Social Network Analysis shortest path centrality
From: Michael Goodwin <[email protected]>
To: [email protected]
Subject: Re: st: Social Network Analysis shortest path centrality
Date: Wed, 4 Sep 2013 11:10:22 -0400
This is very useful and more efficient than what I had put together.
The final piece of the puzzle is identifying which observations
contain the shortest path between the "co" and a given node. In your
example, the third observation shows co 1 connected to 9 at level4,
while the tenth observation shows co 1 connected to 9 at level1. Is
there an efficient way to replace a level variable with a missing
value when the same combination of co and node has already been
reached by a shorter path? I'm working on this final piece now, but
would love to hear any thoughts. Thanks in advance for your help.
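A minimal sketch of one alternative (not from the thread): expand the network breadth-first and keep a node only the first time a given co reaches it. In an unweighted network that first time is always along a shortest path, so longer routes to the same node are never stored and nothing has to be replaced afterwards. It assumes the toy edge list from Robert's example (including the 8 -> 1 edge), follows edges in the source -> target direction, and drops self-links; the names node, dist, weight, reach and centrality are hypothetical.

```stata
* Sketch: shortest-path (breadth-first) distance from each co to every node
* it can reach, following edges in the source -> target direction.
* Variable names node, dist, weight, reach, centrality are hypothetical.
clear
input source target
1 2
1 5
1 6
1 9
2 3
2 5
2 7
5 8
5 6
8 9
8 1
end
tempfile edges reached
save "`edges'"

* Distance 1: the direct targets of each co (self-links dropped)
rename (source target) (co node)
drop if node == co
generate dist = 1
save "`reached'"

local d 1
local more = _N
while `more' > 0 {
    local ++d
    * Expand only the frontier: nodes first reached at distance d-1
    use "`reached'", clear
    keep if dist == `d' - 1
    keep co node
    rename node source
    joinby source using "`edges'", unmatched(master)
    rename target node
    drop if missing(node) | node == co
    if _N == 0 {
        continue, break
    }
    keep co node
    duplicates drop
    * Keep only nodes this co has not already reached by a shorter path
    merge 1:1 co node using "`reached'", keep(master) nogenerate
    replace dist = `d'
    local more = _N
    append using "`reached'"
    save "`reached'", replace
}

* One row per co-node pair at its shortest distance; apply the 1/k weights
use "`reached'", clear
generate double weight = 1 / dist
collapse (count) reach = dist (sum) centrality = weight, by(co)
list, noobs
```

With this data, co 1 reaches 9 only at dist 1, so the four-step route 1-2-5-8-9 is never recorded. The final -collapse- applies the 1/k weighting from the original post to shortest distances only.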
On Wed, Sep 4, 2013 at 8:54 AM, Robert Picard <[email protected]> wrote:
> You can stop a list of connections from looping back onto
> itself by replacing target with a missing value if it
> identifies a company that is already part of the list for
> that observation. This can easily be done using -inlist()-.
> Since -inlist()- does not accept a regular varlist, you can
> use a local macro to build up the comma-separated variable
> list as you go.
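A small illustration of the macro trick, using hypothetical toy values: the local grows into a comma-separated list of variable names, which -inlist()- then receives as ordinary arguments. Keep in mind that -inlist()- accepts only a limited number of arguments (see -help inlist()-), so a very deep network could eventually exceed it.

```stata
* Toy path 1 -> 2 -> 5 with two candidate next targets
clear
input co level1 level2 target
1 2 5 1
1 2 5 8
end

* Build the comma-separated list "co, level1, level2"
local isdone co
forvalues i = 1/2 {
    local isdone `isdone', level`i'
}
display "`isdone'"

* A target that repeats a company already on the path is set to missing
replace target = . if inlist(target, `isdone')
list, noobs
```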
>
> Note that it really helps to create a toy dataset to sort
> out these problems. Your proposed code does not stop looping
> when used with the test data I'm using.
>
> * --------------------- begin example ---------------------
> clear
> input source target
> 1 2
> 1 5
> 1 6
> 1 9
> 2 3
> 2 5
> 2 7
> 5 8
> 5 6
> 8 9
> 8 1
> end
> tempfile main
> save "`main'"
> * make sure the data has no duplicates
> isid source target
>
> rename source co
> local i 0
> local more 1
> local isdone co
> while `more' {
> local ++i
> rename target source
> joinby source using "`main'", unmatched(master)
> drop _merge
> rename source level`i'
> replace target = . if inlist(target,`isdone')
> local isdone `isdone', level`i'
> sort co level`i'
> by co level`i': gen one = _n == 1 & !mi(level`i')
> by co: egen connect`i' = total(one)
> drop one
> count if !mi(target)
> local more = r(N)
> }
> drop target
> sort co level*
> order co level* connect*
> list, sepby(co) noobs
> * --------------------- end example -----------------------
>
> On Tue, Sep 3, 2013 at 6:15 PM, Michael Goodwin
> <[email protected]> wrote:
>> I think I've figured out the issue. Using this approach, when dropping
>> observations whose values are equal, you need to ensure that you aren't
>> dropping missing values: in Stata, a missing value compares as equal to
>> another missing value. Below is the code to drop only matching
>> non-missing values. Following the loop are the final few lines to
>> calculate centrality using the approach I mentioned in my initial
>> post. Thanks for your help!
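A quick check of why the missing values matter, on hypothetical variables: Stata treats a numeric missing value as larger than any real number, and two missing values compare as equal. Testing with -!missing()- works for both numeric and string variables, whereas a comparison with "" applies only to strings.

```stata
clear
input level1 level2
2 .
. .
3 3
end
* Plain comparison: row 2 (missing vs missing) is flagged as a match
generate byte same = level2 == level1
* Requiring a non-missing value keeps only the genuine match in row 3
generate byte same_nm = level2 == level1 & !missing(level2)
list, noobs
```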
>>
>> * Create dataset containing all connections;
>> rename Source company;
>> local i 0;
>> local more 1;
>> while `more' {;
>> local ++i;
>> rename Target Source;
>> joinby Source using "`main'", unmatched(master);
>> drop _merge;
>> rename Source level`i';
>> sort company level`i';
>> forv num = 1/`i' {;
>> local count = `i'-`num';
>> cap drop if level`i'==level`count' & !mi(level`i');
>> cap drop if company==level`count';
>> };
>> by company level`i': gen one = _n == 1 & !mi(level`i');
>> by company: egen connect`i' = total(one);
>> drop one;
>> count if !mi(Target);
>> local more = r(N);
>> qui compress;
>> };
>>
>> * Centrality;
>> egen totalPath = rownonmiss(connect*), strok;
>> summarize totalPath, meanonly;
>> local maxPath = r(max);
>> forv num = 1/`maxPath' {;
>> gen connectCentrality`num' = connect`num'/`num';
>> };
>> egen centrality = rowtotal(connectCentrality*);
>>
>> On Tue, Sep 3, 2013 at 5:10 PM, Michael Goodwin
>> <[email protected]> wrote:
>>> Robert, this is very helpful and actually not so far from what I had
>>> originally coded out.
>>>
>>> The only issue I am encountering now is that some of the networks
>>> actually double back on themselves, so that the loop you've written
>>> out continues indefinitely (or would, if Stata didn't have observation
>>> limits).
>>>
>>> I think the solution is to write code that drops any observations in
>>> which the level`i' variable is equal to any of the preceding
>>> level[`i'-1], level[`i'-2], etc. variables. Do you have any thoughts
>>> on how to best accomplish that? My thinking was to create a loop that
>>> compares the current level with each previous level and drops the
>>> observation if any two values match. I have to use capture because
>>> there is no level0. This doesn't seem to be working using my dataset.
>>>
>>>
>>> * --------------------- begin example ---------------------
>>> clear
>>> input Source Target
>>> 1 2
>>> 1 5
>>> 1 6
>>> 1 9
>>> 2 3
>>> 2 5
>>> 2 7
>>> 5 8
>>> 5 6
>>> 8 9
>>> end
>>> tempfile main
>>> save "`main'"
>>> * make sure the data has no duplicates
>>> isid Source Target
>>>
>>> rename Source co
>>> local i 0
>>> local more 1
>>> while `more' {
>>> local ++i
>>> rename Target Source
>>> joinby Source using "`main'", unmatched(master)
>>> drop _merge
>>> rename Source level`i'
>>> sort co level`i'
>>> forv num = 1/`i' {
>>> local count = `i'-`num'
>>> cap drop if level`i'==level`count'
>>> cap drop if co==level`count'
>>> }
>>> by co level`i': gen one = _n == 1 & !mi(level`i')
>>> by co: egen connect`i' = total(one)
>>> drop one
>>> count if !mi(Target)
>>> local more = r(N)
>>> qui compress
>>> }
>>> drop Target
>>> sort co level*
>>> order co level* connect*
>>> list, sepby(co) noobs
>>> * --------------------- end example -----------------------
>>>
>>> On Sun, Sep 1, 2013 at 11:44 AM, Robert Picard <[email protected]> wrote:
>>>> There is indeed a problem in merging the list with itself,
>>>> as it leads to many-to-many merges, and I have yet to see
>>>> one case where an m:m merge is useful. You can, however, use
>>>> -joinby- to perform what you intuitively would want -merge-
>>>> to do in this case: form all pairwise combinations of
>>>> observations that share the same value of the key variable.
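To see what -joinby- produces here, a minimal check on the toy edge list (without the 8 -> 1 edge): joining each edge's target to the source column of the same list pairs every edge co -> mid with every edge mid -> target, which yields all two-step paths. The variable name mid is hypothetical.

```stata
clear
input source target
1 2
1 5
1 6
1 9
2 3
2 5
2 7
5 8
5 6
8 9
end
tempfile edges
save "`edges'"

* The first edge's target becomes the join key
rename (source target) (co mid)
rename mid source
* Every edge co -> source paired with every edge source -> target
joinby source using "`edges'"
rename source mid
sort co mid target
list, noobs
```

Edges whose target has no outgoing edge (for example 1 -> 6) drop out, because -joinby- keeps only matched observations by default; Robert's loop adds -unmatched(master)- to keep them.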
>>>>
>>>> * --------------------- begin example ---------------------
>>>> clear
>>>> input source target
>>>> 1 2
>>>> 1 5
>>>> 1 6
>>>> 1 9
>>>> 2 3
>>>> 2 5
>>>> 2 7
>>>> 5 8
>>>> 5 6
>>>> 8 9
>>>> end
>>>> tempfile main
>>>> save "`main'"
>>>> * make sure the data has no duplicates
>>>> isid source target
>>>>
>>>> rename source co
>>>> local i 0
>>>> local more 1
>>>> while `more' {
>>>> local ++i
>>>> rename target source
>>>> joinby source using "`main'", unmatched(master)
>>>> drop _merge
>>>> rename source level`i'
>>>> sort co level`i'
>>>> by co level`i': gen one = _n == 1 & !mi(level`i')
>>>> by co: egen connect`i' = total(one)
>>>> drop one
>>>> count if !mi(target)
>>>> local more = r(N)
>>>> }
>>>> drop target
>>>> sort co level*
>>>> order co level* connect*
>>>> list, sepby(co) noobs
>>>> * --------------------- end example -----------------------
>>>>
>>>> Original message follows:
>>>>
>>>> st: Social Network Analysis shortest path centrality
>>>>
>>>> From Michael Goodwin <[email protected]>
>>>> To [email protected]
>>>> Subject st: Social Network Analysis shortest path centrality
>>>> Date Fri, 30 Aug 2013 13:53:16 -0400
>>>> Hi,
>>>>
>>>> I am trying to do some light social network analysis on a dataset
>>>> containing a list of edges. I have the dataset organized such that
>>>> there are two variables, Source and Target. Both Source and Target
>>>> are companies, and a connection between them indicates that an
>>>> employee from Source went on to found Target. The relationship
>>>> between these two variables is many-to-many (m:m), and although the
>>>> variables start as strings, I've converted them to numeric values
>>>> using encode (and ensured that each company receives the same
>>>> numeric code in both Source and Target).
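One way to guarantee identical codes is to let -encode- share a single value label: its -label()- option uses and adds to an existing label, so a name already encoded in Source keeps its code when Target is encoded. A sketch with made-up company names and a hypothetical label name, company:

```stata
clear
input str10 Source str10 Target
"Alpha" "Beta"
"Beta"  "Gamma"
"Beta"  "Delta"
end
* Encoding Source first creates the value label company
encode Source, generate(sourceID) label(company)
* Encoding Target with the same label reuses existing codes, adds new names
encode Target, generate(targetID) label(company)
label list company
list, noobs
```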
>>>>
>>>> I am attempting to determine the number of first-, second-, third-,
>>>> ..., nth-degree connections that each Source has. For example, if
>>>> employees from Company A went on to found Company B, and employees
>>>> from Company B then went on to found Companies C and D, Company A
>>>> would have 1 first-degree connection and 2 second-degree connections.
>>>>
>>>> My goal is to create something similar to a shortest path measurement
>>>> whereby a first degree connection is equal to 1, a second degree
>>>> connection 1/2, a third degree connection 1/3, and so forth. In the
>>>> above example, Company A's score would be (1/1)+(2/2) or 2. I believe
>>>> this is a closeness/shortest path centrality approach, but I may be
>>>> mistaken (and would love to be corrected!).
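Written out, with notation introduced here rather than taken from the post: let n_k(v) be the number of companies at shortest distance k from company v, K the largest such distance, and d(v,u) the shortest-path length from v to u. Each reachable company then contributes the reciprocal of its distance. Measures of this form are usually called harmonic centrality; classical closeness centrality instead takes the reciprocal of the sum of distances. The counts n_k must use shortest distances only, otherwise a company reachable by two routes would be counted at both lengths.

```latex
C(v) \;=\; \sum_{k=1}^{K} \frac{n_k(v)}{k}
     \;=\; \sum_{u \neq v,\ d(v,u) < \infty} \frac{1}{d(v,u)},
\qquad
C(A) \;=\; \frac{1}{1} + \frac{2}{2} \;=\; 2 .
```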
>>>>
>>>> After making the connections symmetric (i.e. all pairs are present as
>>>> both inbound and outbound connections), I've attempted three
>>>> approaches, all without success:
>>>>
>>>> 1. Use netsis and netsummarize. Neither the adjacency nor the closeness
>>>> calculations seem to get me to the right answer. I don't have
>>>> experience using Mata, but it appears that the matrix generated by
>>>> netsis doesn't reflect the appropriate connections (i.e. a connection
>>>> in the original edge list is not represented by a 1 in the matrix).
>>>>
>>>> netsis Source Target, measure(adjacency) name(A, replace);
>>>> netsummarize A/(rows(A)-1), generate(degree) statistic(rowsum);
>>>>
>>>> netsis Source Target, measure(distance) name(D, replace);
>>>> netsummarize (rows(D)-1):/rowsum(D), generate(closeness) statistic(rowsum);
>>>>
>>>> 2. Create a matrix data structure in Stata and use centpow. I keep
>>>> receiving an error noting that the matrix is not symmetric. I've
>>>> checked and made sure that the dataset is a perfect square (it has 707
>>>> observations and 707 variables) and that a connection from Company
>>>> A to Company B is also represented by a connection from Company B
>>>> to Company A. Does centpow require the data to actually be in a Mata
>>>> matrix?
>>>>
>>>> use ".\dta\\${connection}_connectionIDSymmetric${typeInt}.dta";
>>>> contract targetID sourceID;
>>>> reshape wide _freq , i(targetID) j(sourceID);
>>>> qui foreach v of var _freq* {;
>>>> replace `v' = 0 if mi(`v');
>>>> };
>>>> drop targetID;
>>>> save ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta", replace;
>>>> centpow ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta";
>>>>
>>>> 3. Start with the edgelist, and merge it with itself, changing the
>>>> Target and Source variable names such that Target becomes Source for
>>>> the second degree connection and so forth (I think this is
>>>> demonstrably not the solution, so I won't elaborate further).
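For approach 2, the thread does not show what -centpow- expects as input, so the sketch below does not call it. It only shows, under that caveat, one way to turn a symmetrized edge list into a square 0/1 matrix held as a Stata matrix, and to confirm in Mata that it equals its transpose. The toy edges, the variable a, the matrix A and the scalar A_sym are hypothetical.

```stata
clear
input sourceID targetID
1 2
1 3
2 3
end
* Symmetrize: append a reversed copy of every edge
tempfile edges
save "`edges'"
rename sourceID tmp
rename targetID sourceID
rename tmp targetID
append using "`edges'"
duplicates drop

* Put every (target, source) pair on the grid; 0 where there is no edge
generate byte a = 1
fillin targetID sourceID
replace a = 0 if _fillin
drop _fillin

* One row per target, one column per source, then copy into matrix A
reshape wide a, i(targetID) j(sourceID)
mkmat a*, matrix(A)
matrix list A

* issymmetric() returns 1 when A equals its transpose
mata: st_numscalar("A_sym", issymmetric(st_matrix("A")))
display "symmetric: " scalar(A_sym)
```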
>>>>
>>>> I think this either has a simple solution involving the edge list
>>>> that I can't think of, or will require a more intensive solution
>>>> using Mata. If anyone has experience with this or could point me
>>>> toward relevant material (Statalist has limited SNA resources), that
>>>> would be a huge help.
>>>>
>>>> Here are some of the resources I've already reviewed:
>>>> http://www.rensecorten.org/index.php/research/social-network-analysis-with-stata/
>>>> https://sites.google.com/site/statagraphlibrary/netgen111
>>>> http://www.ats.ucla.edu/stat/sna/sna_stata.htm
>>>>
>>>> Thanks in advance.
>>>>
>>>> Best,
>>>>
>>>> Mike
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> --
>>> Mike Goodwin
>>> Senior Associate, Endeavor Insight
>>> 900 Broadway | Suite 301 | New York, NY 10003
>>> +1 646 368 6354 | Skype: michael.p.goodwin