Re: st: Social Network Analysis shortest path centrality
From: Robert Picard <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Social Network Analysis shortest path centrality
Date: Wed, 4 Sep 2013 08:54:10 -0400

You can stop a list of connections from looping back onto
itself by replacing target with a missing value whenever it
identifies a company that is already part of the list for
that observation. This is easily done with -inlist()-.
Since -inlist()- does not accept a varlist, you can use a
local macro to build up the comma-separated list of
variables as you go.
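
As a minimal sketch of the macro idea in isolation (the
variables x, a, b and c are made up for illustration and are
not part of the network data), the macro starts with one
variable name and each pass appends a comma and the next
name, so that it expands to a valid argument list for
-inlist()-:

* --------------------- begin example ---------------------
clear
input x a b c
1 1 5 6
2 4 5 6
3 7 8 3
4 1 2 .
end
* start the list with the first variable, then append the others
local vlist a
foreach v of varlist b c {
    local vlist `vlist', `v'
}
* `vlist' now expands to: a, b, c
gen byte found = inlist(x, `vlist')
list, noobs
* --------------------- end example -----------------------

Here found is 1 in the first and third observations, where x
matches one of a, b or c, and 0 elsewhere.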
Note that it really helps to create a toy dataset to sort
out these problems. Your proposed code does not stop looping
when used with the test data I'm using (the example below
adds the pair 8 1, which closes the cycle 1 -> 5 -> 8 -> 1).
* --------------------- begin example ---------------------
clear
input source target
1 2
1 5
1 6
1 9
2 3
2 5
2 7
5 8
5 6
8 9
8 1
end
tempfile main
save "`main'"
* make sure the data has no duplicates
isid source target
rename source co
local i 0
local more 1
local isdone co
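while `more' {
    local ++i
    * the current targets become the sources for the next step
    rename target source
    joinby source using "`main'", unmatched(master)
    drop _merge
    rename source level`i'
    * break cycles: blank out a target that is already on this path
    replace target = . if inlist(target,`isdone')
    local isdone `isdone', level`i'
    * count the distinct companies reached at this level, by co
    sort co level`i'
    by co level`i': gen one = _n == 1 & !mi(level`i')
    by co: egen connect`i' = total(one)
    drop one
    * keep going while at least one path can still be extended
    count if !mi(target)
    local more = r(N)
}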
drop target
sort co level*
order co level* connect*
list, sepby(co) noobs
* --------------------- end example -----------------------
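
For what it's worth, here is a rough sketch of how the
connect* counts could be turned into the score described in
the quoted messages below (each k-th degree connection
contributes 1/k). It is only an illustration: the small
dataset is typed in by hand rather than produced by the loop
above, the variable name centrality is my own choice, and it
assumes the counts are stored as connect1, connect2, ...
with no gaps in the numbering.

* --------------------- begin example ---------------------
clear
input co connect1 connect2 connect3
1 1 2 0
2 2 0 0
3 3 3 3
end
* find out how many connect variables there are
unab cvars : connect*
local K : word count `cvars'
* add up connect1/1 + connect2/2 + ... + connectK/K
gen double centrality = 0
forvalues k = 1/`K' {
    replace centrality = centrality + connect`k' / `k'
}
list, noobs
* --------------------- end example -----------------------

The first row reproduces the case from the original post:
1 first degree and 2 second degree connections give
(1/1)+(2/2) = 2. With the output of the loop above, the
connect* variables are constant within co, so the score
would be repeated on every observation for a company.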
On Tue, Sep 3, 2013 at 6:15 PM, Michael Goodwin
<[email protected]> wrote:
> I think I've figured out the issue. Using this approach, when dropping
> values that are equal, you need to ensure that you aren't dropping
> null values. Below is the code to drop only matching non-null values.
> Following the loop are the final few lines to calculate centrality
> using the approach I mentioned in my initial post. Thanks for your
> help!
>
> * Create dataset containing all connections;
> rename Source company;
> local i 0;
> local more 1;
> while `more' {;
> local ++i;
> rename Target Source;
> joinby Source using `main', unmatched(master);
> drop _merge;
> rename Source level`i';
> sort company level`i';
> forv num = 1/`i' {;
> local count = `i'-`num';
> cap drop if level`i'==level`count' & level`i'!="";
> cap drop if company==level`count';
> };
> by company level`i': gen one = _n == 1 & !mi(level`i');
> by company: egen connect`i' = total(one);
> drop one;
> count if !mi(Target);
> local more = r(N);
> qui compress;
> };
>
> * Centrality;
> egen totalPath = rownonmiss(connect*), strok;
> local maxPath = totalPath;
> forv num = 1/`maxPath' {;
> gen connectCentrality`num' = connect`num'/`num';
> };
> egen centrality = rowtotal(connectCentrality*);
>
> On Tue, Sep 3, 2013 at 5:10 PM, Michael Goodwin
> <[email protected]> wrote:
>> Robert, this is very helpful and actually not so far from what I had
>> originally coded out.
>>
>> The only issue I am encountering now is that some of the networks
>> actually double back on themselves, so that the loop you've written
>> out continues on infinitely (or would if Stata didn't have observation
>> limits).
>>
>> I think the solution is to write code that drops any observations in
>> which the level`i' variable is equal to any of the preceding
>> level[`i'-1], level[`i'-2], etc. variables. Do you have any thoughts
>> on how to best accomplish that? My thinking was to create a loop that
>> compares the current level with each previous level and drops the
>> observation if any two values match. I have to use capture because
>> there is no level0. This doesn't seem to be working using my dataset.
>>
>>
>> * --------------------- begin example ---------------------
>> clear
>> input Source Target
>> 1 2
>> 1 5
>> 1 6
>> 1 9
>> 2 3
>> 2 5
>> 2 7
>> 5 8
>> 5 6
>> 8 9
>> end
>> tempfile main
>> save "`main'"
>> * make sure the data has no duplicates
>> isid Source Target
>>
>> rename Source co;
>> local i 0;
>> local more 1;
>> while `more' {;
>> local ++i;
>> rename Target Source;
>> joinby Source using `main', unmatched(master);
>> drop _merge;
>> rename Source level`i';
>> sort co level`i';
>> forv num = 1/`i' {;
>> local count = `i'-`num';
>> cap drop if level`i'==level`count';
>> cap drop if co==level`count';
>> };
>> by co level`i': gen one = _n == 1 & !mi(level`i');
>> by co: egen connect`i' = total(one);
>> drop one;
>> count if !mi(Target);
>> local more = r(N);
>> qui compress;
>> };
>> drop Target;
>> sort co level*;
>> order co level* connect*;
>> list, sepby(co) noobs;
>> * --------------------- end example -----------------------
>>
>> On Sun, Sep 1, 2013 at 11:44 AM, Robert Picard <[email protected]> wrote:
>>> There is indeed a problem in merging the list with itself
>>> as it leads to many-to-many merges and I have yet to see
>>> one case where an m:m merge is useful. You can however use
>>> -joinby- to perform what you intuitively would want -merge-
>>> to do in this case.
>>>
>>> * --------------------- begin example ---------------------
>>> clear
>>> input source target
>>> 1 2
>>> 1 5
>>> 1 6
>>> 1 9
>>> 2 3
>>> 2 5
>>> 2 7
>>> 5 8
>>> 5 6
>>> 8 9
>>> end
>>> tempfile main
>>> save "`main'"
>>> * make sure the data has no duplicates
>>> isid source target
>>>
>>> rename source co
>>> local i 0
>>> local more 1
>>> while `more' {
>>> local ++i
>>> rename target source
>>> joinby source using "`main'", unmatched(master)
>>> drop _merge
>>> rename source level`i'
>>> sort co level`i'
>>> by co level`i': gen one = _n == 1 & !mi(level`i')
>>> by co: egen connect`i' = total(one)
>>> drop one
>>> count if !mi(target)
>>> local more = r(N)
>>> }
>>> drop target
>>> sort co level*
>>> order co level* connect*
>>> list, sepby(co) noobs
>>> * --------------------- end example -----------------------
>>>
>>> Original message follows:
>>>
>>> st: Social Network Analysis shortest path centrality
>>>
>>> From Michael Goodwin <[email protected]>
>>> To [email protected]
>>> Subject st: Social Network Analysis shortest path centrality
>>> Date Fri, 30 Aug 2013 13:53:16 -0400
>>> Hi,
>>>
>>> I am trying to do some light social network analysis on a dataset
>>> containing a list of edges. I have the dataset organized such that
>>> there are two variables, Source and Target. Both the Source and Target
>>> are companies, and the connection between them indicates that an employee
>>> from Source went on to found Target. The relationship between these
>>> two variables is indeterminate (i.e. m:m) and although the variables
>>> start as strings, I've converted them to numeric values using encode
>>> (and ensured that the numeric values in both Target and Source are
>>> equal to one another).
>>>
>>> I am attempting to determine the number of first, second, third,...,n
>>> degree connections that each Source has. For example, if employees
>>> from Company A went on to found Company B and then employees from
>>> Company B went on to found Companies C and D, Company A would have 1
>>> first degree connection and 2 second degree connections.
>>>
>>> My goal is to create something similar to a shortest path measurement
>>> whereby a first degree connection is equal to 1, a second degree
>>> connection 1/2, a third degree connection 1/3, and so forth. In the
>>> above example, Company A's score would be (1/1)+(2/2) or 2. I believe
>>> this is a closeness/shortest path centrality approach, but I may be
>>> mistaken (and would love to be corrected!).
>>>
>>> After making the connections symmetric (i.e. all pairs are present as
>>> both inbound and outbound connections), I've attempted three
>>> approaches, all without success:
>>>
>>> 1. Use netsis and netsummarize. Neither the adjacency nor closeness
>>> calculations seem to get me to the right answer. I don't have
>>> experience using mata, but it appears that the matrix generated by
>>> netsis doesn't reflect the appropriate connections (i.e. a connection
>>> in the original edge list is not represented by a 1 in the matrix)
>>>
>>> netsis Source Target, measure(adjacency) name(A, replace);
>>> netsummarize A/(rows(A)-1), generate(degree) statistic(rowsum);
>>>
>>> netsis Source Target, measure(distance) name(D, replace)
>>> netsummarize (rows(D)-1):/rowsum(D), generate(closeness) statistic(rowsum)
>>>
>>> 2. Create a matrix data structure in Stata and use centpow. I keep
>>> receiving an error noting that the matrix is not symmetrical. I've
>>> checked and made sure that the dataset is a perfect square (it has 707
>>> observations and 707 variables) and that a connection between Company
>>> A and Company B is also represented by a connection between Company B
>>> and Company A. Does centpow require the data to actually be in a mata
>>> matrix?
>>>
>>> use ".\dta\\${connection}_connectionIDSymmetric${typeInt}.dta";
>>> contract targetID sourceID;
>>> reshape wide _freq , i(targetID) j(sourceID);
>>> qui foreach v of var _freq* {;
>>> replace `v' = 0 if mi(`v');
>>> };
>>> drop targetID;
>>> save ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta", replace;
>>> centpow ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta";
>>>
>>> 3. Start with the edgelist, and merge it with itself, changing the
>>> Target and Source variable names such that Target becomes Source for
>>> the second degree connection and so forth (I think this is
>>> demonstrably not the solution, so I won't elaborate further).
>>>
>>> I think this either has a simple solution that I can't think of
>>> involving the edge list, or will involve a more intensive solution
>>> using mata. If anyone has experience or could point me in the
>>> direction of content (Statalist has limited SNA resources), that would
>>> be a huge help.
>>>
>>> Here are some of the resources I've already reviewed:
>>> http://www.rensecorten.org/index.php/research/social-network-analysis-with-stata/
>>> https://sites.google.com/site/statagraphlibrary/netgen111
>>> http://www.ats.ucla.edu/stat/sna/sna_stata.htm
>>>
>>> Thanks in advance.
>>>
>>> Best,
>>>
>>> Mike
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> Mike Goodwin
>> Senior Associate, Endeavor Insight
>> 900 Broadway | Suite 301 | New York, NY 10003
>> +1 646 368 6354 | Skype: michael.p.goodwin