Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Social Network Analysis shortest path centrality


From   Michael Goodwin <[email protected]>
To   [email protected]
Subject   Re: st: Social Network Analysis shortest path centrality
Date   Tue, 3 Sep 2013 17:10:32 -0400

Robert, this is very helpful and actually not so far from what I had
originally coded out.

The only issue I am encountering now is that some of the networks
actually double back on themselves, so that the loop you've written
out continues on infinitely (or would if Stata didn't have observation
limits).

I think the solution is to write code that drops any observations in
which the level`i' variable is equal to any of the preceding
level[`i'-1], level[`i'-2], etc. variables. Do you have any thoughts
on how to best accomplish that? My thinking was to create a loop that
compares the current level with each previous level and drops the
observation if any two values match. I have to use capture because
there is no level0. This doesn't seem to be working using my dataset.


* --------------------- begin example ---------------------
clear
input Source Target
1 2
1 5
1 6
1 9
2 3
2 5
2 7
5 8
5 6
8 9
end
tempfile main
save "`main'"
* make sure the data has no duplicates
isid Source Target

rename Source co
local i 0
local more 1
while `more' {
    local ++i
    rename Target Source
    joinby Source using "`main'", unmatched(master)
    drop _merge
    rename Source level`i'
    sort co level`i'
    * compare the new level with every earlier level (capture: there is no level0)
    forv num = 1/`i' {
        local count = `i'-`num'
        cap drop if level`i'==level`count'
        cap drop if co==level`count'
    }
    by co level`i': gen one = _n == 1 & !mi(level`i')
    by co: egen connect`i' = total(one)
    drop one
    count if !mi(Target)
    local more = r(N)
    qui compress
}
drop Target
sort co level*
order co level* connect*
list, sepby(co) noobs
* --------------------- end example -----------------------
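
One possible reason the drop misbehaves: Stata treats missing as equal to missing, so a condition of the form level`i'==level`count' is also true for paths that ended earlier and have missing values in both levels, and those finished paths are dropped along with the loops. Adding & !mi(level`i') to the condition is one way to avoid that. Separately, enumerating paths and dropping repeats can still count the same company at several levels, whereas a shortest-path measure counts each company once, at the first level where it is reached. The following is a minimal breadth-first-search sketch in Python, meant only as a cross-check for the Stata output on small edge lists and not as the Stata solution. The names EDGES, degree_counts and harmonic_score are hypothetical, and treating the edge list as directed by default is an assumption.

```python
from collections import defaultdict, deque

# Example edge list from the thread (Source, Target).
EDGES = [(1, 2), (1, 5), (1, 6), (1, 9), (2, 3),
         (2, 5), (2, 7), (5, 8), (5, 6), (8, 9)]


def degree_counts(edges, directed=True):
    """For every source node, count how many nodes are first reached
    at distance 1, 2, 3, ... (breadth-first search, so each node is
    counted once, at its shortest distance)."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        if not directed:
            # Assumption: a symmetric network is built by mirroring each edge.
            neighbours[target].add(source)

    result = {}
    for start in list(neighbours):
        dist = {start: 0}          # visited nodes and their shortest distance
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in dist:        # skip nodes already reached: no infinite loops
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        counts = defaultdict(int)
        for d in dist.values():
            if d > 0:                      # leave out the start node itself
                counts[d] += 1
        result[start] = dict(counts)
    return result


def harmonic_score(counts):
    """Sum of (number of connections at distance d) / d, as described in the thread."""
    return sum(n / d for d, n in counts.items())
```

For the example edge list, degree_counts(EDGES)[1] is {1: 4, 2: 3}, giving a score of 4/1 + 3/2 = 5.5. Because visited nodes are skipped, the search also terminates on symmetric networks that double back on themselves.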

On Sun, Sep 1, 2013 at 11:44 AM, Robert Picard <[email protected]> wrote:
> There is indeed a problem in merging the list with itself
> as it leads to many-to-many merges and I have yet to see
> one case where an m:m merge is useful. You can however use
> -joinby- to perform what you intuitively would want -merge-
> to do in this case.
>
> * --------------------- begin example ---------------------
> clear
> input source target
> 1 2
> 1 5
> 1 6
> 1 9
> 2 3
> 2 5
> 2 7
> 5 8
> 5 6
> 8 9
> end
> tempfile main
> save "`main'"
> * make sure the data has no duplicates
> isid source target
>
> rename source co
> local i 0
> local more 1
> while `more' {
> local ++i
> rename target source
> joinby source using "`main'", unmatched(master)
> drop _merge
> rename source level`i'
> sort co level`i'
> by co level`i': gen one = _n == 1 & !mi(level`i')
> by co: egen connect`i' = total(one)
> drop one
> count if !mi(target)
> local more = r(N)
> }
> drop target
> sort co level*
> order co level* connect*
> list, sepby(co) noobs
> * --------------------- end example -----------------------
>
> Original message follows:
>
> st: Social Network Analysis shortest path centrality
>
> From  Michael Goodwin <[email protected]>
> To  [email protected]
> Subject  st: Social Network Analysis shortest path centrality
> Date  Fri, 30 Aug 2013 13:53:16 -0400
>  Hi,
>
> I am trying to do some light social network analysis on a dataset
> containing a list of edges. I have the dataset organized such that
> there are two variables, Source and Target. Both the Source and Target
> are companies, and the connection between indicates that an employee
> from Source went on to found Target. The relationship between these
> two variables is indeterminate (i.e. m:m) and although the variables
> start as strings, I've converted them to numeric values using encode
> (and ensured that the numeric values in both Target and Source are
> equal to one another).
>
> I am attempting to determine the number of first, second, third,...,n
> degree connections that each Source has. For example, if employees
> from Company A went on to found Company B and then employees from
> Company B went on to found Companies C and D, Company A would have 1
> first degree connection and 2 second degree connections.
>
> My goal is to create something similar to a shortest path measurement
> whereby a first degree connection is equal to 1, a second degree
> connection 1/2, a third degree connection 1/3, and so forth. In the
> above example, Company A's score would be (1/1)+(2/2) or 2. I believe
> this is a closeness/shortest path centrality approach, but I may be
> mistaken (and would love to be corrected!).
>
> After making the connections symmetric (i.e. all pairs are present as
> both inbound and outbound connections), I've attempted three
> approaches, all without success:
>
> 1. Use netsis and netsummarize. Neither the adjacency nor closeness
> calculations seems to get me to the right answer. I don't have
> experience using Mata, but it appears that the matrix generated by
> netsis doesn't reflect the appropriate connections (i.e. a connection
> in the original edge list is not represented by a 1 in the matrix)
>
> netsis Source Target, measure(adjacency) name(A, replace);
> netsummarize A/(rows(A)-1), generate(degree) statistic(rowsum);
>
> netsis Source Target, measure(distance) name(D, replace)
> netsummarize (rows(D)-1):/rowsum(D), generate(closeness) statistic(rowsum)
>
> 2. Create a matrix data structure in Stata and use centpow. I keep
> receiving an error noting that the matrix is not symmetrical. I've
> checked and made sure that the dataset is a perfect square (it has 707
> observations and 707 variables) and that a connection between Company
> A and Company B is also represented by a connection between Company B
> and Company A. Does centpow require the data to actually be in a mata
> matrix?
>
> use ".\dta\\${connection}_connectionIDSymmetric${typeInt}.dta";
> contract targetID sourceID;
> reshape wide _freq , i(targetID) j(sourceID);
> qui foreach v of var _freq* {;
> replace `v' = 0 if mi(`v');
> };
> drop targetID;
> save ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta", replace;
> centpow ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta";
>
> 3. Start with the edgelist, and merge it with itself, changing the
> Target and Source variable names such that Target becomes Source for
> the second degree connection and so forth (I think this is
> demonstrably not the solution, so I won't elaborate further).
>
> I think this either has a simple solution that I can't think of
> involving the edge list, or will involve a more intensive solution
> using mata. If anyone has experience or could point me in the
> direction of content (Statalist has limited SNA resources), that would
> be a huge help.
>
> Here are some of the resources I've already reviewed:
> http://www.rensecorten.org/index.php/research/social-network-analysis-with-stata/
> https://sites.google.com/site/statagraphlibrary/netgen111
> http://www.ats.ucla.edu/stat/sna/sna_stata.htm
>
> Thanks in advance.
>
> Best,
>
> Mike
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



-- 
Mike Goodwin
Senior Associate, Endeavor Insight
900 Broadway | Suite 301 | New York, NY 10003
+1 646 368 6354 | Skype: michael.p.goodwin

