Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: spmat memory considerations, datatype for banded matrix
From
László Sándor <[email protected]>
To
[email protected]
Subject
st: spmat memory considerations, datatype for banded matrix
Date
Wed, 10 Jul 2013 10:52:10 -0400
Hi,
I had a plan to do some social network analysis on a large family tree
of 9 million people (with Stata/MP 12.1), but if I'm doing my sums
right, the memory needs are enormous unless I happen to have a very
small bandwidth for the adjacency matrix.
Maybe Rafal Raciborski or other -sppack- developers could comment on
strategies to use Stata for such purposes?
If I still need 8 bytes (a double) for each stored entry, I need about
1.3 GB even with an (unachievable) bandwidth of 20 among 9 million, right?
20*9*10^6*8/(2^30) ≈ 1.34
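To make the sum easy to recheck, here is a minimal Python sketch (standard library only, nothing Stata-specific). It assumes the band is stored as bandwidth x n entries, which is simply the layout implied by the formula above; the layout -spmat- actually uses may differ. The function name band_storage_gib is made up for illustration.

```python
GIB = 2 ** 30  # bytes per GiB, matching the 2^30 divisor in the post

def band_storage_gib(n_nodes, bandwidth, bytes_per_entry):
    """Memory for a band stored as `bandwidth` rows by `n_nodes` columns.

    Assumption: storage = bandwidth * n * bytes, as in the post's formula.
    The real -spmat- layout is not confirmed here.
    """
    return bandwidth * n_nodes * bytes_per_entry / GIB

n = 9_000_000
as_doubles = band_storage_gib(n, 20, 8)        # 8-byte doubles
as_bits = band_storage_gib(n, 20, 1 / 8)       # one bit per entry
rcm_doubles = band_storage_gib(n, 98_000, 8)   # bandwidth reached after reordering

print(f"bandwidth 20, doubles:     {as_doubles:.2f} GiB")
print(f"bandwidth 20, bits:        {as_bits:.3f} GiB")
print(f"bandwidth 98,000, doubles: {rcm_doubles:.0f} GiB")
```

Under that assumption a bandwidth of 20 with doubles comes to about 1.34 GiB, and one bit per entry cuts it by a factor of 64. At a bandwidth of 98,000 the same formula gives thousands of GiB, so the bandwidth, not the bytes per entry, is what dominates.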
I tried to reorder nodes to cut bandwidth, but even in this very
sparse graph I could not go below 98,000 (with the Reverse
Cuthill-McKee algorithm in NetworkX).
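For reference, this is roughly how the bandwidth of a given ordering can be checked: a sketch in plain Python, taking bandwidth to mean the largest distance between the positions of two connected nodes (an assumption; -spmat- may count the band differently). The toy path graph and the function name are illustrative only. In practice the ordering would come from something like networkx.utils.reverse_cuthill_mckee_ordering.

```python
def bandwidth(edges, order):
    """Largest |position(u) - position(v)| over all edges under `order`."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)

# Hypothetical toy graph: a path 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

scrambled = bandwidth(edges, [0, 3, 1, 4, 2])  # arbitrary labelling
natural = bandwidth(edges, [0, 1, 2, 3, 4])    # nodes in path order

print(scrambled, natural)  # 3 1
```

The same edges give a bandwidth of 3 or 1 depending only on how the nodes are numbered, which is all a reordering algorithm can exploit. A single long-range link that no ordering can shorten is enough to keep the bandwidth large.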
Can I at least save some space by using only a bit for each
connection? (I do not weight by distance.)
Or first of all, did I get my maths right?
Or maybe I can force a low bandwidth and hope (and check) that I lose
few links if the ordering is effective.
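The "check" part could look something like the sketch below: given an ordering and a forced maximum band, count the links that fall outside the band and would be dropped. Plain Python again; the name links_lost and the toy data are hypothetical.

```python
def links_lost(edges, order, max_band):
    """Count edges whose endpoints sit more than `max_band` positions apart.

    Returns (number lost, share of all edges lost).
    """
    pos = {node: i for i, node in enumerate(order)}
    lost = sum(1 for u, v in edges if abs(pos[u] - pos[v]) > max_band)
    share = lost / len(edges) if edges else 0.0
    return lost, share

# Hypothetical toy graph: a path 0-1-2-3-4 plus one long link 0-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
lost, share = links_lost(edges, [0, 1, 2, 3, 4], max_band=1)

print(lost, share)  # 1 0.2
```

Running this over a grid of max_band values would show how quickly the share of lost links falls as the band widens, and whether any affordable band keeps the loss acceptable.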
Though probably the most important thing would be to select connected
subsamples of the entire network for separate analyses. There are many
islands of families of individuals alive (most connections would go
through deceased relatives, not in my data). But this gets messy too,
and does not impose the restriction of the same population parameters
across all islands.
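Picking out the islands is a connected-components problem. A minimal union-find sketch in plain Python follows (NetworkX's connected_components would do the same job). The node and edge lists are toy data, and component_labels is a made-up name.

```python
from collections import Counter

def component_labels(nodes, edges):
    """Map each node to a representative of its connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the root, shortening the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    return {v: find(v) for v in nodes}

# Hypothetical toy data: 7 people, two families and two isolated individuals
nodes = range(7)
edges = [(0, 1), (1, 2), (3, 4)]

labels = component_labels(nodes, edges)
sizes = sorted(Counter(labels.values()).values(), reverse=True)

print(sizes)  # [3, 2, 1, 1]
```

With the component label attached to each person, each island can be exported and analysed separately, and the size distribution shows how many islands are large enough to be worth the effort.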
Thanks for any thoughts,
Laszlo