Pervasive Overlap

Just recently, I came across the following video showing LinkedIn chief scientist DJ Patil explaining the egocentric networks (networks consisting of an individual and their immediate friends) for a few individuals based on their LinkedIn connections.

Although the individuals in the center of these egocentric networks are unusual (in the sense that they have many more LinkedIn connections than the average user), the video clearly shows that each person is a member of multiple communities where the communities are dense and almost fully connected, while there are fewer connections between the communities. (If any of this sounds familiar, it’s because I wrote about this subject a couple of months ago on the Complexity and Social Networks Blog).

This notion of social structure implies that — seen from the perspective of a single node — everything is relatively simple: the world breaks neatly into easily recognizable parts (e.g. family,  co-workers, and friends). There are few or no links between the communities because we actively work to keep them separate (more here, on why this is the case).

I’ve been thinking about the consequences of this local structure for a while, and recently coauthored a paper this subject with YY Ahn and Jim Bagrow [1]. Here, and in an upcoming blog post, I’ll be writing about some insights from that work.

The idea I hope to explore here has to do with the global structure that arises when all nodes in a network have multiple community affiliations, when there is pervasive overlap. In the follow up, I’ll explore how a single hierarchical organization of the network can exist in the presence of pervasive overlap.

Untangling the hairball

In the standard view of communities in networks, the global structure is modular [2]. This situation is shown below (left), where the communities are labeled using different colors (image from gephi.org). Modular structure on the global level implies, however, that individual nodes can have only a single community affiliation!

If every node is a member of more than one community — and this is clearly the case in the LinkedIn example, as well as in real social networks — then the global structure of the network is not at all modular. Rather, the network will be a dense mess with no visually discernible structure. The network will look like ball of yarn … or a hairball (above, right). In fact, this is precisely the type of structure which has recently been discovered in empirical investigations of a comprehensive set of large networks (social and otherwise) [2, 3].

So the question becomes: How do we find network communities in the hairball? This is the question YY, Jim and I answer in Ref [1]. The trick is that although nodes have many community memberships, each link is mostly uniquely defined. For example, the link you have to one coworker is similar to the link you have to other coworkers. Thus, by formulating community detection as a question of categorizing links rather than nodes, we are able to detect communities in networks with pervasive overlap.

Using our algorithm, for example, we show that dense hairball-networks, such as the word association network (which is what is pictured above, right) contain highly organized internal structure with well defined and pervasively overlapping communities. We’re hoping that our algorithm will help reveal new insights about some of the many highly overlapping social networks, such as the LinkedIn data shown above.

Code for our algorithm may be downloaded here; that site also features a neat interactive visualization of the link clustering algorithm.

Note: This entry was originally posted on the Complexity and Social Networks Blog.

References

Published by

Sune Lehmann

I’m an Associate Professor at the Department of Applied Mathematics and Computer Science, at the Technical University of Denmark.

10 thoughts on “Pervasive Overlap”

  1. Hello!
    I’ve been trying to use your clustering algorithm in Python 2.7.2 and receive the following error:
    Usage: python link_clustering_test.py [options] edges_space

    link_clustering_test.py: error: incorrect number of arguments

    Traceback (most recent call last):
    File “C:\Users\alexandrsemenov\Documents\Data\Python\link_clustering_test.py”, line 179, in
    parser.error(“incorrect number of arguments”)
    File “C:\Program Files\Python2\lib\optparse.py”, line 1583, in error
    self.exit(2, “%s: error: %s\n” % (self.get_prog_name(), msg))
    File “C:\Program Files\Python2\lib\optparse.py”, line 1573, in exit
    sys.exit(status)
    SystemExit: 2

    How can I fix it?

      1. It is space separated list of edges in *.csv format

        I’m a newbie in Python, so I simply put the script into the same directory with the edgelist, opened it in Python 2.7.2 IDLE and replaced “filename” in the code with edles.csv (the name of my data file). Then I tried to execute it.

      2. Hmm. First of all, I don’t think csv usually separates values by space (since the filename literally means “comma separated values”), so be aware of that.

        If you type “python link_clustering.py -h” at the prompt you’ll see a helpstring help get you started with the options.

        Also, if you’re a total newbie at Python, you may want to check out Alex Kalinka’s implementation in R

        http://cran.r-project.org/web/packages/linkcomm/index.html
        http://bioinformatics.oxfordjournals.org/content/early/2011/05/19/bioinformatics.btr311.abstract

        as it offers a lot of functionality that’s not present in our Python version.

        Cheers,
        Sune

  2. I made it space separated myself.

    Does this command work in Windows: “python link_clustering.py -h”? I couldn’t run it.

    Thanks for the links.

    1. The line “python link_clustering.py -h” should work in the command prompt of windows – I haven’t really used IDLE so not sure how you run scripts from there.

      Before running the command, you can check if python works on your system by typing “python” at the command prompt. If that starts python you should be all set. Otherwise, you probably have to add python to the path.

      Hope this helps,
      Sune

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s