Another great guest is visiting my group this week: Conrad Lee. Conrad has been writing consistently superb blog posts over at http://sociograph.blogspot.com for quite a while now (I highly recommend checking out his back catalog, which contains insightful analysis of issues related to community detection in complex networks and much more). And he’s interested in many of the same topics that I’ve worked on for years, so there should be lots of great discussions.
Tomorrow, he’ll be speaking on methods for validating community detection algorithms using meta-data, in his talk "Are network communities good for nothing? Benchmarking algorithms with inference tasks." The abstract hints at a very interesting talk (and contains a wildlife simile):
While community detection algorithms proliferate like rabbits in the spring, relatively little work has gone into determining which methods work best. In many cases, we know only that a given method can partition Zachary’s Karate club – a problem which was solved over thirty years ago. Furthermore, the small literature concerned with benchmarking these algorithms focuses on synthetic data, leaving us with little evidence to support the claim that we can find meaningful communities in non-trivial, real-world social network data. We know so little about the performance of these algorithms because on the one hand we have a poor a priori intuition of how network communities are actually structured, and on the other hand we lack datasets that have a “ground truth” set of communities.
In this presentation, I argue that the quality of network communities can be evaluated by measuring how well they allow inference of missing information, such as certain node attributes and missing links. More concretely, good network communities should provide a machine learning model with informative features. I will discuss some conceptual and practical difficulties which came up when implementing a benchmark based on this premise using the Facebook100 dataset. Early results indicate that all tested methods have a bias for a particular scale, a finding which suggests that a scaling parameter is necessary. For example, modularity maximization and the Map Equation perform poorly, even when using the hierarchical versions of these methods. Their performance improved only when using their generalized formulations, which include a scaling parameter that alters the underlying objective function.
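To make the premise concrete, here's a toy sketch of my own (not from Conrad's talk): if communities are informative, then a node's missing attribute should be predictable by a simple majority vote over the labels in its community. The graph, the attribute names, and the use of connected components as a stand-in "community detection" step are all my illustrative assumptions.

```python
# Illustrative sketch (not from the talk): use community membership as a
# feature to infer a missing node attribute by majority vote.
from collections import Counter, defaultdict

edges = [("a", "b"), ("b", "c"), ("a", "c"),   # toy community 1
         ("x", "y"), ("y", "z"), ("x", "z")]   # toy community 2

# Build an adjacency list.
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Stand-in "community detection": connected components. A real benchmark
# would plug in modularity maximization, the Map Equation, etc.
def community(node):
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return frozenset(seen)

# Known node attributes (think dorm or major in Facebook100, a hypothetical
# mapping here); node "c" has a missing value we want to infer.
attrs = {"a": "dorm1", "b": "dorm1", "x": "dorm2", "y": "dorm2", "z": "dorm2"}

def infer(node):
    # Majority vote over the known labels in the node's community.
    votes = Counter(attrs[n] for n in community(node)
                    if n in attrs and n != node)
    return votes.most_common(1)[0][0]

print(infer("c"))  # -> dorm1
```

A benchmark in this spirit would score a community detection method by how accurately such community-based features recover held-out attributes or links, rather than by modularity alone.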
I highly recommend stopping by if you’re in the area! Time and place are listed here.