Conrad Lee

Another great guest is visiting my group this week: Conrad Lee. Conrad has been writing consistently superb blog posts over at http://sociograph.blogspot.com for quite a while now (I highly recommend checking out his back catalog, which contains insightful analysis of issues related to community detection in complex networks and much more). And he’s interested in many of the same topics that I’ve worked on for years, so there should be lots of great discussions.

Tomorrow, he’ll be speaking on methods for validating community detection algorithms using metadata, in a talk titled “Are network communities good for nothing? Benchmarking algorithms with inference tasks”. The abstract hints at a very interesting talk (and contains a wildlife simile):

While community detection algorithms proliferate like rabbits in the spring, relatively little work has gone into determining which methods work best. In many cases, we know only that a given method can partition Zachary’s Karate club – a problem which was solved over thirty years ago. Furthermore, the small literature concerned with benchmarking these algorithms focuses on synthetic data, leaving us with little evidence to support the claim that we can find meaningful communities in non-trivial, real-world social network data. We know so little about the performance of these algorithms because on the one hand we have a poor a priori intuition of how network communities are actually structured, and on the other hand we lack datasets that have a “ground truth” set of communities.

In this presentation, I argue that the quality of network communities can be evaluated by measuring how well they allow inference of missing information, such as certain node attributes and missing links. More concretely, good network communities should provide a machine learning model with informative features. I will discuss some conceptual and practical difficulties which came up when implementing a benchmark based on this premise using the Facebook100 dataset. Early results indicate that all tested methods have a bias for a particular scale, a finding which suggests that a scaling parameter is necessary. For example, modularity maximization and the Map Equation perform poorly, even when using the hierarchical versions of these methods. Their performance improved only when using their generalized formulations, which include a scaling parameter that alters the underlying objective function.
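
To make the premise concrete, here is a minimal sketch of the idea, not Conrad's actual benchmark. It assumes a deliberately simple stand-in for the machine learning model: each hidden node attribute is guessed by majority vote among the known attributes in the node's community. The function name, the toy data, and the "dorm" attribute are all made up for illustration.

```python
from collections import Counter, defaultdict


def attribute_inference_accuracy(communities, attributes, hidden):
    """Score a community assignment by how well it predicts hidden attributes.

    communities: dict mapping node -> community label
    attributes:  dict mapping node -> true attribute value
    hidden:      set of nodes whose attribute is treated as missing
    """
    # Tally the known attribute values inside each community.
    votes = defaultdict(Counter)
    for node, comm in communities.items():
        if node not in hidden:
            votes[comm][attributes[node]] += 1

    # Fall back to the global majority if a community has no known members.
    overall = Counter(attributes[n] for n in communities if n not in hidden)
    fallback = overall.most_common(1)[0][0]

    correct = 0
    for node in hidden:
        comm_votes = votes.get(communities[node])
        guess = comm_votes.most_common(1)[0][0] if comm_votes else fallback
        correct += guess == attributes[node]
    return correct / len(hidden)


# Hypothetical toy data: eight students, attribute = dorm ("A" or "B").
attributes = {0: "A", 1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B", 7: "B"}
hidden = {4, 7}  # pretend these two dorm labels are missing

# A partition that lines up with the attribute ...
good_partition = {0: "c1", 1: "c1", 2: "c1", 3: "c1", 4: "c1",
                  5: "c2", 6: "c2", 7: "c2"}
# ... and a partition at too coarse a scale (everyone in one community).
coarse_partition = {node: "c0" for node in attributes}

good_score = attribute_inference_accuracy(good_partition, attributes, hidden)
coarse_score = attribute_inference_accuracy(coarse_partition, attributes, hidden)
```

On this toy data the well-aligned partition recovers both hidden labels, while the single-community partition can only guess the global majority and gets one of the two right. A real benchmark along the lines of the abstract would feed community memberships as features into a proper classifier and would also test link prediction.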

I highly recommend stopping by if you’re in the area! Time and place are listed here.

Bruno Gonçalves

This week, we have another exciting guest: Bruno Gonçalves (Twitter: @bgoncalves) will be visiting the lab on Monday the 24th of September and Tuesday the 25th. Bruno has just moved to Aix-Marseille University from Alex Vespignani’s group at Northeastern, and we’re excited to have him.
Bruno is giving a talk Monday at 11 – I highly recommend it:
Title: From Individual Activity to Collective Attention – Insights from Large Scale Social Network Analysis

Abstract: Modern social systems such as Twitter expose digital traces of social discourse with an unprecedented degree of resolution of individual behaviors. They offer an opportunity to investigate both individual and collective behavioral patterns and to disentangle the temporal, spatial and topical aspects of human activity.

A large survey of online exchanges or conversations on Twitter, collected across six months and involving 1.7 million individuals, is used to study how individuals manage their social relations. Two main features are observed:

1. Social interaction strength is highly dependent on the number of connections, corroborating Dunbar’s Social Brain theory. A simple model shows how limited individual capacity for social interaction is enough to qualitatively reproduce the features observed.

2. Users display extremely diverse activity levels that follow a broad-tailed distribution. We construct an activity-driven model that is capable of encoding the instantaneous time description of social network dynamics. Within this framework, highly dynamical networks can be described analytically, providing a powerful tool for the analysis of social phenomena occurring over time-varying networks.

Finally, we focus on Twitter activity surrounding American Idol voting as a minimal and simplified version of complex societal phenomena such as political elections, and show that the volume of information available in online systems permits the real-time gathering of quantitative indicators anticipating the future unfolding of opinion formation events.

Time & place:

Monday 24 September at 11:00-12:00

Building 305, Seminar room 053