The September talk series continues full steam ahead. This week, you have a chance to see Philipp Lorenz talk about the dynamics of topics in online social media. Philipp is a PhD student at TU Berlin’s Institut für Theoretische Physik in the Nonlinear Dynamics and Control: empirical networks and neurodynamics group. Philipp’s work focuses on temporal communities of hashtags, modeling the rise and fall of online topics, threshold models with repost and recovery, and more. Details of the talk are below:
Date: Tuesday, September 19th
Location: DTU, Building 321, 1st floor lab-space
Title: Capturing and modeling the dynamics of online topics
Abstract: Online media have a huge impact on public opinion, economics and politics. Every day, billions of posts are created and comments are written, covering a broad range of topics. We focus in particular on hashtags, a discrete and condensed form of online content. Here we present a pipeline consisting of methods from static community detection as well as novel approaches for tracing the dynamics of topics in temporal data. We build co-occurrence networks from hashtags with timestamped edges. On static snapshots we infer the community structure and solve the resulting bipartite matching problem, taking higher-order memory into account. The results are robust to temporal fluctuations and instabilities of the static community detection.
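For readers who want a concrete picture of the first pipeline steps, here is a minimal Python sketch under stated assumptions: posts arrive as (timestamp, hashtags) pairs, the communities of each snapshot are already given as sets of hashtags, and communities in two consecutive snapshots are paired greedily by Jaccard similarity. The function names and the 0.1 threshold are hypothetical, and because the matching only looks one snapshot back, it leaves out the higher-order memory described in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(posts):
    """Turn (timestamp, hashtags) posts into timestamped co-occurrence edges (t, u, v)."""
    edges = []
    for t, tags in posts:
        # Every pair of distinct hashtags in the same post becomes an edge stamped with t.
        for u, v in combinations(sorted(set(tags)), 2):
            edges.append((t, u, v))
    return edges


def snapshot(edges, t0, t1):
    """Aggregate edges with t0 <= t < t1 into a static weighted graph {(u, v): count}."""
    weights = defaultdict(int)
    for t, u, v in edges:
        if t0 <= t < t1:
            weights[(u, v)] += 1
    return dict(weights)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(prev, curr, threshold=0.1):
    """Greedy one-to-one matching of communities (sets of hashtags) in consecutive snapshots.

    Returns {index_in_prev: index_in_curr}. Pairs are taken in order of decreasing
    Jaccard similarity; pairs below `threshold` stay unmatched (topics dying or emerging).
    """
    candidates = sorted(
        ((jaccard(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    matches, used_prev, used_curr = {}, set(), set()
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches


posts = [(0, ["#a", "#b"]), (1, ["#b", "#c"]), (5, ["#a", "#b"])]
edges = cooccurrence_edges(posts)
print(snapshot(edges, 0, 3))  # {('#a', '#b'): 1, ('#b', '#c'): 1}
print(match_communities([{"#a", "#b"}, {"#x", "#y"}], [{"#x", "#y", "#z"}, {"#a", "#b", "#c"}]))
```

A greedy pass is the simplest stand-in here; an optimal assignment (e.g. the Hungarian algorithm) would be the natural upgrade when many communities compete for the same successor.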
Analyzing the resulting dynamics across various datasets and for different observables, such as community sizes or the likes they gather (a proxy for the popularity of a topic), we observe universal behavior. Despite the datasets’ versatility, we find that in all of them the distributions of gains and losses in popularity are fat-tailed, indicating occasional but large and sudden changes in public interest.
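One quick, admittedly crude way to look for the fat tails mentioned above is to compute gains and losses as log-ratios of consecutive popularity values and check their excess kurtosis, which is zero for a Gaussian and large when rare, big jumps dominate. This Python sketch is an illustration under that assumption, not the statistical analysis used in the talk; a careful study would fit the tails of the distribution directly.

```python
import math


def log_changes(popularity):
    """Gains (>0) and losses (<0) between consecutive steps: log(s[t+1] / s[t]).

    Steps where either value is zero are skipped, since the log-ratio is undefined there.
    """
    return [math.log(b / a) for a, b in zip(popularity, popularity[1:]) if a > 0 and b > 0]


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


# Mostly quiet popularity with one sudden burst of interest (made-up numbers).
series = [100, 101, 100, 101, 100, 101, 100, 400, 390, 395]
print(round(excess_kurtosis(log_changes(series)), 2))
```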
We hypothesise that only a few mechanisms may govern this behavior:
Gaining interest follows the rule of preferential attachment.
Saturation of the limited attention span decreases a topic’s fame.
Discrete ranking leads to competition between threads.
With these ingredients, we are able to design a class of models that can reproduce the qualitative dynamics and the quantitative distributions of dynamical properties in the empirical observations. The model parameters and the configuration required for a given dataset are informative about the sociological and psychological mechanisms that drive the dynamics of popularity in different contexts.
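The three ingredients can be combined into a toy simulation. The Python sketch below is a hypothetical model written for illustration, not the class of models from the talk: topics compete for a fixed stream of new interest, only the top_k ranked topics are visible (discrete ranking), visible topics win interest in proportion to their current popularity (preferential attachment), a small `explore` share lands on random topics, and every topic loses a fixed fraction of its popularity each step (saturation). All parameter names and default values are assumptions.

```python
import random


def simulate(n_topics=20, steps=200, new_interest=50, top_k=5,
             explore=0.1, decay=0.05, seed=1):
    """Toy popularity dynamics; returns a list of popularity vectors, one per step."""
    rng = random.Random(seed)
    pop = [1.0] * n_topics
    history = [pop[:]]
    n_explore = int(explore * new_interest)
    for _ in range(steps):
        # Discrete ranking: only the top_k most popular topics are visible.
        visible = sorted(range(n_topics), key=lambda i: pop[i], reverse=True)[:top_k]
        # Preferential attachment: visible topics win interest in proportion to popularity.
        winners = rng.choices(visible, weights=[pop[i] for i in visible],
                              k=new_interest - n_explore)
        # A small share of attention lands on arbitrary topics, letting newcomers rise.
        winners += [rng.randrange(n_topics) for _ in range(n_explore)]
        for i in winners:
            pop[i] += 1.0
        # Saturation of the limited attention span: every topic fades a little.
        pop = [p * (1.0 - decay) for p in pop]
        history.append(pop[:])
    return history


history = simulate()
topic_0 = [row[0] for row in history]  # one topic's popularity over time
print(max(topic_0), topic_0[-1])
```

Each topic's series can then be treated like the empirical popularity curves, e.g. by looking at the distribution of its step-to-step gains and losses.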
Place: Technical University of Denmark, Building 210, room 112.
Date: September 13th, 2017
Title: Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Abstract: NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
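The distant-supervision idea in the abstract can be sketched in a few lines of Python: a tweet's emoji becomes its (noisy) label and is stripped from the input text. This is a simplified illustration, not the paper's preprocessing. The five-emoji label set is a hypothetical stand-in for the 64 emojis, and keeping only tweets with exactly one distinct label emoji is a simplifying assumption of this sketch.

```python
# Hypothetical stand-in for the label set (the paper uses 64 common emojis).
EMOJI_LABELS = ["😂", "😍", "😭", "😡", "🙏"]


def distant_label(tweet, labels=EMOJI_LABELS):
    """Return (text_without_label_emoji, label_index), or None if the tweet is unusable.

    A tweet is kept only if exactly one distinct label emoji occurs in it; that emoji
    is removed from the text so a model has to predict it from the words alone.
    """
    present = [e for e in labels if e in tweet]
    if len(present) != 1:
        return None
    emoji = present[0]
    text = tweet.replace(emoji, "").strip()
    return text, labels.index(emoji)


print(distant_label("this is hilarious 😂😂"))  # ('this is hilarious', 0)
print(distant_label("love it 😍 but also 😭"))  # None
```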
Friend of the lab & postdoc at Northeastern University Piotr Sapieżyński is visiting Copenhagen, and we’re lucky to hear about his ongoing work on FAT (Fair, Accountable, and Transparent) Machine Learning. This talk, which focuses on the fair part of FAT ML, is not one to miss if you want to be on the cutting edge of ethically responsible Machine Learning.
Date: September 7th, 2017
Place: Technical University of Denmark, Building 321, first floor lab space.
Title: Academic performance prediction in a gender-imbalanced environment
Abstract: Individual characteristics and informal social processes are among the factors that contribute to a student’s performance in an academic context. Universities can leverage this knowledge to limit drop-out rates and increase performance through interventions targeting at-risk students. Data-driven recommendation systems have been proposed to identify such students for early interventions. However, we find that the performance of some students is best predicted using indicators that differ from those predictive for the majority. Naive approaches that do not account for this fact might favor the majority class and lead to disparate mistreatment in the case of minorities. In this presentation I will talk about behavioral and psychological differences between male and female participants of the Copenhagen Networks Study, and how these differences can contribute to unequal performance in the academic achievement prediction problem. I will also stress the importance of error analysis in seemingly well-performing predictors and review the approaches to fair machine learning.
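The point about error analysis can be made concrete: a predictor can look acceptable on aggregate accuracy while its errors concentrate in one group. Disparate mistreatment is commonly measured as a gap in false positive or false negative rates between groups. A minimal Python sketch, assuming binary labels where 1 marks an at-risk student; the toy data is made up for illustration.

```python
def group_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)} for binary labels,
    where label 1 marks an at-risk student. A rate is NaN if a group lacks that class."""
    stats = {}
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats.setdefault(g, {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
        if t == 1:
            s["pos"] += 1
            s["fn"] += int(p == 0)  # missed an at-risk student
        else:
            s["neg"] += 1
            s["fp"] += int(p == 1)  # flagged a student who was doing fine
    nan = float("nan")
    return {
        g: (s["fp"] / s["neg"] if s["neg"] else nan,
            s["fn"] / s["pos"] if s["pos"] else nan)
        for g, s in stats.items()
    }


# Made-up toy data: 5 of 8 predictions are correct overall.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
print(group_error_rates(y_true, y_pred, groups))  # {'M': (0.0, 0.0), 'F': (0.5, 1.0)}
```

Here every error falls on group F, which a single overall accuracy of 0.625 would hide.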
Wow. We are lucky to have legendary researcher Bernardo Huberman visiting later this month. His production of high-impact papers, books, and patents is too rich and plentiful to reproduce here, so I’ll simply quote Wikipedia’s summary!
Bernardo has been a central player throughout the rise of network theory (and mentor for field notables, such as Lada Adamic and Jure Leskovec), but that’s just a fraction of what he’s accomplished. If you care about anything related to information sciences, this is a talk you cannot miss. Here are the details:
Date: August 29, 2017.
Location: Technical University of Denmark, Building 324, Room 040.
Title: Social media and the attention economy
Abstract: We are witnessing a momentous transformation in the way people interact and exchange information with each other. Content is now co-produced, shared, classified and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This content explosion is to a large extent driven by a mix of novel technologies and the deep human drive for recognition. This talk will describe the regularities that govern how social attention is allocated among all media and the role it plays in the production and consumption of content. It will also describe how its dynamics determines the emergence of public agendas while allowing us to predict the evolution of social trends.
I’m very excited to have Roberta Sinatra visiting the group for the week of April 3rd. She is an Assistant Professor at the Center for Network Science and Math Department at the Central European University in Budapest.
Roberta works on ‘the science of success’; her most recent adventures have resulted in two very impressive pieces in the interdisciplinary journal Science (and corresponding worldwide press coverage). Check out those papers here and here.
She will give a talk about her work at DTU Compute. Details can be found below.
Date: April 4th, 2017
Location: Technical University of Denmark, Building 321, 1st floor lab space.
Title: Quantifying the evolution of individual scientific impact
Abstract: Despite the frequent use of numerous quantitative indicators to gauge the professional impact of a scientist, little is known about how scientific impact emerges and evolves in time. In this talk we quantify the changes in impact and productivity throughout a career in science and show that impact, as measured by influential publications, is distributed randomly within a scientist’s sequence of publications. This random impact rule allows us to formulate a stochastic model that uncouples the effects of productivity, individual ability and luck, unveiling the existence of universal patterns governing the emergence of scientific success. The model assigns a unique individual parameter Q to each scientist, which is stable during a career and accurately predicts the evolution of a scientist’s impact, from the h-index to cumulative citations. Finally, we show that the Q-parameter is more predictive of independent recognitions, like prizes, than cumulative citations, h-index or productivity.
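The Q-model is usually written multiplicatively: the impact c of a paper is the product of its author's parameter Q and a random "luck" factor p drawn from a distribution shared by everyone, so log c = log Q + log p. Averaging over a scientist's papers then yields an estimate of Q. The Python sketch below assumes that form, positive impact values (e.g. citations after a fixed time window), and a known population mean mu_p of log p; in practice mu_p has to be fitted from data, which this sketch does not do.

```python
import math


def estimate_q(impacts, mu_p):
    """Estimate Q for one scientist assuming log c = log Q + log p with E[log p] = mu_p."""
    if not impacts or any(c <= 0 for c in impacts):
        raise ValueError("need a non-empty list of positive impact values")
    mean_log_c = sum(math.log(c) for c in impacts) / len(impacts)
    return math.exp(mean_log_c - mu_p)


# Two hypothetical careers: every paper by B is cited 10x more than the matching paper by A.
mu_p = math.log(10)
q_a = estimate_q([10, 100, 1000], mu_p)
q_b = estimate_q([100, 1000, 10000], mu_p)
print(round(q_a, 6), round(q_b, 6))  # 10.0 100.0
```

Because the estimate ignores the order of the papers, it is consistent with the random impact rule: shuffling a career's publication sequence leaves Q unchanged.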
We are very lucky to have Michael Szell visiting the week of April 3rd. Michael is a research fellow at the Hungarian Academy of Sciences, Centre for Social Sciences, and a visitor at Northeastern University’s Center for Complex Network Research. He previously worked at the MIT Media Lab’s Senseable City Lab.
Michael’s research focuses on a quantitative understanding of collective behavior: how the underlying patterns of our interlinked actions and decisions can be modeled in computational social science. His past research involves mining and modeling large-scale data sets of human activity following a complex networks/systems approach.
His exciting work has been featured in PNAS, Nature Physics, Science, and many other fine journals. During his visit, Michael will give a talk at DTU Compute.
Date: Tuesday April 4th, 2017
Location: Technical University of Denmark, Building 321, 1st floor lab space.
Title: Using network science and data visualization to assess the potential of urban sharing economies
Abstract: We introduce the notion of a shareability network, which allows us to model the collective benefits of sharing rides as a function of passenger inconvenience, and to efficiently compute optimal sharing strategies on massive datasets. We first apply this framework to a dataset of millions of taxi trips taken in New York City, showing that with increasing but still relatively low passenger discomfort, cumulative trip length can be cut by 40% or more. This benefit comes with reductions in service cost and emissions, and with split fares, hinting toward wide passenger acceptance of such a shared service. Simulation of a realistic online system demonstrates the feasibility of a shareable taxi service in New York City. Shareability as a function of trip density saturates fast, suggesting that taxi sharing would also be effective in cities with much sparser taxi fleets or when willingness to share is low. Indeed, applying the same framework to a diverse set of world cities, using data on millions of taxi trips beyond New York City, in San Francisco, Singapore, and Vienna, we compute the shareability curves for each city, and find that a natural rescaling collapses them onto a single, universal curve. We explain this scaling law theoretically with a simple model that predicts the potential for ride sharing in any city, using a few basic urban quantities and no adjustable parameters. Accurate extrapolations of this type will help planners, transportation companies, and society at large to shape a sustainable path for urban growth. Finally, we present “What the Street!?”, an online platform for the interactive exploration of city-wide mobility spaces, published in April 2017. The aim of What the Street!? is to facilitate the intuitive exploration of (wasted) mobility space in cities, exploring why and to what extent space is distributed unevenly between different modes of transportation.
We demonstrate how this data visualization of re-ordered city spaces can effectively inform relevant stakeholders and the public about large-scale reductions of parking spaces in future scenarios of wide-spread car-sharing.
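To make the shareability network concrete: trips are nodes, an edge joins two trips that could be combined, and choosing which pairs to actually share so as to maximize the number of shared rides is a maximum matching problem on that network. In the Python sketch below, the shareability criterion is a hypothetical proximity rule (start times within max_delay, origins and destinations within max_dist), not the delay-based route check described in the talk, and the exhaustive matching only scales to toy examples; real datasets need a polynomial-time matching algorithm.

```python
import math
from itertools import combinations


def shareable(trip1, trip2, max_delay, max_dist):
    """Hypothetical proximity rule standing in for a real route/delay check.

    A trip is (start_time, origin_xy, destination_xy).
    """
    (s1, o1, d1), (s2, o2, d2) = trip1, trip2
    return (abs(s1 - s2) <= max_delay
            and math.dist(o1, o2) <= max_dist
            and math.dist(d1, d2) <= max_dist)


def shareability_network(trips, max_delay, max_dist):
    """Edges (i, j) between every pair of trips that could be combined."""
    return [(i, j) for i, j in combinations(range(len(trips)), 2)
            if shareable(trips[i], trips[j], max_delay, max_dist)]


def max_matching(n, edges):
    """Exact maximum-cardinality matching by exhaustive search (toy sizes only)."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    def best(unmatched):
        if not unmatched:
            return []
        v = min(unmatched)
        rest = unmatched - {v}
        result = best(rest)  # option 1: trip v rides alone
        for u in adj[v] & rest:  # option 2: trip v shares with a neighbor u
            candidate = [(v, u)] + best(rest - {u})
            if len(candidate) > len(result):
                result = candidate
        return result

    return best(frozenset(range(n)))


trips = [
    (0, (0.0, 0.0), (5.0, 5.0)),
    (2, (0.5, 0.0), (5.0, 5.5)),
    (3, (10.0, 10.0), (0.0, 0.0)),
    (4, (10.0, 10.5), (0.0, 0.5)),
    (30, (0.0, 0.0), (5.0, 5.0)),
]
edges = shareability_network(trips, max_delay=5, max_dist=1.0)
pairs = max_matching(len(trips), edges)
print(edges)  # [(0, 1), (2, 3)]
print(f"{2 * len(pairs)} of {len(trips)} trips shared")  # 4 of 5 trips shared
```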
We’re very lucky to have Kim Albrecht visit for a few days later this month. Kim is a gifted visual researcher and information designer. His work is absolutely amazing (beautiful as well as informative).
Above is, for example, a summary of the careers of 128 tennis players; read the full story here. We are very lucky to have Kim speaking at DTU later this month!!
Title: Imagining Complex Systems
Time: Tuesday March 28th, 10 AM
Location: DTU Building 321, 1st floor lab space (details)
Abstract: How can visualization help us understand the world surrounding us? That is the basic question underlying all the projects Kim has worked on in recent years. This theme sees design as something different from communication or decoration. It is no longer about a style, a trend or fashion. The design process becomes a tool to create insights and knowledge. But once these technological artifacts are investigated in more depth, the cultural formations that shape the graphics come into focus, demonstrating the subjectivity of visualization.
Bio: As a visual researcher & information designer, Kim Albrecht is interested in networks, time, power, processes and how we can find visual representations for these topics to produce and represent knowledge. Currently, Kim is based in Boston, working at the Center for Complex Network Research as a visualization researcher. He collaborates with and builds visualization interfaces for research groups from a wide variety of scientific fields and universities (Harvard University, UCLA, Stanford University). In 2016 Kim started his Ph.D. research at the University of Potsdam in the field of media theory, researching information visualizations and their interfaces with regard to their epistemological value.
Have you recently finished your PhD? And would you like to come to Denmark to work with deep learning on an amazing dataset? Then keep reading. There’s a great opportunity for DTU funding that we can apply for together.
Proposal: Deep learning, network structure, and language on Twitter
Based on a massive dataset (10% of all tweets going back to 2012), we wish to study the interplay between language and network structure. Specifically, we wish to study the interplay between language evolution and network evolution across time (effectively the co-evolution of language and network structure).
As part of the grant application, you will help shape the research questions, but a rough idea would be to use deep learning approaches (word embeddings, LSTMs) to represent the language component, and state-of-the-art network science approaches for the network evolution.
At the time of recruitment (1 July 2017), applicants must not have resided or carried out their main activity in Denmark or at DTU for more than 12 months in the 3 years immediately prior to recruitment (excl. holidays and short visits).
Successful applicants must move to Denmark by the time of employment at the latest.
The applicant must, by the time of recruitment (1 July 2017), be in possession of a doctoral degree or have at least 4 years of full-time equivalent research experience.
Renowned network scientist and creator of InfoMap (probably the world’s best community detection algorithm for complex networks), Martin Rosvall, is visiting Copenhagen. And I’ve managed to convince him to visit DTU to give a talk!
Martin is an associate professor at the Department of Physics at Umeå University (Sweden). He’s an accomplished author of many highly cited papers and a great speaker. Thus, I strongly recommend you come see his talk.
The details are below:
Time: Wednesday December 7, 11:00am
Place: Technical University of Denmark, Building 321, 1st floor Lab Space.
Title: Maps of sparse Markov chains efficiently reveal community structure in network flows with memory
Abstract: To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient already for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.
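Why memory matters can be seen on a toy example: if flows from a pass through x to c, and flows from b pass through x to d, a first-order Markov chain only sees a 50/50 split at x, while a second-order model conditioned on the previous step predicts the next node exactly. The price is that the number of possible conditioning states grows exponentially with the order, which is the curse of dimensionality the abstract mentions. The Python sketch below estimates both models from observed pathways; it is a plain maximum-likelihood estimate for illustration, not the sparse, cross-validated model from the talk.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order):
    """Maximum-likelihood P(next | previous `order` nodes) from observed pathways."""
    counts = defaultdict(Counter)
    for path in paths:
        for k in range(order, len(path)):
            state = tuple(path[k - order:k])
            counts[state][path[k]] += 1
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {node: c / total for node, c in nxt.items()}
    return probs


# Flows from a pass through x to c; flows from b pass through x to d.
paths = [["a", "x", "c"], ["b", "x", "d"]] * 5
first = transition_probs(paths, order=1)
second = transition_probs(paths, order=2)
print(first[("x",)])       # {'c': 0.5, 'd': 0.5}
print(second[("a", "x")])  # {'c': 1.0}
```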
I’m super excited to announce that we recently had a new paper published in PNAS. And by ‘we’ I mean my former PhD Student Vedran Sekara (first author), my former PostDoc Arek Stopczynski, along with yours truly.
I’m very proud of the work we’ve done, and somehow we got away with giving the paper the not-so-humble title Fundamental Structures of Dynamic Social Networks. The cool thing is that even though the title is perhaps ostentatious, I actually think that we’re on to something fundamental here. I’ve attempted to write a non-technical explanation below.
Prologue: The connection to communities
Community detection is a big deal in network science. Just look at this plot I created that shows the number of papers about community detection per year.
Literally thousands of papers on finding communities in networks are published every single year, so in my world this is an important topic. Detecting communities in complex networks is usually all about finding groups of nodes with many links between them – and only a few links to the rest of the network. The typical example network in a community detection paper looks something like this:
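"Many links inside, few links to the rest" is often scored with modularity, which compares the fraction of links inside each community with the fraction expected if links were placed at random given the node degrees: Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the number of links inside c, d_c the total degree of its nodes and m the total number of links. A minimal Python sketch; modularity is just one common objective among many in the community detection literature, used here as an illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of an undirected, unweighted graph for a node -> community assignment."""
    m = len(edges)
    inside = defaultdict(int)  # L_c: links with both ends in community c
    degree = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge link.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting every node into one community scores exactly zero, so the positive value for the two-triangle split reflects real structure.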
Back in 2010, YY Ahn, Jim Bagrow and I wrote a paper where we argue that there’s something fundamentally wrong with this idea of communities. The problem is that the illustration above assumes that each node is a member of only one single community. In that paper we argue that this assumption is often wrong. In most networks, each node is a member of more than one community. In social networks, for example, we are in communities of friends, family, co-workers, sports buddies, etc.
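The way out proposed in that 2010 paper was to cluster links rather than nodes: each link belongs to one community, and a node inherits every community its links belong to, so overlap comes for free. A minimal Python sketch of two ingredients, the similarity between two links that share a node (the Jaccard overlap of the inclusive neighborhoods of their other endpoints) and the step from link communities to overlapping node memberships. The clustering of links itself, which the paper does hierarchically, is left out, and the function names are hypothetical.

```python
from collections import defaultdict


def inclusive_neighbors(adj, node):
    """The node's neighbors plus the node itself."""
    return adj[node] | {node}


def link_similarity(adj, link1, link2):
    """Similarity of two links (i, k) and (j, k) sharing node k: the Jaccard overlap
    of the inclusive neighborhoods of their non-shared endpoints i and j."""
    shared = set(link1) & set(link2)
    if len(shared) != 1:
        raise ValueError("links must share exactly one node")
    (k,) = shared
    i = link1[0] if link1[1] == k else link1[1]
    j = link2[0] if link2[1] == k else link2[1]
    ni, nj = inclusive_neighbors(adj, i), inclusive_neighbors(adj, j)
    return len(ni & nj) / len(ni | nj)


def node_memberships(link_communities):
    """Each node joins every community that contains one of its links."""
    memberships = defaultdict(set)
    for c, links in enumerate(link_communities):
        for u, v in links:
            memberships[u].add(c)
            memberships[v].add(c)
    return dict(memberships)


# A triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(link_similarity(adj, ("a", "c"), ("b", "c")))  # 1.0
print(link_similarity(adj, ("a", "c"), ("c", "d")))  # 0.25
print(node_memberships([[("a", "b"), ("b", "c"), ("a", "c")], [("c", "d")]])["c"])  # {0, 1}
```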
When each node is a member of many communities, the global picture gets messier. The network doesn’t fall apart into neat chunks as above; rather, it looks like a mess of a hairball. [I’ve written a popular explanation of those findings here plus a follow-up here.] The hairball below shows a real social network from the PNAS paper.
Back then, we did not have access to temporal information, but as part of trying to wrap our brains around how this hairball arises, Jim, YY, and I came up with the picture below (Jim actually drew it and impressively figured out how to do the perspective). This illustration – as we shall see below – turned out to be quite prophetic.
The illustration shows that when single individuals (marked in green and turquoise) participate in multiple communities the underlying simplicity is obscured in the aggregated network.
I had forgotten all about communities when my graduate student Vedran and I started looking at the incredibly detailed data my group had just started collecting as part of the Copenhagen Networks Study (CNS). CNS contains 2.5 years of data collected by handing out 1000 smartphones to nearly all the DTU freshman students, collecting physical proximity data (using Bluetooth to measure the distance between pairs of individuals), phone calls, text messages, Facebook interactions, as well as GPS data. All of this came with high temporal resolution (e.g. we recorded face-to-face meetings every 5 minutes).
Working as lead hacker-in-residence on top of his data science duties, Arek used a mix of 26-hour days & what I can only describe as pure black magic to start almost from scratch and orchestrate the software infrastructure needed to collect and store all of these data sources in something like six months. With CNS we finally had access to the temporal networks dataset needed to dig deeper.
When we looked at the physical proximity data we noticed that, as we considered finer and finer time resolution, the hair-ball (beautifully) dissolved into meaningful structures.
The green hairball shows everyone who has spent time together across an entire day. The orange network shows physical contacts aggregated over an hour, and the blue network shows the interactions for a five-minute time slice. The exciting thing is that in the blue network, we can directly observe the groups of people hanging out together. No community detection necessary – we had solved the problem that those thousands of papers in Figure 1 address, simply by changing the temporal resolution. Said differently, we’ve just identified a case where understanding the network got easier by adding more data (that’s why Renaud’s commentary is called “Rich Gets Simpler”).
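The effect of temporal resolution can be reproduced on toy data: bin timestamped proximity contacts into slices of a given width and treat each connected set of people within a slice as a group. In this Python sketch, connected components are a stand-in for the paper's actual group detection, times are assumed to be in minutes, and the contact list is made up. With 5-minute slices two separate groups appear, while a coarser width merges everyone into one blob, a miniature version of the hairball.

```python
from collections import defaultdict


def groups_per_slice(contacts, width):
    """Split (t, u, v) proximity contacts into time slices of `width` (e.g. minutes) and
    return {slice_index: [group, ...]}, where groups are connected components."""
    slices = defaultdict(list)
    for t, u, v in contacts:
        slices[t // width].append((u, v))
    result = {}
    for s, edges in slices.items():
        parent = {}  # union-find forest for this slice only

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in edges:
            parent[find(u)] = find(v)
        components = defaultdict(set)
        for x in parent:
            components[find(x)].add(x)
        result[s] = sorted(components.values(), key=sorted)
    return result


contacts = [(0, "a", "b"), (1, "b", "c"), (2, "d", "e"), (6, "c", "d")]
print(groups_per_slice(contacts, 5))   # fine resolution: two groups, then a pair
print(groups_per_slice(contacts, 10))  # coarse resolution: everyone merges
```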
Usually it’s the opposite. Things usually get a lot more complex when we have to account for more data. Just check any paper on temporal networks (for example take a look at this excellent review). I take the fact that more data has simplified the problem to mean that we’re on to something: that we’re looking at the network represented at the right temporal resolution.
Anyway. We’d just found out how to identify all of the little communities in a timeslice. Now we needed to put the pieces together again. But since we’d figured out the underlying simple principle, we began to study how meetings between people develop over time – simply by matching up groups between neighboring timeslices.
The result is gatherings – the temporal representation of a meeting between a group of individuals that can last anywhere from 15 mins to several hours.
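Matching groups across neighboring slices can be sketched as follows. This Python code links each group to the most similar unclaimed group of the previous slice when their Jaccard similarity reaches a threshold, and otherwise starts a new gathering. The Jaccard criterion, the 0.5 threshold and the names are assumptions for illustration, not necessarily the paper's exact procedure.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)


def build_gatherings(slices, threshold=0.5):
    """Chain groups in consecutive time slices into gatherings.

    slices: one list of non-empty groups (sets of people) per time slice, in time order.
    Returns a list of gatherings, each a list of (slice_index, group).
    """
    gatherings = []
    active = []  # (gathering_index, group) pairs from the previous slice
    for s, groups in enumerate(slices):
        next_active, claimed = [], set()
        for group in groups:
            score, idx = max(
                ((jaccard(group, prev), gi) for gi, prev in active if gi not in claimed),
                default=(0.0, None),
            )
            if idx is None or score < threshold:
                # No sufficiently similar group in the previous slice: a new gathering starts.
                idx = len(gatherings)
                gatherings.append([])
            claimed.add(idx)
            gatherings[idx].append((s, group))
            next_active.append((idx, group))
        active = next_active
    return gatherings


slices = [
    [{"ann", "bo", "cy"}],
    [{"ann", "bo", "cy", "di"}],  # di joins
    [{"ann", "bo"}],              # cy and di leave
    [],                           # nobody meets
    [{"ann", "bo", "cy"}],        # the same people meet again: a new gathering
]
for g in build_gatherings(slices):
    print([s for s, _ in g])
# [0, 1, 2]
# [4]
```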
We have a great visualization (with accompanying explainer-video, embedded below) that beautifully describes what gatherings are and how they work. Check out the video, it’s only 90 seconds long.
The visualization was created by Ulf Aslak Jensen, a newly started PhD student in my group. And it is officially awesome: earlier this year it won Science Magazine’s Data Stories Competition!
But, while they’re already great and exciting, gatherings are only the beginning of the story. If a group of people have a real social connection, they meet again and again. We call gatherings that occur repeatedly cores. It is the cores that are the ‘fundamental structures’ that organize/simplify the dynamics we observe on the network. Let’s dig deeper.
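In its simplest form, finding cores means counting how often the same set of people gathers. The Python sketch below assumes exact repeats and a hypothetical min_repeats threshold. Real gatherings fluctuate (someone arrives late, a friend tags along), so a robust definition would need to tolerate such differences rather than rely on exact set equality.

```python
from collections import Counter


def find_cores(gatherings, min_repeats=2):
    """Map each member set that gathers at least `min_repeats` times to its count.

    gatherings: one collection of participants per gathering. Exact set equality is
    a simplification; real data would need tolerance for late arrivals and guests.
    """
    counts = Counter(frozenset(g) for g in gatherings)
    return {members: n for members, n in counts.items() if n >= min_repeats}


week = [{"ann", "bo", "cy"}, {"di", "ed"}, {"ann", "bo", "cy"}, {"ann", "bo", "cy"}, {"bo", "di"}]
print(find_cores(week))
```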
First, let’s think about what the network looks like from the perspective of a single node. Below, we show an example from a real (and representative) individual.
Instead of modeling each and every interaction in the network, we now have a framework that allows us to think about a node’s social activity in a different way. We are able to think about the node as participating in a sequence of gatherings, where each gathering is an instance of a core.
The node pictured above is a member of 9 cores, each of which has gathered multiple times. If we plot when in time each core is active, it looks like this:
We call this pattern of interactions a person’s social trajectory, because we can think of the person’s journey through the network as jumping from core to core – from social context to social context.
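Given core activations, a person's social trajectory is simply the time-ordered list of the cores they take part in. A minimal Python sketch, assuming activations are stored as hypothetical (start_time, core_id, members) records:

```python
def social_trajectory(person, activations):
    """Time-ordered list of the cores a person takes part in.

    activations: hypothetical (start_time, core_id, members) records, one per gathering.
    """
    ordered = sorted(activations, key=lambda rec: rec[0])
    return [core_id for _, core_id, members in ordered if person in members]


activations = [
    (30, "football", {"ann", "bo"}),
    (10, "study group", {"ann", "cy"}),
    (20, "lunch", {"bo", "cy"}),
    (40, "study group", {"ann", "cy"}),
]
print(social_trajectory("ann", activations))  # ['study group', 'football', 'study group']
```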
It is a massive simplification of the hairball from Figure 3. And it is this simplification – the fact that we are now able to think about dynamic social networks in terms of cores and their activations – that I think is the paper’s main contribution.
(Plus, having seen how the cores work, I hope it’s clear why I said that Figure 4 turned out to be a nice representation of what’s actually happening in real networks.)
In the paper we also spend quite a bit of time showing how this simplification is, in fact, useful for a number of purposes. Since this post is probably already a bit tl;dr, I’ll save a detailed description of those results for another day, but I’ll summarize them here.
Firstly, we show that we can use cores to predict where people will be in the future. The idea is simple. A core is a ‘real’ object in the network in the sense that when we see a gathering, all of its members must be present. This means that observing part of a core is a signal that we’ll soon see the remaining members.
In the paper we look at cores of size three and show how a sighting of two core members signals the arrival of the third group member.
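The prediction idea reduces to a simple rule. In this Python sketch (an illustration, not the paper's evaluation), whenever all but one member of a core are present, the missing member is predicted to arrive; for size-three cores, two sightings predict the third person.

```python
def predict_arrivals(present, cores):
    """People expected to show up: for each core with exactly one member missing,
    predict that member. `present` and each core are sets of people."""
    expected = set()
    for core in cores:
        missing = core - present
        if len(missing) == 1:
            expected |= missing
    return expected


cores = [{"ann", "bo", "cy"}, {"cy", "di", "ed"}]
print(predict_arrivals({"ann", "bo", "zoe"}, cores))  # {'cy'}
```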
Secondly, we realized that social trajectories have a lot in common with spatial trajectories. Spatial trajectories describe how we move from location to location: from ‘home’ to ‘work’ to ‘supermarket’, etc.
The fact that we move through social contexts (cores) just like we move through physical space opens an interesting connection to work on human mobility. Specifically, we connect the work on cores to a seminal paper on Limits of Predictability in Human Mobility, which showed that for most people, given a sequence of past locations, the next location can be predicted with high accuracy.
We find a similar level of predictability given social trajectories, as well as an interesting interplay between the social and geo-spatial predictability (when people are highly unpredictable wrt. their location, they tend to be highly predictable wrt. their social context).
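For reference, the Limits of Predictability paper turns an entropy estimate S (in bits) of someone's sequence over N distinct locations into an upper bound Π on predictability by solving Fano's inequality, S = H(Π) + (1 - Π) log2(N - 1), where H is the binary entropy. The same step applies to a sequence of cores. A Python sketch that solves for Π by bisection; estimating S itself (the original work used a Lempel-Ziv estimator) is assumed to have been done already.

```python
import math


def max_predictability(entropy_bits, n_states, tol=1e-12):
    """Upper bound on predictability from Fano's inequality, found by bisection."""
    if n_states < 2 or not 0 <= entropy_bits <= math.log2(n_states):
        raise ValueError("need n_states >= 2 and 0 <= entropy_bits <= log2(n_states)")

    def fano(p):
        # Binary entropy of p plus the cost of spreading 1 - p over the other N - 1 states.
        h = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
        return h + (1 - p) * math.log2(n_states - 1)

    # fano() decreases from log2(N) at p = 1/N to 0 at p = 1, so bisect on [1/N, 1].
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(max_predictability(2.0, 50), 3))
```

Lower entropy means a higher bound: a perfectly regular sequence (S = 0) gives Π = 1, while a uniformly random one (S = log2 N) gives Π = 1/N.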
There is much more in the actual paper. For example, we talk about how the cores leave traces in other communication channels. And the paper also contains the technical details (although a lot of them are contained in the massive Supporting Information document). I will write more about the predictability results in a later post (since those findings are actually pretty cool as well).
In summary, I hope that I’ve managed to give you a sense of the paper’s central contribution – and perhaps also provided a bit more of an explicit link to the literature (including my own past research) than is readily available from the paper.
 The data was retrieved using the following Google Scholar search query: (“complex network” OR “complex networks” OR “network data”) AND (“community detection” OR “community assignment” OR “network community” OR “network communities” OR “community finding”). The idea for that query comes from Conrad Lee.
 I’m exaggerating a little bit for effect here. The approach we’re discussing only works for systems where people are actually meeting face-to-face. Community detection in phone call networks or Facebook is a different story.
 It’s a little bit confusing because we’re talking about two distinct kinds of predictability. The predictability related to a sequence of location/social contexts has to do with the amount of routine in someone’s behavior.