How to kill a Twitter Bot!

This Friday we’re lucky to have visitor Emilio Ferrara presenting a talk on identifying Twitter bots. Emilio’s work has been covered extensively in the media, for example in MIT Technology Review’s How to spot a social bot on Twitter. Details below:

  • Date: Friday September 12th, 2014
  • Time: 11:00-noon
  • Place: DTU Building 321, first floor lab space
  • Speaker: Emilio Ferrara (@jabawack), Post-doctoral Research Fellow at Indiana University Bloomington
  • Title: The rise of social bots: fighting deception and misinformation on social media
  • Abstract: One of the classic problems in Computer Science, recognizing the behavior of a human from that of a computer algorithm (proposed by Alan Turing), has suddenly become very relevant in the context of social media. Limits to the expressive power of humans and real incentives abound to develop human-mimicking software agents called social bots. These elusive entities wildly populate social media ecosystems, often going unnoticed among the population of real people. Bots can be harmful, aiming at persuading, smearing, or deceiving, and for such a reason our research aims at developing efficient systems to detect them. In my talk I will discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. Characteristics related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering. I will present “Bot or Not?”, a social bot detection framework prototype developed at Indiana University under the Truthy project. My talk will conclude depicting future scenarios and discussing related problems, such as that of studying persuasion campaigns on social media, how they spread, and how we can promptly detect and potentially hinder their diffusion.

 

Dynamic and Multiplex Networks

Network science buffs are in for a treat this Monday (September 1st, 2014), when we have a great set of visitors in my group at DTU. I’m excited to present talks on the cutting edge of what we know about networks from János Kertész and János Török. The talks will be back to back, and detailed info can be found below.

The talks are open to the public, so hope to see you there!

  • Time: Monday September 1st, 10am – noon
  • Place: DTU Building 321, room 134 (1st floor lab area).
  • Speakers:
    • János Török (10am-11am). Associate professor at Budapest University of Technology, Department of Theoretical Physics.
    • János Kertész (11am-12pm). Professor & Director of the Institute of Physics, Budapest University of Technology and Economics.

Multi-level, multi-channel, multi-agent modeling of social interactions (János Török)

Abstract: We present a model of society. Human relations are strengthened by communication and eroded by time. Communication is, in general, related to some social activity (work, friendship, hobby) or social context. Therefore we postulate that individuals having different social needs participate in a number of social contexts (family, workplace etc.) – which may also evolve in time – and communicate with other members of the contexts using different communication channels (face to face, phone, email, etc.) for different purposes and with different impact on their relationship. We show that using realistic input data from surveys and statistical data one can reproduce important features of real society like Dunbar’s numbers and their meaning.

Spreading on temporal networks: Results from empirical analysis, model calculations and simulation (János Kertész)

Abstract: Spreading phenomena typically take place on temporal networks, where connections between the nodes are only occasionally and for limited time present. Such events can be, e.g., encounters of people, which are important for contagion or opening a communication channel needed for information transmission. We studied a mobile call network from this point of view: Having the time stamped records of the calls we played a ‘susceptible-infected’ game by infecting one node at random and assuming transmission at every possible event. We introduced different reference systems by appropriate shuffling of the data and identified this way the contributions of the different types of correlations to the speed of spreading. We concluded that there is a considerable slowing down as compared to the random models, mainly due to the correlations between the link weights and the topology and the inhomogeneous, bursty character of the events. We have also shown that the temporal inhomogeneity cannot be characterized by the inter-event time distribution (IETD) alone as there are strong dependencies between the events. In order to understand better the role of the different components we investigated models of temporal networks. In the analytically solvable infinite complete graph we showed that burstiness, i.e., power law IETD distribution always accelerates the process provided the clocks are positioned on the nodes. For the complementary case of link related burstiness we considered a number of models, like the analytically tractable Cayley tree, BA trees and networks. We show that if the stationary bursty process is governed by power-law IETD, the spreading can be slowed down or accelerated as compared to a Poisson process; the speed is determined by the short time behavior, which in our model is controlled by the exponent. 
We demonstrate that finite, so called “locally tree-like” networks, like the Barabási-Albert networks behave very differently from real tree graphs if the IETD is strongly fat-tailed, as the lack or presence of rare alternative paths modifies the spreading. A further important result is that the non-stationarity of the dynamics has a significant effect on the spreading speed for strongly fat-tailed power-law IETDs, thus bursty processes characterized by small power-law exponents can cause slow spreading in the stationary state but also very rapid spreading heavily depending on the age of the processes.
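The ‘susceptible-infected’ game from the abstract is easy to play on a toy dataset. Below is a minimal Python sketch, not the authors’ code: it assumes events are (time, u, v) tuples, that every contact between an infected and a susceptible node transmits in either direction, and it uses only the simplest reference system (timestamps permuted across events, who-talks-to-whom kept fixed). The function names and the four toy events are made up for illustration.

```python
import math
import random


def si_spread(events, seed_node, t0=0):
    """Run a deterministic SI process over time-stamped contact events.

    events: list of (time, u, v) tuples, assumed to occur at or after t0.
    Transmission is assumed at every event that joins an infected and a
    susceptible node. Returns {node: infection_time} for reached nodes.
    """
    infected_at = {seed_node: t0}
    for t, u, v in sorted(events, key=lambda e: e[0]):
        if u in infected_at and v not in infected_at:
            infected_at[v] = t
        elif v in infected_at and u not in infected_at:
            infected_at[u] = t
    return infected_at


def shuffle_times(events, rng):
    """Reference model: keep the contact pairs, permute the timestamps."""
    times = [t for t, _, _ in events]
    rng.shuffle(times)
    return [(t, u, v) for t, (_, u, v) in zip(times, events)]


def time_to_reach(events, seed_node, fraction, n_nodes):
    """Time at which `fraction` of all nodes is infected (None if never)."""
    times = sorted(si_spread(events, seed_node).values())
    needed = max(1, math.ceil(fraction * n_nodes))
    return times[needed - 1] if len(times) >= needed else None


# Hypothetical toy data: four contacts among four people.
events = [(1, "a", "b"), (2, "b", "c"), (3, "c", "d"), (4, "a", "d")]

print(si_spread(events, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(si_spread(events, "d"))  # {'d': 0, 'c': 3, 'a': 4}; 'b' is never reached

shuffled = shuffle_times(events, random.Random(42))
print(time_to_reach(shuffled, "a", 1.0, 4))
```

Seeding from "d" instead of "a" shows why the ordering of events matters: the same set of links reaches fewer nodes because the useful contacts happen too early. Comparing time_to_reach on the original and the shuffled events, averaged over many seeds and shuffles, is the kind of comparison the abstract refers to.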

References:

1. M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, J. Saramäki: Small But Slow World: How Network Topology and Burstiness Slow Down Spreading, Phys. Rev. E 83, 025102 (2011)

2. Márton Karsai, Kimmo Kaski, Albert-László Barabási, János Kertész: Universal features of correlated bursty behavior, Scientific Reports 2, Article number 397 (2012)

3. Márton Karsai, Kimmo Kaski, János Kertész: Correlated dynamics in egocentric communication networks, PLoS ONE 7(7) e40612 (2012)

4. Hang-Hyun Jo, Márton Karsai, János Kertész, Kimmo Kaski: Circadian pattern and burstiness in human communication activity, New J. Phys. 14, 013055 (2012)

5. Szabolcs Vajna, Bálint Tóth, János Kertész: Modelling power-law distributed interevent times, New J. Phys. 15, 103023 (2013)

6. Hang-Hyun Jo, Juan I. Perotti, Kimmo Kaski, János Kertész: Enhanced Spreading Dynamics by Non-Poissonian Processes, Physical Review X 4, 011041 (2014)

7. Dávid X. Horváth, János Kertész: Spreading dynamics on networks: the role of burstiness, topology and non-stationarity, New J. Phys. 16, 073037 (2014)

A note on academic writing

I often give the following writing advice to my students. Today, in honor of efficiency, I decided I’d put my advice in a blog post, so I can just link to it in the future.

Unless you’re a great writer (in which case you don’t have to follow any rules), the structure of academic text is the following:

  • First you tell your readers what you’re about to tell them.
  • Then you tell the readers the thing you want to tell them.
  • Finally you tell them what you’ve just told them.

This structure works on a number of levels in a thesis.

On the level of the entire thesis, the introduction tells the reader what’s going to happen in the text and the conclusion summarizes what just happened, while the chapters in between contain the actual work.

But for each chapter, you should also put an introduction and conclusion around the content, and similarly for each section. Even within each subsection, it might be a good idea to start with an introductory sentence or two (setting the stage) and to end with a brief wrap-up. You have to stop before it gets too pedantic, but I hope the point gets across. It’s not exactly fractal, but almost.

Networking doesn’t always work

With collaborators at MIT (first author is Yves-Alexandre de Montjoye) we have just published a paper in Scientific Reports, The Strength of the Strongest Ties in Collaborative Problem Solving.

The paper shows that networking (in the sense of building a larger network of weak ties) does not improve team performance under some circumstances. We showed that for teams of knowledge workers in a competitive environment, the strongest ties (best friends or people you spend a lot of time with) explain much of the team performance in our statistical model.

Said differently, a team’s strongest ties are the best predictor of how the team will perform. They predict performance better than any other factor we looked at, such as the technical abilities of its members, how knowledgeable they are about the topic at hand, and even their personality. In fact, once you account for a team’s strongest ties, none of these other factors matters.

A neat infographic (created by Yves) explains the main findings and shows some of the key plots.

Dirk Brockmann Visit

Next week, we’re very lucky to have Dirk Brockmann visiting the lab. If you’re anywhere near Copenhagen there’s no excuse not to come and see him. He’s a world-class scientist (see below) and, in addition to mind-expanding content, his talks often feature subtle humor as well as legendary slideshows.

  • Date: Thursday, February 27th
  • Time: 13:30 – 14:30
  • Location: Technical University of Denmark, Building 324, Room 040 [if you're traveling from Copenhagen, I recommend Bus 150S]
  • Title: The hidden geometry of complex, network driven contagion phenomena
  • Abstract: See below.

Dirk is a theoretical physicist turned world expert in the spreading patterns of contagious disease. His recent Science paper The hidden geometry of complex, network-driven contagion phenomena [Science 342, 1337 (2013)] shows that it’s possible to replace geographic distance with a probabilistically motivated effective distance, which reveals a hidden geometry where disease arrival times can be accurately predicted.
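To give a flavor of the idea, here is a small Python sketch of my reading of the effective distance. It assumes the effective length of a direct link from n to m is 1 − ln P, where P is the fraction of traffic leaving n that ends up in m, and that the effective distance between two places is the shortest path under those lengths. The flux table, node names, and function names are made up for illustration; this is not the code from the paper.

```python
import heapq
import math


def effective_lengths(flux):
    """Turn a flux table into effective link lengths d = 1 - ln(P).

    flux[n][m] is the (hypothetical) traffic from n to m. P is the share
    of n's outgoing traffic that goes to m, so 0 < P <= 1 and d >= 1.
    """
    lengths = {}
    for n, out in flux.items():
        total = sum(out.values())
        lengths[n] = {
            m: 1.0 - math.log(f / total) for m, f in out.items() if f > 0
        }
    return lengths


def effective_distance(lengths, source):
    """Dijkstra shortest-path distances from `source` under `lengths`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, math.inf):
            continue  # stale heap entry
        for m, w in lengths.get(n, {}).items():
            nd = d + w
            if nd < dist.get(m, math.inf):
                dist[m] = nd
                heapq.heappush(heap, (nd, m))
    return dist


# Hypothetical passenger fluxes between three airports.
flux = {
    "A": {"B": 90, "C": 10},
    "B": {"A": 50, "C": 50},
    "C": {"A": 10, "B": 90},
}

dist = effective_distance(effective_lengths(flux), "A")
for place, d in sorted(dist.items()):
    print(place, round(d, 3))
```

In this toy example the weak direct link from A to C is effectively longer than the detour through B, so C sits "behind" B as seen from A, regardless of where the three places are on a map.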

He’s also done interesting work on human travel patterns based on how money travels: The scaling laws of human travel [Nature 439, 462-465 (2006)].

Finally, Dirk’s work has been used to fight crime on the US hit TV series NUMB3RS – check out the action-packed clip below.

Dirk is a professor at Humboldt University (recently returned from Northwestern University).

The paper itself is Big Data

With 29 pages of text and 9 pages of references, the new paper we’ve just put on arXiv is almost big data in its own right (ok, not quite, but it’s still a nice, big chunk of work).

The paper outlines all the work we’ve done over the past couple of years to put together a great big testbed for network science, working to collect a multiplex dataset (face-to-face, telecommunication, social networks, geospatial and demographic information) of around 1000 densely connected individuals.

The abstract reads:

This paper describes the deployment of a large-scale study designed to measure human interactions across a variety of communication channels, with high temporal resolution and spanning multiple years – the Copenhagen Networks Study. Specifically, we collect data on face-to-face interactions, telecommunication, social networks, location, and background information (personality, demographic, health, politics) for a densely connected population of 1,000 individuals, using state-of-art smartphones as social sensors. Here we provide an overview of the related work and describe the motivation and research agenda driving the study. Additionally the paper details the data-types measured, and the technical infrastructure in terms of both backend and phone software, as well as an outline of the deployment procedures. We document the participant privacy procedures and their underlying principles. The paper is concluded with early results from data analysis, illustrating the importance of multi-channel high-resolution approach to data collection.

Get it here: http://arxiv.org/abs/1401.7233

Some years ago (…), I said that our networked future was bracketed by the dystopian nightmares of two old-Etonian novelists, George Orwell and Aldous Huxley. Orwell thought we would be destroyed by the things we fear, while Huxley thought that we would be controlled by the things that delight us. What Snowden has taught us is that the two extremes have converged: the NSA and its franchises are doing the Orwellian bit, while Google, Facebook and co are attending to the Huxleyean side of things.

From “Here’s how data thieves have captured our lives on the internet” by John Naughton in the Guardian/the Observer.