Next week, we’re very lucky to have Dirk Brockmann visiting the lab. If you’re anywhere near Copenhagen, there’s no excuse not to come and see him. He’s a world-class scientist (see below), and in addition to mind-expanding content, his talks often feature subtle humor, as well as legendary slideshows.
Title: The hidden geometry of complex, network driven contagion phenomena
Abstract: See below.
Dirk is a theoretical physicist turned world expert in the spreading patterns of contagious disease. His recent Science paper, The hidden geometry of complex, network-driven contagion phenomena [Science 342, 1337 (2013)], shows that it’s possible to replace geographic distance with a probabilistically motivated effective distance, revealing a hidden geometry in which disease arrival times can be accurately predicted.
With 29 pages of text and 9 pages of references, the new paper we’ve just put on arXiv is almost big data in its own right (ok, not quite, but it’s still a nice, big chunk of work).
The paper outlines all the work we’ve done over the past couple of years to put together a great big testbed for network science, working to collect a multiplex dataset (face-to-face interactions, telecommunication, social networks, geospatial and demographic information) of around 1,000 densely connected individuals.
The abstract reads
This paper describes the deployment of a large-scale study designed to measure human interactions across a variety of communication channels, with high temporal resolution and spanning multiple years – the Copenhagen Networks Study. Specifically, we collect data on face-to-face interactions, telecommunication, social networks, location, and background information (personality, demographics, health, politics) for a densely connected population of 1,000 individuals, using state-of-the-art smartphones as social sensors. Here we provide an overview of the related work and describe the motivation and research agenda driving the study. Additionally, the paper details the data types measured, and the technical infrastructure in terms of both backend and phone software, as well as an outline of the deployment procedures. We document the participant privacy procedures and their underlying principles. The paper concludes with early results from data analysis, illustrating the importance of a multi-channel, high-resolution approach to data collection.
If you’re a PhD student or young PostDoc interested in a “curiosity-driven, bottom-up research project” in my lab, the Ørsted Postdoc positions linked here are a great opportunity. Let me know, and we can consider designing a project together. And don’t forget, the Danish PostDoc salaries are great.
Some years ago (…), I said that our networked future was bracketed by the dystopian nightmares of two old-Etonian novelists, George Orwell and Aldous Huxley. Orwell thought we would be destroyed by the things we fear, while Huxley thought that we would be controlled by the things that delight us. What Snowden has taught us is that the two extremes have converged: the NSA and its franchises are doing the Orwellian bit, while Google, Facebook and co are attending to the Huxleyean side of things.
From “Here’s how data thieves have captured our lives on the internet” by John Naughton in the Guardian/the Observer. [link here]
A couple of days ago, I wrote (with Piotr Sapieżyński) about our influential Twitter bots [click here to read more]. We know the bots are great at getting followers, but what about other measures of influence? Today, on a whim, I checked the bots’ Klout scores, and was both surprised & impressed.
I’m fairly certain that the bots have higher Klout scores than most readers of this page! (Let me know in the comments if I’m wrong.) And they’ve only been tweeting for approximately 90 days. For reference, my personal Klout score is currently 48, and I’ve been posting on Twitter since 2008, with literally thousands of tweets to my name.
What’s even more impressive is that the bot Klout-scores are calculated based on Twitter alone. My own score is also partly based on contributions from Facebook and LinkedIn.
Check out the gallery of bot Klout-scores by clicking on the images below.
I suppose this says more about Klout’s algorithm than about the bots’ actual influence, but it’s still an interesting tidbit.
Note: This post is co-written with Piotr Sapieżyński
Is it possible for a small computer science course to exert measurable influence (trending topics) on Twitter, a massive social network with hundreds of millions of users? The surprising answer to that question is “yes”. That’s exactly what we did this year, using simple Python scripts and the Twitter API. Below we explain why and how, plus some of our findings along the way.
Why Twitter bots?
The standard (spam)bot on Twitter has almost no followers, almost zero activity, and exists for a single, simple purpose, for example to increase follower counts for certain individuals.
For this year’s Social Graphs and Interactions course, we wanted to do something different – we wanted to see how “intelligent” we could make our bots, using simple machine learning and network analysis methods (the topics covered in the class).
A large part of our motivation for investigating Twitter bots in class is that the amount of manipulation that humans experience online is ever increasing. Think, for example, about how Facebook’s timeline-filtering algorithm shapes the world view of hundreds of millions around the globe. And that’s just the most mainstream example.
Instead of simply pointing out this fact, we thought that investigating how relatively simple bots can interact with and influence a real social system would be an interesting way for the students in our class to become aware of (and potentially counteract) some of those manipulations.
Some basic findings
Our first finding was that getting followers on Twitter is surprisingly easy! We employed a simple strategy that takes advantage of most users’ tendency to “follow back” when followed by someone who looks like a not-too-spammy Twitter profile. The recipe is simply (a minimal code sketch follows the list):
Manually create a realistic profile, including a few tweets
Pick users with between 50 and 300 followers (people with high numbers of followers are less likely to follow back).
Follow about 100 new users per day.
Unfollow whoever doesn’t follow you back within 24 hours (because users with a very asymmetrical ratio of following to followers look like spam-bots).
Repeat steps 2-4.
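For the curious, here is roughly what that recipe might look like in code. This is a minimal sketch using the tweepy library; the credentials, the seed query, and the bookkeeping are placeholders, and the actual class bots were considerably more careful about rate limits, pagination, and error handling.

```python
import time
import tweepy

# Placeholder credentials -- each bot needs its own set of Twitter API keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

followed_at = {}  # screen_name -> timestamp when we followed them

def pick_candidates(query, n=100):
    """Step 2: pick users with a modest follower count (50-300)."""
    candidates = []
    for user in api.search_users(query):
        if 50 <= user.followers_count <= 300:
            candidates.append(user.screen_name)
        if len(candidates) >= n:
            break
    return candidates

def follow_batch(screen_names):
    """Step 3: follow roughly 100 new users per day."""
    for name in screen_names:
        api.create_friendship(screen_name=name)
        followed_at[name] = time.time()

def unfollow_non_followers():
    """Step 4: unfollow anyone who hasn't followed back within 24 hours."""
    my_followers = set(api.followers_ids())
    for name, t in list(followed_at.items()):
        if time.time() - t < 24 * 3600:
            continue
        if api.get_user(screen_name=name).id not in my_followers:
            api.destroy_friendship(screen_name=name)
        del followed_at[name]

# Step 5: repeat steps 2-4, once per day.
while True:
    follow_batch(pick_candidates("boston"))
    unfollow_non_followers()
    time.sleep(24 * 3600)
```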
Our initial target in the class was to get at least 50 followers that way, but in a relatively short time period the most successful bots had gained thousands of followers! Below is a plot of the number of followers as a function of time for some of the most successful teams (for the first 50 days of the class).
These large numbers of followers (along with systematic interactions added later) also translated to very high Klout-scores.
Aside: At the beginning of the course, we focused on Justin Bieber followers; here’s a snapshot from a report describing an early avatar.
The next finding was that some teams in the class more or less inadvertently connected their bots to large “dark matter” components of the Twitter network: very large systems of spambots that post meaningless content and follow back immediately in an automated fashion.
We did not explore these parts of the network in detail, but we note in passing that such areas are highly interesting for actual research, as they may create significant noise for analytics, skewing results for algorithms that try to predict stock market movements or box office revenue from the Twitter firehose.
This Twitter dark matter may create lots of noise on Twitter, but it is great for getting lots of followers quickly, and a large follower count is a key part of a convincing Twitter persona: many Twitter users tend to assume that someone with thousands of followers must have something interesting to say.
As the course progressed, we focused on creating bots that could use machine learning to recognize “good” content for tweeting and retweeting – bots able to detect topics within their tweet streams and to distinguish between real, human accounts and robots among their followers.
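To give a flavor of the content-recognition part (without going into any particular team’s solution), here’s a minimal, hypothetical sketch: a bag-of-words classifier that scores candidate tweets for retweet-worthiness. The training examples and the threshold are illustrative assumptions only, not the models the students actually built.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: tweets labeled 1 ("good", e.g. widely retweeted)
# or 0 ("bad", e.g. spammy or ignored). A real bot would need many more examples.
train_tweets = [
    "Amazing new street art spotted near Trinity Church this morning",
    "FOLLOW ME FOR FREE FOLLOWERS click this link now",
]
train_labels = [1, 0]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(train_tweets)

clf = LogisticRegression()
clf.fit(X, train_labels)

def retweet_worthiness(tweet_text):
    """Return the model's probability that a tweet is worth retweeting."""
    return clf.predict_proba(vectorizer.transform([tweet_text]))[0, 1]

# A bot would then only retweet candidates scoring above some threshold, e.g.
#     if retweet_worthiness(status.text) > 0.8:
#         api.retweet(status.id)
```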
However, the question remained: Can those thousands of followers be converted to influence on Twitter? For the class’ final project, we decided to put that to the test.
The overall goal was for each team to build a convincing bot, get human followers, and then, at a specified time, for everyone to work together to make specific hashtags trend on Twitter. So how do you achieve that goal? Here’s an overview of what each team worked on:
Build convincing avatars and use the high follower-counts as part of the disguise.
Use machine learning to tell who’s a bot and who’s not, in order to focus only on humans and ignore bots (see the sketch after this list).
Use natural language processing & machine learning to discover quality content to re-tweet and tweet.
Use network theory to explore the network surrounding existing followers, making sure that bot actions reach entire communities.
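As an illustration of the second item above, here’s a minimal, hypothetical bot-vs-human classifier based on a handful of profile features. The features, the labeled examples, and the model choice are assumptions for the sake of the example, not the exact setup the teams used.

```python
from collections import namedtuple

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Minimal stand-in for a tweepy User object (same attribute names).
Profile = namedtuple(
    "Profile",
    "followers_count friends_count statuses_count default_profile_image description",
)

def profile_features(user):
    """Simple, illustrative features computed from a (tweepy-like) user profile."""
    return [
        user.followers_count,
        user.friends_count,
        user.friends_count / max(user.followers_count, 1),  # follow ratio
        user.statuses_count,
        int(user.default_profile_image),  # default "egg" avatar is suspicious
        len(user.description or ""),
    ]

# Hypothetical labeled examples: 1 = bot, 0 = human.
labeled = [
    (Profile(12, 4900, 3, True, ""), 1),
    (Profile(350, 400, 2100, False, "Runner, coffee nerd, Bostonian."), 0),
]
X = np.array([profile_features(p) for p, _ in labeled])
y = np.array([label for _, label in labeled])

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)

def looks_like_a_bot(user):
    """Classify a (tweepy-like) user object as bot (True) or human (False)."""
    return bool(clf.predict([profile_features(user)])[0])
```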
Trending topics are defined for geographical areas. Since Copenhagen is not very active in the Twitter-verse (sadly, Copenhagen does not have trending topics on Twitter), we chose Boston, MA (where both of us have lived) for the experiments. Thus, all bots were located in Boston (both in terms of profile text and tweet geotags) and tweeted on an East Coast schedule.
Specifically, the bots started following people located in Boston, based on self-reported language, location, profile description, and geo-tagged tweets. By the end of a three-week period, more than 800 individuals in Boston followed at least one of our bots.
In addition to each bot’s idiosyncratic strategy for following new Twitter users, the bots maintained a shared list of Bostonians who had already followed (back) one of the bots. The idea was that if you follow more than one bot, content from the consortium of bots will make up a proportionally larger share of your Twitter stream (a minimal sketch of this mechanism follows).
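In practice, that shared-list mechanism can be as simple as a file of user IDs that every bot appends to and reads from. The sketch below assumes a plain text file and the tweepy library; the bots in the class may well have used a different shared store.

```python
import os
import tweepy

SHARED_LIST = "boston_followers.txt"  # one Twitter user id per line, shared by all bots

def record_new_followers(api):
    """Append this bot's followers to the shared list if they aren't already on it."""
    known = set()
    if os.path.exists(SHARED_LIST):
        known = set(open(SHARED_LIST).read().split())
    with open(SHARED_LIST, "a") as f:
        for user_id in api.followers_ids():
            if str(user_id) not in known:
                f.write("%d\n" % user_id)

def follow_shared_list(api):
    """Have this bot follow everyone the consortium has already attracted."""
    if not os.path.exists(SHARED_LIST):
        return
    already_following = set(api.friends_ids())
    for line in open(SHARED_LIST):
        user_id = int(line.strip())
        if user_id not in already_following:
            api.create_friendship(user_id=user_id)
```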
With all that in place, we tried three distinct interventions, ordered by what we perceived to be increasing potential for virality. Each intervention consisted of a couple of manual tweets per bot plus coordinated, automated retweeting and favoriting. The first hashtag, #bostonthanks, was designed to be an unusual Thanksgiving greeting (so as to be specific to our intervention) that we hoped would become one of the chosen Thanksgiving greetings for Boston. It didn’t really take off. The idea behind the second hashtag, #MeInThree, was to start a hashtag that would allow people to describe themselves in three words/concepts (something that is fun and fits within Twitter’s 140 characters). That didn’t work either.
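The coordinated, automated part of each intervention could in principle be as simple as the following sketch, where each bot periodically searches for the target hashtag and retweets/favorites what it finds. Rate limiting and error handling are omitted, and this is not necessarily the exact code the class ran.

```python
import time
import tweepy

def amplify(api, hashtag, rounds=10, pause=300):
    """Periodically search for a hashtag and retweet/favorite what turns up."""
    me = api.me().screen_name
    seen = set()
    for _ in range(rounds):
        # Note: recent tweepy versions rename this call to search_tweets.
        for status in api.search(q=hashtag, count=50):
            if status.id in seen or status.user.screen_name == me:
                continue  # skip things we've already handled or posted ourselves
            api.retweet(status.id)
            api.create_favorite(status.id)
            seen.add(status.id)
        time.sleep(pause)  # spread the activity out over time

# Each bot in the consortium would run something like:
#     amplify(api, "#banksyinboston")
```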
The third hashtag, #banksyinboston, was designed around the secretive British artist and prankster Banksy, who travels the globe pseudonymously and interacts with the world through movies, graffiti, and happenings. The idea was to create a couple of primitive “fake” Banksy artworks and start a #banksyinboston discussion (“Is he really here?”) in what we hoped would be the spirit of Banksy himself. Boston also has original Banksy art, which might add to the discussion.
Above is one of the crudely made Banksy fakes, with a background image of Trinity Church from Google Street View: note the artifact in the upper left edge of the photo. (Interestingly, another, much more elaborate Banksy-in-Paris hoax/non-hoax started the same day!)
Much to our surprise (after two failed attempts), the third attempt actually succeeded in our stated goal of influencing the trending topics on Twitter!!
We did fall short of trending on Twitter’s own website, but #banksyinboston managed to get to the top of the trending list for the competing site trendsmap.com.
Analyzing the subsequent cascade of tweets reveals a couple of interesting things. Firstly, existing Boston graffiti was indeed re-discovered. Secondly, and most importantly, many Bostonians were highly effective in discovering that #banksyinboston was indeed a prank and spreading the word; here’s one example.
So at the end of the day, did the Twitter bots have influence in Boston? We stress that this is an anecdotal test and only the most viral hashtag made it to the trending list. With a little over 800 Bostonian followers, our bots did not infiltrate Boston – and most Twitter users in Boston never interacted with one of our bots.
But what we did show was that a few dedicated bots can make a difference. In four weeks, we managed to put together a small network with substantially more impact than a single individual with a similar number of followers.
Most importantly, someone with more time & resources could easily put together a much larger system of coordinated bots that – in terms of advertisement – could be used to gently boost interest in an upcoming movie or similar. Or – with malevolent intent – could use a network of “sleeper bots” to systematically spread misinformation, e.g. by injecting talking points into Twitter streams on a global scale. We hope that this little experiment can help create awareness of such subtle manipulations before they begin shaping our public conversations.
Appendix: Twitter bots – what are those!?
Here, we provide a bit of context on Twitter bots. The earliest recorded document (that we could find) on Twitter bots is a great Ignite talk by Tim Hwang from way back in 2009.
This Thursday (Nov 21st) from 14-16, we’re delighted to present two exciting talks on dynamic & complex networks. Tanya Berger-Wolf from University of Illinois at Chicago will discuss collective dynamics in the social network of primates, and Joachim Mathiesen from the Niels Bohr Institute will talk about excitable dynamics on Twitter.
Location: DTU, Building 306, Room 97 (First floor). Time: November 21st, 14:00-16:00
Speaker: Tanya Berger-Wolf (Associate Professor, University of Illinois at Chicago) Title: Animals as Mobile Social Users Abstract: Recent advances in data collection technology, such as GPS and other mobile sensors, high definition cameras, and UAVs, have given biologists access to high spatial and temporal resolution data about animal populations. Many of the questions biologists are asking while trying to leverage those data are similar to questions being asked about mobile users. Why do animals go here rather than there? How does location influence activity and social interactions? How do social interactions influence activity and movement choices? How are movement decisions being made in a group and individually? While some of the methodology for answering those questions has been developed for understanding human behavior, animals offer the advantage of visible and trackable interactions and movements, simpler context and rules of behavior, and no privacy issues. I will present examples of the recent developments from the mobile world of animal populations, show some of the methodology we have developed for understanding their mobile social networks, and discuss the challenges for understanding these kinds of data, common to all animals, including humans.
Bio: Dr. Tanya Berger-Wolf is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago, where she heads the Computational Population Biology Lab. Her research interests are in applications of computational techniques to problems in ecology and population biology of plants, animals, and humans, from genetics to social interactions. As a legitimate part of her research she gets to fly in a super-light airplane over a nature preserve in Kenya, taking hyper-stereo video of zebra populations. Dr. Berger-Wolf received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2002. After spending some time as a postdoctoral fellow working in computational phylogenetics and doing research in computational epidemiology, she returned to Illinois. She has received numerous awards for her research and mentoring, including the US National Science Foundation CAREER Award in 2008 and the UIC Mentor of the Year (2009) and Graduate Mentor (2012) awards.
Speaker: Joachim Mathiesen (Associate Professor, Niels Bohr Institute) Title: Excitable human dynamics driven by extrinsic events in massive communities Abstract: Online social networks are emphatically a global phenomenon which has changed the way people interact. Using data from Twitter and on trading volumes of financial securities, we analyze the correlated human activity in massive social organizations. The activity, typically excited by real-world events and measured by the occurrence rate of international brand names and trading volumes, is characterized by intermittent fluctuations with bursts of high activity separated by quiescent periods. These fluctuations are broadly distributed with an inverse cubic tail and have long-range temporal correlations with a 1/f power spectrum. We describe the activity by a stochastic point process and derive the distribution of activity levels from the corresponding stochastic differential equation. The statistical properties of the systems that we consider have similarities with a wide range of social systems and might therefore provide insight into general human behavior in large social organizations.
If you’re in Copenhagen, if you speak Danish, and if you’re not already an expert on networks, I’ve got a public lecture on Complex Networks coming up at the Danish Royal Academy. The abstract is
Networks are everywhere. Deep inside our cells, genes regulate one another as part of a complex network. If you draw a picture of which animals eat each other in the food chain, you get a complex network. People connect to one another through social networks that span personal contact, phone calls, online social networks, and so on. And on a global scale, Wikipedia, the internet, and our collective knowledge form a complex network. Since the turn of the millennium, our understanding of these networks has shifted dramatically, and a new field, network science, is emerging. In this talk I tell the story of modern network theory, explain some of the new insights, and conclude by connecting to my own research on social networks.
It’s a fancy place and the building itself is worth a visit. If you’re interested, note that you must register to attend; follow this link to sign up.