A note on academic writing

I often give the following writing advice to my students. Today, in honor of efficiency, I decided I’d put my advice in a blog post, so I can just link to it in the future.

Unless you’re a great writer (in which case you don’t have to follow any rules), the structure of academic text is the following:

  • First you tell your readers what you’re about to tell them.
  • Then you tell the readers the thing you want to tell them.
  • Finally you tell them what you’ve just told them.

This structure works on a number of levels in a thesis.

On the level of the entire thesis, the introduction tells the reader what’s going to happen in the text and the conclusion summarizes what just happened, while the chapters in between contain the actual work.

But for each chapter, you should also put an introduction and a conclusion around the content, and similarly for each section. Even within each subsection, it might be a good idea to start with an introductory sentence or two (setting the stage) and to wrap up at the end. You have to stop before it gets too pedantic, but I hope the point gets across. It’s not exactly fractal, but almost.

Networking doesn’t always work

With collaborators at MIT (first author is Yves-Alexandre de Montjoye) we have just published a paper in Scientific Reports, The Strength of the Strongest Ties in Collaborative Problem Solving.

The paper shows that networking (in the sense of building a larger network of weak ties) does not improve team performance under some circumstances. We showed that for teams of knowledge workers in a competitive environment, the strongest ties (best friends or people you spend a lot of time with) explain much of the team performance in our statistical model.

Said differently, a team’s strongest ties are the best predictor of how the team will perform. They predict performance better than any other factor we looked at, such as the technical abilities of its members, how knowledgeable they are about the topic at hand, and even their personalities. In fact, once you account for a team’s strongest ties, none of these other factors matters.

A neat infographic (created by Yves) explains the main findings and shows some of the key plots.

Dirk Brockmann Visit

Next week, we’re very lucky to have Dirk Brockmann visiting the lab. If you’re anywhere near Copenhagen, there’s no excuse not to come and see him. He’s a world-class scientist (see below), and in addition to mind-expanding content, his talks often feature subtle humor, as well as legendary slideshows.

  • Date: Thursday, February 27th
  • Time: 13:30 – 14:30
  • Location: Technical University of Denmark, Building 324, Room 040 [if you're traveling from Copenhagen, I recommend Bus 150S]
  • Title: The hidden geometry of complex, network driven contagion phenomena
  • Abstract: See below.

Dirk is a theoretical physicist turned world expert in the spreading patterns of contagious disease. His recent Science paper, The hidden geometry of complex, network-driven contagion phenomena [Science 342, 1337 (2013)], shows that it’s possible to replace geographic distance with a probabilistically motivated effective distance, which reveals a hidden geometry in which disease arrival times can be accurately predicted.


He’s also done interesting work on human travel patterns based on how money travels, Scaling laws of human travel [Nature 439, 462-465 (2006)]


Finally, Dirk’s work has been used to fight crime on the US hit TV series NUMB3RS – check out the action-packed clip below.

Dirk is a professor at Humboldt University (recently returned from Northwestern University).

The paper itself is Big Data

With 29 pages of text and 9 pages of references, the new paper we’ve just put on arXiv is almost big data in its own right (ok, not quite, but it’s still a nice, big chunk of work).

The paper outlines all the work we’ve done over the past couple of years to put together a great big testbed for network science, working to collect a multiplex dataset (face-to-face interactions, telecommunication, social networks, geospatial and demographic information) of around 1,000 densely connected individuals.


The abstract reads

This paper describes the deployment of a large-scale study designed to measure human interactions across a variety of communication channels, with high temporal resolution and spanning multiple years – the Copenhagen Networks Study. Specifically, we collect data on face-to-face interactions, telecommunication, social networks, location, and background information (personality, demographic, health, politics) for a densely connected population of 1,000 individuals, using state-of-art smartphones as social sensors. Here we provide an overview of the related work and describe the motivation and research agenda driving the study. Additionally the paper details the data-types measured, and the technical infrastructure in terms of both backend and phone software, as well as an outline of the deployment procedures. We document the participant privacy procedures and their underlying principles. The paper is concluded with early results from data analysis, illustrating the importance of multi-channel high-resolution approach to data collection.

Get it here: http://arxiv.org/abs/1401.7233

Some years ago (…), I said that our networked future was bracketed by the dystopian nightmares of two old-Etonian novelists, George Orwell and Aldous Huxley. Orwell thought we would be destroyed by the things we fear, while Huxley thought that we would be controlled by the things that delight us. What Snowden has taught us is that the two extremes have converged: the NSA and its franchises are doing the Orwellian bit, while Google, Facebook and co are attending to the Huxleyean side of things.

From “Here’s how data thieves have captured our lives on the internet” by John Naughton in the Guardian/the Observer. [link here]

Influential Bots!

A couple of days ago, I wrote (with Piotr Sapieżyński) about our influential Twitterbots [click here to read more]. We know the bots are great at getting followers, but what about other measures of influence? Today, on a whim, I checked the bots’ Klout scores, and was both surprised & impressed.

I’m fairly certain that the bots have higher Klout scores than most readers of this page! (Let me know in the comments if I’m wrong.) And they’ve only been tweeting for approximately 90 days. For reference, my personal Klout score is currently 48, and I’ve been posting on Twitter since 2008 and have literally thousands of tweets to my name.

What’s even more impressive is that the bot Klout-scores are calculated based on Twitter alone. My own score is also partly based on contributions from Facebook and LinkedIn.

Check out the gallery of bot Klout-scores by clicking on the images below.

I suppose this says more about Klout’s algorithm than about the bots’ actual influence, but it’s still an interesting tidbit.

You’re here because of a robot

Note: This post is co-written with Piotr Sapieżyński

Is it possible for a small computer science course to exert measurable influence (trending topics) on Twitter, a massive social network with hundreds of millions of users? The surprising answer to that question is “yes”. That’s exactly what we did this year, using simple Python scripts and the Twitter API. Below we explain why and how, plus some of our findings along the way.

Why Twitter bots?

The standard (spam)bot on Twitter has almost no followers, almost zero activity, and exists for a single, simple purpose – for example, to inflate follower counts for certain individuals.

For this year’s Social Graphs and Interactions course, we wanted to do something different – we wanted to see how “intelligent” we could make our bots, using simple machine learning and network analysis methods (the topics covered in the class).

The class

A large part of our motivation for investigating Twitter bots in class is that the amount of manipulation that humans experience online is ever increasing. Think, for example, about how Facebook’s timeline-filtering algorithm shapes the world view of hundreds of millions around the globe. And that’s just the most mainstream example.

Instead of simply pointing out this fact, we thought that investigating how relatively simple bots can interact with and influence a real social system would be an interesting way for the students in our class to become aware of (and potentially counteract) some of those manipulations.

Some basic findings

Our first finding was that getting followers on Twitter is surprisingly easy! We employed a simple strategy that takes advantage of the tendency of most users to “follow back” when followed by someone who looks like a not-too-spammy Twitter profile. The recipe is simply:

  1. Manually create a realistic profile, including a few tweets
  2. Pick users with between 50 and 300 followers (people with high numbers of followers are less likely to follow back).
  3. Follow about 100 new users per day.
  4. Unfollow whoever doesn’t follow you back within 24 hours (because users with a very asymmetrical ratio of following to followers look like spam-bots).
  5. Repeat steps 2-4.
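The selection and pruning logic in steps 2–4 can be sketched in a few lines of Python. This is only an illustration under our own assumptions – the actual Twitter API calls (e.g. via a library like tweepy) are omitted, and all function and variable names below are ours, not from the class code:

```python
import random

def pick_targets(candidates, already_tried, n=100):
    """Steps 2-3: pick up to n users with 50-300 followers whom we
    haven't approached yet. `candidates` is a list of
    (user_id, follower_count) pairs pulled from the API."""
    pool = [uid for uid, followers in candidates
            if 50 <= followers <= 300 and uid not in already_tried]
    random.shuffle(pool)
    return pool[:n]

def to_unfollow(followed_at, current_followers, now, grace=24 * 3600):
    """Step 4: users we followed more than `grace` seconds ago
    who still haven't followed back."""
    return [uid for uid, t in followed_at.items()
            if now - t > grace and uid not in current_followers]
```

Each bot would then simply follow everyone returned by `pick_targets` and unfollow everyone returned by `to_unfollow`, once per day.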

Our initial target in the class was to get at least 50 followers that way, but in a relatively short time period the most successful bots had gained thousands of followers! Below is a plot of number of followers as a function of time for some of the most successful teams (for the first 50 days of the class).

Followers for 4 bots

These large numbers of followers (along with systematic interactions added later) also translated to very high Klout-scores.

Aside: In the beginning of the course, we focused on Justin Bieber followers; here’s a snapshot from a report describing an early avatar.

Bieber Snippet

The next finding was that some teams in the class more or less inadvertently connected their bots to large “dark matter” components of the Twitter network, very large systems of spambots posting meaningless content and following back immediately in an automated fashion.

We did not explore these parts of the network in detail, but we note in passing that such areas are highly interesting for actual research, as they may create significant noise for analytics, skewing results for algorithms working to predict the stock market or box office revenue based on the Twitter firehose.

This Twitter dark matter may create lots of noise on Twitter, but it is great for getting lots of followers quickly, and many followers are a key part of a convincing Twitter persona, as many Twitter users tend to think that someone with thousands of followers must have something interesting to say.

Social influence

As the course progressed, we focused on creating bots that could use machine learning to recognize “good” content for tweeting and retweeting – bots able to detect topics within their tweet-stream and to distinguish between real, human accounts and robots among their followers.
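To give a flavor of what “recognizing good content” can mean at its simplest, here is a crude bag-of-words relevance score. This is our own illustrative sketch of the general idea, not the students’ actual models:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def topic_profile(own_tweets):
    """Word counts over the bot's own timeline, used as a crude topic model."""
    counts = Counter()
    for t in own_tweets:
        counts.update(tokenize(t))
    return counts

def retweet_score(text, profile, retweet_count=0):
    """Average topic-word overlap, boosted by (log-damped) popularity.
    Candidate tweets scoring above some threshold get retweeted."""
    words = tokenize(text)
    if not words:
        return 0.0
    overlap = sum(profile[w] for w in words) / len(words)
    return overlap * (1.0 + math.log1p(retweet_count))
```

Real implementations would use proper feature weighting (e.g. TF-IDF) and trained classifiers, but the principle – score candidate content against what the persona usually talks about – is the same.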

However, the question remained: can those thousands of followers be converted into influence on Twitter? For the class’ final project, we decided to put that to the test.

The overall goal was for each team to build a convincing bot, get human followers, and then, at a specified time, for everyone to work together to make specific hashtags trend on Twitter. So how to achieve that goal? Here’s an overview of what each team worked on:

  • Build convincing avatars and use the high follower-counts as part of the disguise.
  • Use machine learning to tell who’s a bot and who’s not (in order to focus on humans and ignore bots).
  • Use natural language processing & machine learning to discover quality content to re-tweet and tweet.
  • Use network theory to explore the network surrounding existing followers, making sure that bot actions reach entire communities.
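For the bot-vs-human item above, a minimal heuristic version might look like the following. The feature names and thresholds are illustrative assumptions on our part – the teams trained actual classifiers rather than using hand-set rules:

```python
def bot_likelihood(user):
    """Score a profile dict on a few spam-bot tells; higher = more bot-like.
    The dict keys here are our own convention, not a real API schema."""
    score = 0.0
    followers = max(user.get("followers", 0), 1)
    if user.get("following", 0) / followers > 10:   # very asymmetric ratio
        score += 1.0
    if user.get("tweets", 0) == 0:                  # never tweeted
        score += 1.0
    if not user.get("has_bio", False):              # empty profile
        score += 0.5
    if user.get("default_avatar", False):           # default profile picture
        score += 0.5
    return score

def looks_human(user, threshold=1.0):
    return bot_likelihood(user) < threshold
```

Even features this simple separate the obvious spam-bots (thousands followed, zero tweets, blank profile) from ordinary accounts.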

Trending topics are defined for geographical areas. Since Copenhagen is not very active in the Twitter-verse (sadly, Copenhagen does not have trending topics on Twitter), we chose Boston, MA (where both of us have lived) for the experiments. Thus all bots were located in Boston (both in terms of profile text and tweet geotags) and tweeted on an East Coast timetable.

Specifically, the bots started following people located in Boston based on self-reported language, location, profile description and geo-tagged tweets. By the end of a three week period, more than 800 individuals in Boston followed at least one of our bots.

Days are counted from the beginning of the class’ final project. The sudden drop in friends/followers corresponds to one popular bot being banned for a few days.

In addition to each bot’s idiosyncratic strategy for following new Twitter users, the bots maintained a shared list of Bostonians who had already followed (back) one of the bots – the idea being that if you follow more than one bot, content from the consortium of bots increases proportionally in your Twitter stream.
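A minimal version of that shared bookkeeping could look like this. The file name and JSON format are our invention for the sketch; the teams presumably coordinated through something similar:

```python
import json

def load_shared(path):
    """Read the shared set of known follow-backs; empty if no file yet."""
    try:
        with open(path) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()

def record_follow_backs(new_ids, path):
    """Merge newly observed follow-backs into the shared file, so every
    bot can prioritize Bostonians who already follow another bot."""
    ids = load_shared(path) | set(new_ids)
    with open(path, "w") as f:
        json.dump(sorted(ids), f)
    return ids
```

Each bot would call `record_follow_backs` after checking its own followers, and consult `load_shared` when deciding whom to follow next.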

Bostonians following N bots. Again, days are counted from the beginning of the final project.

With all that in place, we tried three distinct interventions, ordered by what we perceived to be increasing potential for virality. Each intervention consisted of a couple of manual tweets per bot plus coordinated, automated re-tweeting and favoriting. The first hashtag, #bostonthanks, was designed to be an unusual (and therefore specific to our intervention) Thanksgiving greeting that we hoped would become one of the chosen Thanksgiving greetings for Boston. It didn’t really take off. The idea behind the second hashtag, #MeInThree, was to start a hashtag that would allow people to describe themselves in three words/concepts (something that is fun and fits within Twitter’s 140 characters). That didn’t work either.

The third hashtag, #banksyinboston, was designed around the secretive British artist and prankster Banksy, who travels the globe pseudonymously and interacts with the world through movies, graffiti, and happenings. The idea was to create a couple of primitive “fake” Banksy artworks and start a #banksyinboston discussion – “Is he really here?” – in what we hoped would be the spirit of Banksy himself. Boston also has original Banksy art, which might add to the discussion.


Above is one of the crudely made Banksy fakes, with a background image of Trinity Church from Google Street View; note the artifact at the upper-left edge of the photo. (Interestingly, another, much more elaborate Banksy-in-Paris hoax/non-hoax started the same day!)

Much to our surprise (after two failed attempts), the third attempt actually succeeded in our stated goal of influencing the trending topics on Twitter!


We did fall short of trending on Twitter’s own website, but #banksyinboston managed to get to the top of the trending list for the competing site trendsmap.com.

Analyzing the subsequent cascade of tweets reveals a couple of interesting things. Firstly, existing Boston graffiti was indeed re-discovered. Secondly, and most importantly, many Bostonians were highly effective in discovering that #banksyinboston was indeed a prank and spreading the word; here’s one example.

And much of the discussion related to the #banksyinboston was dedicated to putting the notion to rest. This echoes the behavior observed during the London Riots in 2011.


So at the end of the day, did the Twitter bots have influence in Boston? We stress that this is an anecdotal test and only the most viral hashtag made it to the trending list. With a little over 800 Bostonian followers, our bots did not infiltrate Boston – and most Twitter users in Boston never interacted with one of our bots.

But what we did show was that a few dedicated bots can make a difference. In four weeks, we managed to put together a small network with substantially more impact than a single individual with a similar number of followers.

Most importantly, someone with more time & resources could easily put together a much larger system of coordinated bots that – in terms of advertising – could be used to gently boost interest in an upcoming movie or similar. Or – with malevolent intent – such a network of “sleeper bots” could be used to systematically spread misinformation, e.g. by injecting talking points into Twitter streams on a global scale. We hope that this little experiment can help create awareness of such subtle manipulations before they begin shaping our public conversations.

Appendix: Twitter bots – what are those!?

Here, we provide a bit of context on Twitter bots. The earliest recorded document (that we could find) on Twitter bots is a great Ignite talk by Tim Hwang from way back in 2009.

Some of the ideas in Tim’s talk were later tested by the Web Ecology Project and in a class at the University of Washington, and recently bots have received lots of attention in the tech and business press (e.g. the Wall Street Journal [Inside a Twitter Robot Factory] and The Atlantic [Why Did 9,000 Porny Spambots Descend on This San Diego High Schooler?]).