Monday June 13th is shaping up to be an exciting day for data science in Copenhagen. I’ve already announced that Christo Wilson is giving a talk at DTU, but now I’m happy to add Esteban Moro to the speaker line-up for a fantastic double bill. (And Piotr’s PhD defense at 2pm that afternoon will also be quite an event.)

Esteban Moro is a researcher at Universidad Carlos III de Madrid in the GISC group working on complex systems. On his superb blog he notes that “The fact that the systems under study are complex does not mean that its behavior cannot be understood or anticipated. I believe research must be interdisciplinary and close to real life problems and because of that, I do research in social networks, financial markets or viral marketing (complex enough!)”.

Esteban’s work is creative, inspiring, and always exciting (plus often covered in the press). We are lucky to have him. The details of Esteban’s talk are:

  • Time: Monday June 13th, 10:45am
  • Place: DTU, Building 321, 1st floor lab space

Title: Pace of change in urban social networks

Abstract: Urban communities are seen both as highly structured social settings as well as distinctly vibrant environments for interaction, where personal relationships are initiated, consolidated and, eventually, lost and replaced by new relationships. Here we investigate statistical relationships between the social structure of the urban community and the pace at which such structure changes over time. To this end, we analyze the 19-month evolution of the social interactions pertaining to urban communities in England, Wales and Scotland, as described by 700 million of mobile phones calls made among 20 million inhabitants. We find that different urban communities display not only distinct social structures but also alter such structures at widely different paces. Furthermore, we investigate the impact of this heterogeneity in the network varying structure on information diffusion processes by simulating SI models. Our results indicate that time to infection can be well predicted using only static variables of the network, such as the number of connections, leading to the conclusion that the observed vibrant mechanics in link creation have a negligible impact on the information diffusion in terms of geographical spreading.

Christo Wilson Talk

A PhD defense is a great way to bring interesting people to Denmark, and Piotr’s defense on June 13th is no exception. This time we’re lucky to have recent NSF CAREER grant recipient Christo Wilson from Northeastern University visiting. Christo’s work includes auditing algorithms, security and privacy, and online social networks. Much of his work focuses on using measured data to analyze and understand complex phenomena on the Web. In many cases, he has leveraged the knowledge gained from measurements of the Web to build systems that improve security, privacy, and transparency for users, and he has gotten lots of nice press coverage in the process.

  • Time: Monday June 13th, 10am
  • Location: DTU, Building 321, 1st floor lab space

Title: Caught Red Handed: Tracing Information Flows Between Ad Exchanges Using Retargeted Ads

Abstract: Numerous surveys have shown that Web users are seriously concerned about the loss of privacy associated with online tracking. Alarmingly, these surveys also reveal that people are unaware of the amount of data sharing that occurs between ad exchanges, and thus underestimate the privacy risks associated with online tracking.

In reality, the modern ad ecosystem is fueled by a flow of user data between trackers and ad exchanges. Although recent work has shown that ad exchanges routinely perform cookie matching with other exchanges, these studies are based on brittle heuristics that cannot detect all forms of information sharing, especially under adversarial conditions.

In this study, we develop a methodology that is able to detect client- and server-side flows of information between arbitrary ad exchanges. Our key insight is to leverage retargeted ads as a mechanism for identifying information flows. Intuitively, our methodology works because it relies on the semantics of how exchanges serve ads, rather than focusing on specific cookie matching mechanisms. Using crawled data on 35,448 ad impressions, we show that our methodology can successfully categorize four different kinds of information sharing between ad exchanges, including cases where existing heuristic methods fail.

Data Stories Winner

Ulf Aslak Jensen, who’s writing his M.Sc. thesis in my group (well, actually he’s at the Weizmann Institute working with Uri Alon, but that’s another story), has just won Science Magazine’s Data Stories competition with the following video about a cool visualization he created based on SensibleDTU data.

Ulf has gotten lots of nice coverage, both internationally

And in the local Danish Press

Nice work!!

Dave Choffnes visit

Next Thursday, we’re lucky to have Dave Choffnes visiting the lab. David Choffnes is an assistant professor in the College of Computer and Information Science at Northeastern University. His research is primarily in the areas of distributed systems and networking, with a recent focus on mobile systems and privacy. Much of his work entails crowdsourcing measurement and performance evaluation of Internet systems by deploying software to users at the scale of tens or hundreds of thousands of users. He earned his PhD from Northwestern (not in the northwest), and completed a postdoc at the University of Washington (in the northwest) prior to joining Northeastern (both in the northeast and northwest). He sees no reason why this should at all be confusing. He is a co-author of three textbooks, and his research has been supported by the NSF, Google, the Data Transparency Lab, VidScale, M-Lab, and a Computing Innovations Fellowship.

  • Time: Thursday May 19th, 11am
  • Location: DTU, Building 321, 1st floor lab space

Title: ReCon: Identifying and Controlling Privacy Leaks from Mobile Devices

Abstract: Mobile systems have become increasingly popular for providing ubiquitous Internet access; however, recent studies demonstrate that software running on these systems extensively tracks and leaks users’ personally identifiable information (PII). I argue that these privacy leaks persist in large part because mobile users have little visibility into PII leaked through the network traffic generated by their devices, and have poor control over how, when and where that traffic is sent and handled by third parties.

In this talk, I describe ReCon, a cross-platform system that reveals PII leaks and gives users control over them without requiring any special privileges or custom OSes. Specifically, our key observation is that PII leaks must occur over the network, so we implement our system in the network using a software middlebox. We then use a machine learning approach to efficiently and accurately detect users’ PII without knowing a priori the content that is PII. Further, we develop techniques to block, obfuscate, or ignore the PII leak, by displaying leaks via a visualization tool and letting the user decide how the system should act on transmitted PII. I discuss the design and implementation of the system and evaluate its methodology with measurements from controlled experiments and flows from a user study with more than 100 participants. In addition to revealing and controlling PII leaks, we are using our machine-learning-based techniques to automatically identify and block malware based on network behaviors.

United Stats of Words

In early December, Alan Mislove (who’s spending his sabbatical here in Copenhagen) and I got the Volvo and headed out to the Amager Campus of the University of Copenhagen to pick up Anders Søgaard, a professor of linguistics, to work on a top secret research project.

The project itself is still classified, but one of the things we’re looking into is word usage in geo-coded tweets across the globe (to begin with, just America). To do this, Alan has trawled through something like 65 billion tweets and extracted the ones with geotags (1-2% of all tweets), then kept the ones that are from the US (about a third of those), ending up with a set of around 450 million geotagged tweets.
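For the curious, here is a quick back-of-the-envelope check of that filtering funnel, as a minimal Python sketch. It only uses the rounded figures quoted above (65 billion tweets, a 1-2% geotag rate, about a third from the US); the function name `tweet_funnel` is made up for illustration and is not part of Alan's actual pipeline.

```python
def tweet_funnel(total_tweets, geotag_share, us_share):
    """Return (geotagged, us_geotagged) counts for the given shares."""
    geotagged = total_tweets * geotag_share   # tweets that carry a geotag
    us_geotagged = geotagged * us_share       # geotagged tweets located in the US
    return geotagged, us_geotagged


TOTAL_TWEETS = 65e9   # "something like 65 billion" (rounded, from the post)
US_SHARE = 1 / 3      # "about a third" (rounded, from the post)

# The post gives a range of 1-2% for the geotag rate, so try both ends.
for geotag_share in (0.01, 0.02):
    geotagged, us_geotagged = tweet_funnel(TOTAL_TWEETS, geotag_share, US_SHARE)
    print(
        f"geotag rate {geotag_share:.0%}: "
        f"{geotagged / 1e6:,.0f}M geotagged, "
        f"{us_geotagged / 1e6:,.0f}M in the US"
    )
```

With these rounded inputs, the US subset lands somewhere between roughly 220 and 430 million tweets, so the figure of around 450 million sits at the upper end of what the quoted percentages imply.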

We couldn’t help ourselves – this dataset was just too cool not to visualize. And because Alan is a wizard, you can try this out for yourself. Once this thing hit Twitter, people found lots of fantastic examples, and I’ve included some of my personal favorites below.

Try it out for yourself – but be warned – it’s pretty darn addictive.


Individual states show up very nicely

Typing in “mississippi” will show both the river and the state.

Even countries work nicely

And (full disclosure), the title for this post was inspired by this tweet, although I like my own little tweak.

Not a bad year

As we enter the new year, it’s always fun to reflect on the year that’s just passed. And it’s been a good one. So good that I almost entitled this post “Everything is awesome”. Below is a list containing a lot of the stuff I should have written about during the year.

Graduation day

Back in June, Vedran Sekara became the first PhD graduate from my group. His thesis was on Dynamics of High Resolution Networks – a fine piece of work. And we were lucky to have Petter Holme and James Bagrow visit to be on the committee; it was great to see them both again.


Upon graduating, Vedran landed a nice job with Sony (Lund offices) as a data scientist. He’s still a visiting researcher in the group and we’re currently collaborating on a few super interesting projects based on Sony’s LifeLog App data.

Arek @ Google

And Vedran is not the only person with a cool new job. Arek Stopczynski, a senior postdoc in my group (and all-round awesome data scientist) has landed a super exciting job with Google in California.

Arek’s work with Google is (of course) top-secret, but they’re lucky to have him!

Alan Mislove

Also this year, good friend, brilliant computer scientist, and associate professor at Northeastern University, Alan Mislove (+ family) is spending his sabbatical here in Denmark, with Alan visiting my group. Having him around is not only a lot of fun, but also enlightening … and we have a few exciting projects in the ‘under construction’ phase. And Alan is going to be around for another six months 🙂

Still young

For me, it was a big deal to receive the Sapere Aude Young Investigator Grant from the Danish Council for Independent Research. The grant title is Microdynamics of Influence in Social Systems, and you can read a popular description of it here (it’s in Danish). This grant is not easy to win, and will keep me in business for the next few years.

Sune & Hal Varian

And more in Google (and other) news. In September, I gave a talk at the event “Big data til gavn for vækst og velfærd – en unik dansk mulighed” (roughly: “Big data for the benefit of growth and welfare – a unique Danish opportunity”), which took place at the Danish National Museum (a pretty cool venue). I gave the talk with collaborator and all-round great guy David Dreyer Lassen. The event also had some pretty cool other speakers, including Hal Varian, who’s Google’s chief economist and arguably one of the most influential people on the planet.

There were other fancy speakers, for example the Danish Minister of the Interior (“Social- og indenrigsminister”) Karen Ellemann.

Press coverage

This year, my group received lots of nice press coverage. Below is a selection.

On TV!

As a fun first, I was interviewed on TV. It was just a local Copenhagen channel, but it was still scary to be right there in a pro studio being interviewed “live on tape”. Oh, and the interview (which is in Danish) was about the Science paper Unique in the Shopping Mall by some of our good friends and collaborators at MIT.

There were a couple of additional videos about our work. One was created by DEIC as part of their new e-Science knowledge portal. Watch it here. And German TV also sent a crew to report on the SensibleDTU experiment.

Router Craziness

Another big event was my PhD student Piotr Sapiezynski’s paper Tracking Human Mobility Using Wifi Signals. The paper is about how easy it is to recreate human mobility traces using the routers that our smartphones connect to. It also has a nice explainer site. I also wrote about it on this blog and tweeted this:

But that was just the beginning. That post was by far my most read in the history of this blog and still skews the month-to-month statistics.

And the paper was covered widely, also in the international press, for example the Atlantic’s CityLab:

Lots of other coverage

We also received lots of other nice press coverage. I was in the DTU paper talking about how academics can use Twitter. You can find a link in the nice tweet from The Danish Agency for Science, Technology and Innovation (Forsknings- og Innovationsstyrelsen).

We were also covered in Politiken and in the magazine Dynamo with a beautiful photo spread, featuring Andrea Cuttone’s beautiful graphics.

My paper with Jari Saramäki and Talayeh Aledavood also got lots of coverage. Below are a couple of examples:

Finally, Vedran and I wrote about Network Science in a Danish popular physics journal and made the cover with one of Vedran’s beautiful visualizations.

The full details on all of this can be found on the Press page, when I get around to updating that.

Great exchange visits

This was also the year where two of my PhD students spent 6 months of their program abroad (this is standard for Danish PhD students). Piotr Sapiezynski visited Jure Leskovec at Stanford and Andrea Cuttone is still visiting Marta Gonzalez at MIT. I feel very lucky to be able to send the guys out to these groups, which are among the most exciting places on the planet.

Coursera course

And I also created a Coursera version of my Social Graphs and Interactions course. Here’s a link to the course page. The video explains it pretty well.


Digital Halo grant

We also got a very nice grant from the Data Transparency Lab to study browsing behavior.

We were in excellent company – the other grantees were from prestigious universities like Princeton University, Carnegie Mellon University, Northwestern University, Columbia University, and many other fine schools. Here’s a little 40 sec. video explaining the project.

Great trip to Cologne

I recently went to the excellent GESIS CSS Winter Symposium in Cologne. The symposium was brilliantly organized by Markus Strohmaier, who has grown it into a major Computational Social Science event (this year 290 participants) within a very short time. So many interesting people to talk to!

My talk was about Fundamental Structures of Complex Social Networks and was based on forthcoming work with Vedran Sekara. I had a nice timeslot and received lots of Twitter love (reproduced below for easy access), which – I have to admit – feels pretty darn great when you’ve worked hard to create some exciting science!