Data Stories Winner

Ulf Aslak Jensen, who’s writing his M.Sc. thesis in my group (well, actually he’s at the Weizmann Institute working with Uri Alon, but that’s another story), has just won Science Magazine’s Data Stories competition with the following video about a cool visualization he created based on SensibleDTU data.

Ulf has gotten lots of nice coverage, both internationally

And in the local Danish Press

Nice work!!

Dave Choffnes visit

Next Thursday, we’re lucky to have Dave Choffnes visiting the lab. David Choffnes is an assistant professor in the College of Computer and Information Science at Northeastern University. His research is primarily in the areas of distributed systems and networking, with a recent focus on mobile systems and privacy. Much of his work entails crowdsourcing measurement and performance evaluation of Internet systems by deploying software to users at the scale of tens or hundreds of thousands of users. He earned his PhD from Northwestern (not in the northwest), and completed a postdoc at the University of Washington (in the northwest) prior to joining Northeastern (both in the northeast and northwest). He sees no reason why this should at all be confusing. He is a co-author of three textbooks, and his research has been supported by the NSF, Google, the Data Transparency Lab, VidScale, M-Lab, and a Computing Innovations Fellowship.

  • Time: Thursday May 19th, 11am
  • Location: DTU, Building 321, 1st floor lab space

Title: ReCon: Identifying and Controlling Privacy Leaks from Mobile Devices

Abstract: Mobile systems have become increasingly popular for providing ubiquitous Internet access; however, recent studies demonstrate that software running on these systems extensively tracks and leaks users’ personally identifiable information (PII). I argue that these privacy leaks persist in large part because mobile users have little visibility into PII leaked through the network traffic generated by their devices, and have poor control over how, when and where that traffic is sent and handled by third parties.

In this talk, I describe ReCon, a cross-platform system that reveals PII leaks and gives users control over them without requiring any special privileges or custom OSes. Specifically, our key observation is that PII leaks must occur over the network, so we implement our system in the network using a software middlebox. We then use a machine learning approach to efficiently and accurately detect users’ PII without knowing a priori the content that is PII. Further, we develop techniques to block, obfuscate, or ignore the PII leak, by displaying leaks via a visualization tool and letting the user decide how the system should act on transmitted PII. I discuss the design and implementation of the system and evaluate its methodology with measurements from controlled experiments and flows from a user study with more than 100 participants. In addition to revealing and controlling PII leaks, we are using our machine-learning-based techniques to automatically identify and block malware based on network behaviors.

United Stats of Words

In early December, Alan Mislove (who’s spending his sabbatical here in Copenhagen) and I got the Volvo and headed out to the Amager Campus of the University of Copenhagen to pick up Anders Søgaard, a professor of linguistics, to work on a top secret research project.

The project itself is still classified, but one of the things we’re looking into is word usage in geo-coded tweets across the globe (to begin with, just America). To do this, Alan has trawled through something like 65 billion tweets and extracted the ones with geotags (1-2% of all tweets), then grabbed the ones that are from the US (about a third of those), ending up with a set of around 450 million geotagged tweets.
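As a back-of-the-envelope check, the filtering funnel above can be written out in a few lines of Python. This is only a sketch of the arithmetic: the function name `funnel` and the exact fractions are illustrative assumptions based on the approximate figures quoted above, not Alan’s actual pipeline.

```python
def funnel(total_tweets, geotag_fraction, us_fraction):
    """Estimate how many tweets survive both filtering steps."""
    geotagged = total_tweets * geotag_fraction  # keep only geotagged tweets
    return geotagged * us_fraction              # keep only those from the US


TOTAL = 65e9       # "something like 65 billion tweets" (approximate)
US_SHARE = 1 / 3   # "about a third" of geotagged tweets (approximate)

# Geotag rate is quoted as 1-2%, so compute both ends of the range.
low = funnel(TOTAL, 0.01, US_SHARE)
high = funnel(TOTAL, 0.02, US_SHARE)

print(f"{low / 1e6:.0f}M to {high / 1e6:.0f}M US geotagged tweets")
```

With these rough inputs, the upper end of the range (a 2% geotag rate) lands close to the roughly 450 million tweets in the final set.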

We couldn’t help ourselves – this dataset was just too cool not to visualize. And because Alan is a wizard, you can try this out for yourself. Once this thing hit Twitter, people found lots of fantastic examples, and I’ve included some of my personal favorites below.

Try it out for yourself – but be warned – it’s pretty darn addictive.


Individual states show up very nicely

Typing in “mississippi” will show both the river and the state.

Even countries work nicely

And (full disclosure), the title for this post was inspired by this tweet, although I like my own little tweak.

Not a bad year

As we enter the new year, it’s always fun to reflect on the year that’s just passed. And it’s been a good one. So good that I almost entitled this post “Everything is awesome”. Below is a list containing a lot of the stuff I should have written about during the year.

Graduation day

Back in June, Vedran Sekara became the first PhD graduate from my group. His thesis was on Dynamics of High Resolution Networks – a fine piece of work. And we were lucky to have Petter Holme and James Bagrow visit to be on the committee; it was great to see them both again.


Upon graduating, Vedran landed a nice job with Sony (Lund offices) as a data scientist. He’s still a visiting researcher in the group and we’re currently collaborating on a few super interesting projects based on Sony’s LifeLog App data.

Arek @ Google

And Vedran is not the only person with a cool new job. Arek Stopczynski, a senior postdoc in my group (and all-round awesome data scientist) has landed a super exciting job with Google in California.

Arek’s work with Google is (of course) top-secret, but they’re lucky to have him!

Alan Mislove

Also this year, good friend, brilliant computer scientist, and associate professor at Northeastern University, Alan Mislove (+ family) is spending his sabbatical here in Denmark, with Alan visiting my group. Having him around is not only a lot of fun, but also enlightening … and we have a few exciting projects in the ‘under construction’ phase. And Alan is going to be around for another six months 🙂

Still young

For me, it was a big deal to receive the Sapere Aude Young Investigator Grant from the Danish Council for Independent Research. The grant title is Microdynamics of Influence in Social Systems, and you can read a popular description of it here (it’s in Danish). This grant is not easy to win, and will keep me in business for the next few years.

Sune & Hal Varian

And more in Google (and other) news. In September, I gave a talk at the event “Big data til gavn for vækst og velfærd – en unik dansk mulighed”, which took place at the Danish National Museum (a pretty cool venue). I gave the talk with collaborator and all-round great guy David Dreyer Lassen. The event also had some pretty cool other speakers, including Hal Varian, who’s Google’s chief economist and arguably one of the most influential people on the planet.

There were other fancy speakers, for example the Danish Minister of the Interior (“Social- og indenrigsminister”) Karen Ellemann.

Press coverage

This year, my group received lots of nice press coverage. Below is a selection.

On TV!

As a fun first, I was interviewed on TV. It was just a local Copenhagen channel, but it was still scary to be right there in a pro studio being interviewed “live on tape”. Oh, and the interview (which is in Danish) was about the Science paper Unique in the Shopping Mall by some of our good friends and collaborators at MIT.

There were a couple of additional videos about our work. One was created by DEIC as part of their new e-Science knowledge portal. Watch it here. And German TV also sent a crew to report on the SensibleDTU experiment.

Router Craziness

Another big event was my PhD student Piotr Sapiezynski’s paper Tracking Human Mobility Using Wifi Signals. The paper is about how easy it is to recreate human mobility traces using the routers that our smartphones connect to, and it has a nice explainer site. I also wrote about it on this blog and tweeted this:

But that was just the beginning. That post was by far my most read in the history of this blog and still skews the month-to-month statistics.

And the paper was covered widely, also in the international press, for example the Atlantic’s CityLab:

Lots of other coverage

We also received lots of other nice press coverage. I was in the DTU paper talking about how academics can use Twitter. You can find a link in the nice tweet from The Danish Agency for Science, Technology and Innovation (Forsknings- og Innovationsstyrelsen).

We were also covered in Politiken and in the magazine Dynamo with a beautiful photo spread, featuring Andrea Cuttone’s beautiful graphics.

My paper with Jari Saramäki and Talayeh Aledavood also got lots of coverage; below are a couple of examples:

Finally, Vedran and I wrote about Network Science in a Danish popular physics journal and made the cover with one of Vedran’s beautiful visualizations.

The full details on all of this can be found on the Press page, when I get around to updating that.

Great exchange visits

This was also the year where two of my PhD students spent 6 months of their program abroad (this is standard for Danish PhD students). Piotr Sapiezynski visited Jure Leskovec at Stanford and Andrea Cuttone is still visiting Marta Gonzalez at MIT. I feel very lucky to be able to send the guys out to these groups that are among the most exciting places on the planet.

Coursera course

And I also created a Coursera version of my Social Graphs and Interactions course. Here’s a link to the course page: The video explains it pretty well.


Digital Halo grant

We also got a very nice grant from the Data Transparency Lab to study browsing behavior.

We were in excellent company – the other grantees were from prestigious universities like Princeton University, Carnegie Mellon University, Northwestern University, Columbia University, and many other fine schools. Here’s a little 40 sec. video explaining the project.

Great trip to Cologne

I recently went to the excellent GESIS CSS Winter Symposium in Cologne. The symposium was brilliantly organized by Markus Strohmaier, who has grown it into a major Computational Social Science event (this year 290 participants) within a very short time. So many interesting people to talk to!

My talk was about Fundamental Structures of Complex Social Networks and was based on forthcoming work with Vedran Sekara. I had a nice timeslot and received lots of Twitter love (reproduced below for easy access), which – I have to admit – feels pretty darn great when you’ve worked hard to create some exciting science!


Alan Mislove

This whole year, we’re lucky enough to have collaborator & all-round awesome guy Alan Mislove spending his sabbatical connected to my group. Alan is an associate professor in the College of Computer and Information Science at Northeastern University. His research concerns distributed systems and networks, with a focus on using social networks to enhance the security, privacy, and efficiency of newly emerging systems. He is a recipient of an NSF CAREER Award (2011), and his work has been covered by the Wall Street Journal, the New York Times, and the CBS Evening News.

In October, Alan will give a talk about recent work that has been widely covered in the media – and that I think will be interesting to many of you – the details are here:

  • Date & Time: October 9th, 2015, 11am
  • Venue: DTU, Building 321, first floor lab space
  • Title: Measuring personalization of online services
  • Abstract: Today, many web services personalize their content, including Netflix (movie recommendations), Amazon (product suggestions), and Yelp (business reviews). In many cases, personalization provides advantages for users: for example, when a user searches for an ambiguous query such as “router,” Amazon may be able to suggest the woodworking tool instead of the networking device. However, personalization is rarely transparent (or even labeled), and has the potential to be used to the user’s disadvantage. For example, on e-commerce sites, personalization could be used to manipulate the set of products shown (price steering) or to customize the prices of products (price discrimination). Unfortunately, today, we lack the tools and techniques necessary to be able to detect when personalization is occurring, as well as what inputs are used to perform personalization.

    In this talk, I discuss my group’s recent work that aims to address this problem. First, we develop a methodology for accurately measuring when web services are personalizing their content. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in results to personalization (as opposed to other sources of noise). Second, we apply this methodology to two domains: Web search services (e.g., Google, Bing) and e-commerce sites (e.g., Expedia). We find evidence of personalization for real users on both Google search and nine of the popular e-commerce sites. Third, using fake accounts, we investigate the effect of user attributes and behaviors on personalization; we find that the choice of browser, logging in, and a user’s previously content can significantly affect the results presented.
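To give a feel for what “attributing differences to personalization rather than noise” could look like, here is a minimal Python sketch. It is not the methodology from Alan’s talk: the use of Jaccard overlap, the idea of pairing two identical control accounts to estimate a noise floor, and the names `jaccard` and `personalization_signal` are all assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two result lists (order is ignored)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def personalization_signal(test, control_a, control_b):
    """Difference between test and control beyond the control-vs-control noise."""
    noise = 1.0 - jaccard(control_a, control_b)   # what two identical accounts disagree on
    observed = 1.0 - jaccard(test, control_a)     # what the test account sees differently
    return max(0.0, observed - noise)


# Hypothetical result pages for the same query, issued at the same time.
control_a = ["r1", "r2", "r3", "r4"]
control_b = ["r1", "r2", "r3", "r5"]   # identical account, slightly different results
test = ["r1", "r6", "r7", "r8"]        # account with some attribute changed

signal = personalization_signal(test, control_a, control_b)
print(f"difference beyond noise: {signal:.2f}")
```

The point of the second control is that two identical accounts rarely see exactly the same page, so only the difference that exceeds that baseline is counted as a possible sign of personalization.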