Jim Bagrow Visit and Talk

This week my good friend & collaborator James Bagrow (assistant professor at University of Vermont) is visiting the group. He’s an excellent speaker, and we’re lucky enough that he’s agreed to give a talk as part of his visit. If you’re anywhere near Copenhagen, his talk is worth the trip out to DTU. Here are the details:

  • Time: Friday June 19th, 2015. 10:00am
  • Location: Technical University of Denmark, Building 321. First floor “Lab Space”. If you need directions, click here.
  • Title: Data-driven approaches to studying human dynamics
  • Abstract: Research on human dynamics and computational social science has been revolutionized by new data taken from online social networks. These modern datasets capture activity patterns across very large populations. Using these records, new results have been discovered and existing hypotheses have been tested. But what is the fundamental limit of social information stored in these data? These data also have sampling biases and other issues that make uncertainty quantification crucial. Along these lines, I will discuss current projects related to inferring hidden structure in partially observed networks and using large-scale Twitter data to estimate how information is stored and flows through social networks.  

(And Vedran Sekara’s PhD defense is that same afternoon).

Tracking Human Mobility using WiFi signals

When I started working on understanding social systems, privacy really wasn’t on my mind. (I generally want to write down equations, understand the universe and all that). But one of the central realizations arising from our SensibleDTU experiment is that privacy needs to be an important part of this kind of research. I’ve written about this at length elsewhere. One of the things we noticed while digging into terabytes of social data is that data-channels are highly correlated. Information “bleeds through” … something which has serious implications for privacy. Case in point: My group has just released a new preprint (get it here) that shows how the WiFi information routinely collected by your smartphone can easily be converted to precise information about your location. WiFi routers reveal where you live, work, and spend your leisure time. While your phone may have told you that WiFi helps “improve location accuracy”, it may come as a surprise that

  • A majority of apps in the store have access to the list of routers around you (scanned every 20 seconds).
  • Your Android smartphone by default scans for WiFi routers even if you disable WiFi.

Our research shows

  • How to easily convert WiFi information into geographical position.
  • That although all those WiFi scans sound like a lot of data to process, your mobility can be described using just a few access points. We have built an Android app, which requires only WiFi data, to illustrate how this works for your own mobility: Download here.
  • That if someone knows these routers at some point in time, they will still know a lot about your mobility six months later.
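To make the first point above concrete, here is a minimal sketch (my own illustration, not the method from the paper) of how a list of scanned routers can be converted into a geographical position, assuming access to a BSSID-to-coordinates lookup table. The BSSIDs and coordinates below are invented; in practice such mappings can be harvested from public wardriving databases.

```python
# Sketch: convert a WiFi scan (list of router BSSIDs) into a position
# using a BSSID -> (lat, lon) lookup table. Entries here are made up.
ROUTER_LOCATIONS = {
    "aa:bb:cc:00:00:01": (55.786, 12.523),  # hypothetical AP near DTU
    "aa:bb:cc:00:00:02": (55.676, 12.568),  # hypothetical AP in central Copenhagen
}

def estimate_position(scan):
    """Estimate position as the centroid of the known routers in a scan."""
    known = [ROUTER_LOCATIONS[b] for b in scan if b in ROUTER_LOCATIONS]
    if not known:
        return None  # no router in the scan is in our database
    lat = sum(p[0] for p in known) / len(known)
    lon = sum(p[1] for p in known) / len(known)
    return (lat, lon)

# One known router is enough to place the phone at that router's location.
print(estimate_position(["aa:bb:cc:00:00:01", "ff:ff:ff:00:00:09"]))  # → (55.786, 12.523)
```

The centroid is of course a crude estimator; the point is only that once the lookup table exists, going from "list of routers seen" to "where you are" is a dictionary lookup.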

Thus, while WiFi networks are intended to enable connectivity, they are also a de facto location-tracking infrastructure. More generally, our world is becoming enclosed in a web of infrastructures supporting communication, mobility, payments, and advertising. Logs from mobile phone networks (call detail records, CDRs) constitute a global database of human mobility and communication networks. Credit card records form high-resolution traces of our spending behavior.

The figure shows 48 hours of location data for one of the authors, with the four visited locations marked in blue: home, two offices, and a food market. Even though the author's phone sensed 3,822 unique routers in this period, only a few are enough to describe the location more than 90% of the time. (a) Traces recorded with GPS. (b) Traces reconstructed using all available data on WiFi router locations: the transition traces are distorted, but all stop locations are visible and the location is known 97% of the time. (c) With the top 8 routers it is still possible to discover the stop locations in which the author spent 95% of the time; in this scenario transitions are lost. (d) Timeseries showing when during the 48 hours each of the top routers was seen. It can be assumed that AP 1 is home, as it is seen every night, while AP 2 and AP 3 are offices, as they are seen during working hours. The last row shows the combined 95% time coverage provided by the top 8 routers.
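The caption's point that a handful of top routers covers most of the observed time can be illustrated with a toy computation (my own sketch with invented data, not the paper's analysis): count how often each access point appears across scans, then measure what fraction of scans contain at least one of the top k routers.

```python
# Sketch: how few access points are needed to "cover" most scans.
# Each scan is the set of router names seen in one scan window (toy data).
from collections import Counter

scans = [
    {"home"}, {"home"}, {"home", "cafe"}, {"office1"},
    {"office1"}, {"office1", "office2"}, {"office2"}, {"home"},
    {"market"}, {"home"},
]

def top_router_coverage(scans, k):
    """Fraction of scans containing at least one of the k most-seen routers."""
    counts = Counter(ap for scan in scans for ap in scan)
    top = {ap for ap, _ in counts.most_common(k)}
    covered = sum(1 for scan in scans if scan & top)
    return covered / len(scans)

print(top_router_coverage(scans, 3))  # → 0.9: 3 routers cover 9 of 10 scans
```

With real data the effect is stronger than in this toy example, because people spend most of their time at a few stop locations (home, work), each dominated by a stable set of routers.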

It is already well known that so-called “WiFi scanners” can be used to track individuals. This is done by cities, airports, shopping centers, and advertisers (and perhaps intelligence agencies). Some OS manufacturers (e.g. Apple and Chainfire) have recently responded to such tracking by frequently randomizing the unique identifier of each phone. Randomizing the phone identifier, however, does not address the threat presented in our work, where data is collected by an application on the phone, not by external devices.

The privacy of WiFi scan results is often overlooked. In the Android ecosystem, WiFi scans are not considered a location signal: WiFi information can be collected by applications without the location permission, does not show up in the overview of applications using location data, and the WiFi permission is not considered sensitive. This makes it possible for third-party developers to collect high-resolution mobility data under the radar, circumventing the policy and the privacy model of the Android ecosystem. Any app with just the WiFi permission can track your position, although not all of them do (there are legitimate reasons for applications to ask for the WiFi permission, although it seems to be requested more often than required). Last time we checked (February 2015), 17 of the top 20 games on the Android Play Store required access to your WiFi data; in only 6 of those 17 cases did the privacy policy provide reasons why this information is required.

For more information, email the paper’s first author Piotr (pisa@dtu.dk), who collaborated on this post, or me (sljo@dtu.dk). The preprint is available on arXiv.

Update June 3rd, 2015 (maybe-our-paper-played-a-role-in-this edition)

Yesterday, while scouring Google I/O for details on the updated permissions (and to see if anyone mentioned our work), we found that a Google engineer (Ben Poiesz) was asked about the issue of WiFi tracking during the session discussing the new permission model. The session took place on May 29th – the clip is here:

In the video, the friendly Google engineer notes that, under the new system, apps without the location permission will no longer be able to see the MAC addresses of the WiFi and Bluetooth devices around them … because that’s equivalent to location.

No one is claiming (least of all us) that our work caused the change, but we would like to point out a couple of things about the way Google chose to announce it, which might indicate that the decision to fix the WiFi permission is a recent one on Google’s part:

  • The published source code [find it here] (lines 99-114) and documentation [find it here] do not yet indicate that WiFi information is to be treated as location.
  • When you install the current Android M beta on your phone, our “WiFi Watchdog” app still works … and WiFi is not treated as location. And a technical point: this is not just because of the “legacy mode”. According to the same presentation (https://youtu.be/f17qe9vZ8RM?t=13m), “WiFi Watchdog” should receive only empty data on Android M, but instead it continues to receive the same data as on Lollipop.
  • The announcement of this arguably major change (80% of apps on the market would potentially be affected) was not a part of the main presentation … but an answer during the Q&A session.

Now, it is probably just a coincidence, and maybe a fix for the WiFi permissions has been in the works for months. But it’s quite striking that Google decided to fix the WiFi permission 7 years after the existing scheme was introduced (and just days after we published our paper).

What it means to be a pro

I just love this quote which uses a Tiger Woods anecdote to illustrate what it means to be a professional. It’s from The War of Art by Steven Pressfield (a great read, btw).

With four holes to go on the final day of the 2001 Masters (which Tiger went on to win, completing the all-four-majors-at-one-time Slam), some chucklehead in the gallery snapped a camera shutter at the top of Tiger’s backswing. Incredibly, Tiger was able to pull up in mid-swing and back off the shot. But that wasn’t the amazing part. After looking daggers at the malefactor, Tiger recomposed himself, stepped back to the ball, and striped it 310 down the middle.
That’s a professional. It is tough-mindedness at a level most of us can’t comprehend, let alone emulate. But let’s look more closely at what Tiger did, or rather what he didn’t do.
First, he didn’t react reflexively. He didn’t allow an act that by all rights should have provoked an automatic response of rage to actually produce that rage. He controlled his reaction. He governed his emotion.
Second, he didn’t take it personally. He could have perceived this shutterbug’s act as a deliberate blow aimed at him individually, with the intention of throwing him off his shot. He could have reacted with outrage or indignation or cast himself as a victim. He didn’t.
Third, he didn’t take it as a sign of heaven’s malevolence. He could have experienced this bolt as the malice of the golfing gods, like a bad hop in baseball or a linesman’s miscall in tennis. He could have groaned or sulked or surrendered mentally to this injustice, this interference, and used it as an excuse to fail. He didn’t.
What he did do was maintain his sovereignty over the moment. He understood that, no matter what blow had befallen him from an outside agency, he himself still had his job to do, the shot he needed to hit right here, right now. And he knew that it remained within his power to produce that shot. Nothing stood in his way except whatever emotional upset he himself chose to hold on to.
That’s something to aspire to.

Visitors this month

This month we have two excellent long-term visitors in the group.

Visiting all month is Ivan Brugere, a graduate student from Tanya Berger-Wolf‘s group at the University of Illinois at Chicago. Ivan is interested in spatiotemporal network mining, network inference and prediction, and social network privacy modeling.

Stopping by between April 12th and April 18th is Laura Alessandretti, a graduate student with Andrea Baronchelli at City University London. Laura, Andrea, and I are studying long-term changes in individual and collective mobility patterns. In the literature, human mobility is typically described on a meta-stable time-scale, where mobility is characterized by regular patterns. We are interested in how this meta-stable regime evolves over long stretches of time (years).


Ivan & Laura will both be giving talks during their visits, so stay tuned for more info.

Petter Holme

Emphasizing our focus on temporal networks, I am happy to announce that temporal networks czar Petter Holme will visit the lab on February 18th. Petter is the author (with Jari Saramäki, who visited last week) of the recent and excellent review on temporal networks.

He will be giving a talk, and if you’re in the neighborhood, I highly recommend attending.

  • Speaker: Petter Holme, Associate Professor, Department of Energy Science, Sungkyunkwan University, Suwon, Korea
  • Date: February 18th, 2015
  • Time: 14:00
  • Location: DTU, Building 321, room 134
  • Title: Temporal networks of human interaction

Abstract: Since the turn of the millennium, networks have become a universal paradigm for simplifying large-scale complex systems, and for studying their system-wide functionalities. At the same time, there is considerable evidence that temporal structures, such as the burst-like behavior of human activity, affect dynamic systems on the network. These two lines of research come together in the study of temporal networks. Over the last five years, there has been a growing interest in how to analyze and model datasets in which we not only know which units interact (like in a traditional, static network), but also when the interactions take place. Just like static network analysis, the development of temporal network theory has been accelerated by the availability of new datasets. It should be noted that temporal networks are more than just extensions of static networks—they are e.g. (unlike simple, directed, weighted and multiplex networks) not transitive. In other words, if A and B are connected, and B and C are also connected, this does not imply that A and C are connected. Perhaps for this reason, temporal network theory has focused less on structural measures and studies of simple evolutionary models, and more on randomization studies and the simulation of spreading on empirical data. I will describe the state of the field, my own contributions (mostly about how temporal contact patterns affect infectious disease spreading), and discuss some future challenges.
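The abstract’s point about non-transitivity can be made concrete with a small sketch (my own toy example, not from Petter’s work): in a temporal network, A can reach C through B only along a time-respecting path, i.e. the A–B contact must occur no later than the B–C contact, so reversing the order of two contacts breaks reachability in one direction.

```python
# Sketch of non-transitivity in temporal networks: reachability requires
# a time-respecting path, so it depends on the order of contacts.

def can_reach(contacts, source, target):
    """contacts: list of (time, u, v) undirected contacts.
    Returns True if target is reachable from source along a
    time-respecting path (contact times non-decreasing)."""
    arrival = {source: 0}  # earliest time each node can be reached
    for t, u, v in sorted(contacts):  # process contacts in time order
        for a, b in ((u, v), (v, u)):
            if a in arrival and arrival[a] <= t:
                if b not in arrival or t < arrival[b]:
                    arrival[b] = t
    return target in arrival

# B meets C at t=1, then A meets B at t=2. A-B are connected and B-C are
# connected, yet A cannot reach C: the information arrives at B too late.
contacts = [(1, "B", "C"), (2, "A", "B")]
print(can_reach(contacts, "A", "C"))  # → False
print(can_reach(contacts, "C", "A"))  # → True: C→B at t=1, then B→A at t=2
```

In a static network the same two edges would make A and C connected in both directions; the asymmetry appears only once contact times are taken into account.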

Privacy Part II: Some examples of why privacy is important.

[This is part II of a series, you can find an overview here]

There are many reasons why privacy is important. I will not try to cover them all here, but instead I have chosen two central topics, which I find particularly important.

“I have nothing to hide, so why should I care?”

This one is a classic retort against privacy advocates. It has been used by Google’s then-CEO Eric Schmidt, who famously noted “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place” during a TV interview. And on the surface, it looks like a pretty good one (one that I might have used a few years ago).

Keeping private does not imply wrongdoing

There are many reasons why the nothing-to-hide stance is problematic. To me, the central one is that it presupposes that things we want to keep private are “secrets”; the argument insinuates that some kind of wrongdoing is taking place whenever we want to keep something to ourselves.

But imagine that you have just found out that you and your significant other are pregnant. Maybe that’s a piece of information you would like to wait approximately 12 weeks before sharing with the world? And maybe you would want to tell your close family before announcing the news to a broader circle of friends? On a less cheery note, you might want to control how your surroundings learn about other big personal events, for example a serious disease such as cancer. As another example, consider someone who has just fallen in love. Not being able to control who knows about your deepest feelings could be deeply embarrassing.

These are not exactly dark secrets, just personal issues. But they are things most people can understand wanting to keep private.

Personal freedom is restricted

More generally, a world where all of your actions are known to everyone becomes a world where personal freedom is restricted. I feel like I’m already experiencing this on e.g. Facebook: it looks like many present a curated version of reality to the world, focusing mostly on the positive aspects of their life (think photos of cute kids and delicious meals) while ignoring moments of doubt and insecurity. On Twitter, I know that what I say is persistent, so I usually avoid saying anything negative. In writing this post, I searched for “US torture war on terror” on Google and wondered if that would put me on some kind of watch list.

Because there is a multitude of things that are completely legitimate but that we might not want to share with everyone, we risk inhibiting ourselves whenever one of those topics comes up in a “persistent medium”. That also means that your freedom is particularly reduced if your personal preferences do not line up completely with mainstream social norms. For example, in a world where every action is known to everyone, young gay or transgender people might have a difficult time finding themselves (even more difficult than now).

Nothing-to-hide and the government

“Trust is good, but control is better”, as Lenin probably said. If a government systematically collects data on its citizens, the nothing-to-hide discussion finds new nuances.

Private information can be used as a means of control (e.g. via blackmail). Now, if your opponent has lots of resources as well as access to a powerful legal system, this type of control is not limited to individuals with “something to hide”. There are some great quotes on this. Bruce Schneier points to Cardinal Richelieu, who said “If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged.” The Russian dissident Aleksandr Solzhenitsyn said “Everyone is guilty of something or has something to conceal. All one has to do is look hard enough to find what it is”.

I tend to trust my government, so I’m not too worried about being blackmailed. As a pretty mainstream person, that’s probably a good assessment of my situation. But what if you’re in a minority? The United States has tortured innocent people as part of the war on terror. Homosexual acts were illegal in the UK until 1967, and it can be argued that racial segregation in the US persists in varying degrees to the present day. Additionally, much of the world is run by governments that are not democratic and whose choices and inner workings are not transparent to their citizens.

Finally, even if you truly feel like sharing everything, there is a strong argument that, as a society, we want some people to have secrets. We want a free press with journalists who can protect their sources. Protected sources give citizens access to parts of society we could otherwise never reach, such as the criminal world or the inside of governments.

The future: loss of self

There is another argument for privacy. It’s a little more out there, but still central to the debate. The essence of the argument is that data collected about individuals can be used for other kinds of control than simple blackmail. The next level is individualized manipulation.

To see how that might work, recall that there have been some very interesting developments in the behavioral sciences, such as cognitive psychology, social psychology, behavioral economics, etc. The term that embodies all of these developments is nudging. The general idea is that, during human evolution, our brain developed to make very quick decisions in a world that looked very different from our modern surroundings. Because of the need for millisecond speed, many of our decisions are not based on rational chains of thought, but on built-in heuristics. If you have a few hours to kill, check out this list of known cognitive biases. Nudging is essentially the practice of “hacking” these heuristics to manipulate human behavior. It can be used as a force for good (e.g. to promote recycling or saving for retirement) or in more questionable ways (e.g. to sell us stuff).

One can imagine that mining of personal data could be used to create personalized nudges. This is already happening to some extent: for example, on some sites people with Macs are steered towards more expensive hotel rooms than Windows users.

Clearly, humans have always manipulated each other; just think back to the last time you purchased a car. But algorithmic nudging is different, in part because it runs at scale, with a single company potentially reaching hundreds of millions of users, and in part because the nudging can potentially be much more precise and effective.

We’re not there yet, but the long term perspectives are terrifying. In a fascinating piece in the New York Times called “Privacy and the Threat to the Self“, the philosopher Michael P. Lynch makes the case that complete loss of privacy effectively dehumanizes us and takes away our “self”. He writes:

To get a sense of what I mean, imagine that I could telepathically read all your conscious and unconscious thoughts and feelings — I could know about them in as much detail as you know about them yourself — and further, that you could not, in any way, control my access. You don’t, in other words, share your thoughts with me; I take them. The power I would have over you would of course be immense. Not only could you not hide from me, I would know instantly a great amount about how the outside world affects you, what scares you, what makes you act in the ways you do.  And that means I could not only know what you think, I could to a large extent control what you do.

Here, Lynch, from another vantage point, discusses what we have covered above: the fact that knowing about people allows you to control them. But we begin to see that it’s not just about blackmail; it’s also about manipulation. That’s where the personalized nudging comes in. Knowing enough allows you to accurately “read people’s minds”, or at least anticipate their actions. He continues:

That is the political worry about the loss of privacy: it threatens  a loss of freedom. And the worry, of course, is not merely theoretical. Targeted ad programs, like Google’s, which track your Internet searches for the purpose of sending you ads that reflect your interests can create deeply complex psychological profiles — especially when one conducts searches for emotional or personal advice information: Am I gay? What is terrorism? What is atheism? If the government or some entity should request the identity of the person making these searches for national security purposes, we’d be on the way to having a real-world version of our thought experiment.

In the second paragraph, Lynch discusses another point that I have touched upon above: surveillance is already happening. And the quote contains nice examples of how our online behavior may reveal lots of information about us that is private in the sense that we might not want to share it with everyone, but that does not imply any kind of wrongdoing. The final paragraph goes into why all this implies a loss of self:

But the loss of privacy doesn’t just threaten political freedom. Return for a moment to our thought experiment where I telepathically know all your thoughts whether you like it or not. From my perspective, the perspective of the knower — your existence as a distinct person would begin to shrink. Our relationship would be so lopsided that there might cease to be, at least to me, anything subjective about you. As I learn what reactions you will have to stimuli, why you do what you do, you will become like any other object to be manipulated. You would be, as we say, dehumanized.

If someone knows everything about you, and can manipulate you to their whim, you cease to be a human being. If that’s not scary, I don’t know what is.

Now, we’re still far from this scenario (last time I bought a pair of sunglasses on line, I encountered pointless ads for sunglasses on most sites for months after) . In fact, it’s not clear to me that we will ever get to the point where we can accurately make predictions on the actions of single human beings. [BTW another great article on the topic, from a more practical standpoint can be found here]. But I hope that I’ve made a case that privacy is something that is so important that we should all be discussing it.