High Throughput Humanities: Final Call for Abstracts

Note: This post was originally posted on the Complexity and Social Networks Blog.

A quick reminder that April 30th is the final chance to submit an abstract to the High Throughput Humanities Workshop that I’m organizing along with Riley Crane, Gourab Ghoshal, and Max Schich at this year’s European Conference on Complex Systems in Lisbon this September (I wrote about this in more detail a couple of months ago).

We have an amazing Program Committee that includes:

  • Albert-László Barabási, CCNR Northeastern University, USA.
  • Guido Caldarelli, INFM-CNR Rome, Italy.
  • Gregory Crane, Tufts University, USA.
  • Lars Kai Hansen, Technical University of Denmark.
  • Bernardo Huberman, HP Laboratories, USA.
  • Martin Kemp, Trinity College, Oxford, UK.
  • Roger Malina, Leonardo/ISAST, France.
  • Franco Moretti, Stanford University, USA.
  • Didier Sornette, ETH Zurich, Switzerland.

Full details can be found at the workshop website http://hth.eccs2010.eu/. There’s even a neat little introductory video (from our talk at Ignite Boston 7).

We hope you will submit an abstract!

Worlds Colliding

Note: This post was originally posted on the Complexity and Social Networks Blog.

During a press conference at last week’s SXSW conference, Todd Jackson, product manager of Google’s Gmail team, revealed an interesting bit of information about the company’s problem-ridden new service, Google Buzz:

Jackson told the crowd, as he’s previously said to reporters, that too much was assumed about how Buzz would work best and be received based on Google’s internal testing. Google employees didn’t have a strong use case for “muting” their fellow Google employees, and the people they’d want to follow and be followed by closely matched up to their contact lists. In general, too, Jackson suggested that Google underestimated the impact of “having a social, public service appear inside … what is a very private thing (email) for some people” [1].

So by testing their social service inside a single context (Google employees only), the developers failed to notice that in real life, people participate in multiple contexts (family, work, friends, etc.) that they work actively to keep separate. The reasons for wanting to keep these groups separate range from hiding an illicit affair from a spouse to political activists in oppressive regimes keeping certain connections secret from the government [2]. Another important reason to keep our communities separate is that we often play different roles – and communicate differently – in different contexts, as illustrated beautifully in a famous clip from TV’s Seinfeld.

So, ironically, the key problem for Buzz, Google’s social network service, was that the engineers at the Googleplex had failed to understand an essential property of real-world social networks. Figure 1 illustrates the problem:

[Figure 1: Google’s internal testing situation (A) vs. a real-world ego-centered network (B)]

Figure 1A shows a cartoon version of Google’s internal testing situation. It’s clear that in this situation, since an individual (the gray node) only belongs to a single social context, sharing contact information with his neighbors reveals no new information to his social network. However, an ego-centered network in the wild looks more like the situation depicted in Figure 1B. Here, the gray node is a member of several communities (nodes with different colors) with very little communication between communities. Now, because people typically manage all of their ‘worlds’ from their email inbox, what Google did when it created Buzz’s automatic friend lists was to implicitly link people’s worlds, revealing precisely the information that people work to suppress, sometimes with serious consequences.
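To make Figure 1B concrete, here is a minimal sketch in Python (using networkx; the world names and group sizes are invented for illustration, not taken from the post or from any data) of an ego network whose contexts touch only through the ego:

```python
# A toy ego network in the spirit of Figure 1B: three "worlds" that are
# dense internally, share no edges with each other, and all know the ego.
# World names and sizes are illustrative assumptions.
import networkx as nx

worlds = {
    "family":  ["f1", "f2", "f3"],
    "work":    ["w1", "w2", "w3", "w4"],
    "friends": ["p1", "p2", "p3"],
}

G = nx.Graph()
for members in worlds.values():
    # dense connections inside each world...
    G.add_edges_from(nx.complete_graph(members).edges())
    # ...and every member knows the ego
    G.add_edges_from(("ego", m) for m in members)

# Without the ego, the worlds are completely disconnected.
H = G.copy()
H.remove_node("ego")
print(nx.number_connected_components(H))  # -> 3
```

Remove the ego and the graph falls apart into separate components; publish the ego’s contact list in one place, as Buzz’s auto-generated friend lists effectively did, and members of those separate worlds suddenly learn of each other’s existence.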

It is interesting to consider what the structure displayed in Figure 1B implies for the full graph. For an individual, the world breaks neatly into a small set of social contexts, but when every single node is in this situation, the resulting global structure becomes very different from many of the model networks currently in use. In my own corner of the complex networks world, this has serious implications for the rapidly growing field of community detection [3]. Currently, most algorithms are designed to search for densely connected sets of nodes that are weakly connected to the rest of the network, and while some methods do include the possibility of community overlap, most break down if the overlap involves more than a small fraction of the nodes. If Figure 1B is correct and overlap is present for all nodes, then the idea of communities as weakly connected to the remainder of the network is false, since communities will have many more links to the outside world than to the inside.
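As a rough numerical illustration of that last claim, the toy sketch below (Python with networkx; the parameters and the clique construction are arbitrary assumptions, not a published model) gives every node exactly two community memberships and then compares each community’s internal edges with the edges crossing its boundary:

```python
# Toy check: when every node overlaps two communities, each community
# has more edges leaving it than staying inside it. All parameters here
# are illustrative assumptions.
import random
import networkx as nx

random.seed(1)
n_nodes, n_comms, memberships = 200, 20, 2

# Assign each node to `memberships` distinct communities at random.
members = {c: set() for c in range(n_comms)}
for v in range(n_nodes):
    for c in random.sample(range(n_comms), memberships):
        members[c].add(v)

# For simplicity, make each community a clique.
G = nx.Graph()
for nodes in members.values():
    G.add_edges_from(nx.complete_graph(nodes).edges())

for c, nodes in sorted(members.items()):
    internal = G.subgraph(nodes).number_of_edges()
    boundary = sum(1 for u, v in G.edges(nodes) if (u in nodes) != (v in nodes))
    print(f"community {c:2d}: {internal:4d} internal vs {boundary:4d} boundary edges")
```

In this regime every community ends up with roughly twice as many boundary edges as internal ones, which is exactly where ‘dense inside, sparse outside’ detection methods struggle.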

I hope to see more research investigating this problem!

Oh – and George Costanza gets to have the last word…

Update April 3rd, 2010

I’ve just become aware of a few excellent blog posts that discuss problems related to Buzz, drawing on ideas very similar to those I present above. Fred Stutzman writes eloquently about Buzz and colliding worlds, inspired by Erving Goffman, here. That post sparked additional ‘world-colliding’ thoughts from David Truss (via this post from George Siemens).

References

High Throughput Humanities

Note: This post was originally posted on the Complexity and Social Networks Blog.

Along with Riley Crane (of DARPA Challenge and Colbert Report fame), physicist Gourab Ghoshal, and quantitatively minded art historian Max Schich, I’m putting together a workshop on High Throughput Humanities as a satellite meeting at this year’s European Conference on Complex Systems in Lisbon this September. The general idea is to bring together people who ask interesting questions of massive data sets. More specifically – as the title implies – we want to figure out how to use computers to do research in the humanities in a way that extends beyond what can currently be accomplished by human beings.

Entire libraries are in the process of being scanned, and we would like to begin investigating questions like: Are there patterns in history that are currently ‘invisible’ simply because humans have limited bandwidth, because each of us can only read a small fraction of all books in a lifetime?
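To give a flavour of what we mean, here is a deliberately naive sketch in Python (the corpus layout and the `<year>_<title>.txt` naming scheme are hypothetical, and no results are implied) of one such question: how often does a term appear, per decade, across more books than any single person could read?

```python
# Count per-decade occurrences of a term across a directory of scanned,
# plain-text books. Hypothetical file naming scheme: <year>_<title>.txt.
import re
from collections import Counter
from pathlib import Path

def term_trend(corpus_dir, term):
    """Return sorted (decade, count) pairs for `term` across all books."""
    pattern = re.compile(rf"\b{re.escape(term.lower())}\b")
    decades = Counter()
    for path in Path(corpus_dir).glob("*.txt"):
        year = int(path.name.split("_")[0])      # assumed naming scheme
        text = path.read_text(errors="ignore").lower()
        decades[10 * (year // 10)] += len(pattern.findall(text))
    return sorted(decades.items())

# e.g. term_trend("scanned_books/", "revolution")
```

A real study would of course need OCR cleanup, proper metadata, and normalization by corpus size, but the point is that the bottleneck becomes computation rather than human reading speed.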

We have an exciting program committee so it should be an interesting day!

Confirmed Programme Committee Members

  • Albert-László Barabási, CCNR Northeastern University, USA.
  • Guido Caldarelli, INFM-CNR Rome, Italy.
  • Gregory Crane, Tufts University, USA.
  • Lars Kai Hansen, Technical University of Denmark.
  • Bernardo Huberman, HP Laboratories, USA.
  • Martin Kemp, Trinity College, Oxford, UK.
  • Roger Malina, Leonardo/ISAST, France.
  • Franco Moretti, Stanford University, USA.
  • Didier Sornette, ETH Zurich, Switzerland.

Practical information can be found at the conference website. Oh, and did I mention that Lisbon is beautiful in September? Sign up and join us. The workshop abstract is reprinted below.

Abstract

The High Throughput Humanities satellite event at ECCS’10 establishes a forum for high throughput approaches in the humanities and social sciences, within the framework of complex systems science. The symposium aims to go beyond massive data acquisition and to present results beyond what can be manually achieved by a single person or a small group. Bringing together scientists, researchers, and practitioners from relevant fields, the event will stimulate and facilitate discussion, spark collaboration, and connect approaches, methods, and ideas.

The main goal of the event is to present novel results based on analyses of Big Data (see the Nature special issue, 2009), focusing on emergent complex properties and dynamics that allow for new insights, applications, and services.

With the advent of the 21st century, increasing amounts of data from the domain of qualitative humanities and social science research have become available for quantitative analysis. Private enterprises (Google Books and Earth, YouTube, Flickr, Twitter, Freebase, IMDb, among others) as well as public and non-profit institutions (Europeana, Wikipedia, DBpedia, Project Gutenberg, WordNet, Perseus, etc.) are in the process of collecting, digitizing, and structuring vast amounts of information, and creating technologies, applications, and services (Linked Open Data, OpenCalais, Amazon’s Mechanical Turk, reCAPTCHA, Many Eyes, etc.) that are transforming the way we do research.

Utilizing a complex systems approach to harness these data, the contributors to this event aim to make headway into the territory of traditional humanities and social sciences, understanding history, arts, literature, and society at the global, meso, and granular levels, using computational methods to go beyond the limitations of the traditional researcher.