United Stats of Words

In early December, Alan Mislove (who’s spending his sabbatical here in Copenhagen) and I got the Volvo and headed out to the Amager Campus of the University of Copenhagen to pick up Anders Søgaard, a professor of linguistics, to work on a top-secret research project.

The project itself is still classified, but one of the things we’re looking into is word usage in geocoded tweets across the globe (to begin with, just America). To do this, Alan trawled through something like 65 billion tweets and extracted the ones with geotags (1-2% of all tweets), then kept only the ones from the US (about a third of the geotagged ones), ending up with a set of around 450 million geotagged tweets.
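For the curious, here is a minimal sketch of what that two-step filter could look like. This is not Alan’s actual pipeline: it assumes each tweet is a dict with a GeoJSON-style `coordinates` field (longitude first, then latitude), and it uses a rough, hypothetical bounding box for the contiguous US in place of a proper state-boundary lookup.

```python
# Rough bounding box for the contiguous US. This is an assumption made for
# illustration: it also covers parts of Canada and Mexico, and it leaves out
# Alaska and Hawaii. A real pipeline would test against actual borders.
US_BOX = {"min_lon": -125.0, "max_lon": -66.9, "min_lat": 24.5, "max_lat": 49.5}


def tweet_point(tweet):
    """Return (lon, lat) if the tweet carries a point geotag, else None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return lon, lat


def in_us_box(point, box=US_BOX):
    """Check whether a (lon, lat) pair falls inside the bounding box."""
    lon, lat = point
    return (box["min_lon"] <= lon <= box["max_lon"]
            and box["min_lat"] <= lat <= box["max_lat"])


def us_geotagged(tweets):
    """Yield only the tweets that are geotagged and fall inside the box."""
    for tweet in tweets:
        point = tweet_point(tweet)
        if point is not None and in_us_box(point):
            yield tweet


# Made-up sample tweets, just to exercise the filter.
sample = [
    {"text": "hello from Boston",
     "coordinates": {"type": "Point", "coordinates": [-71.06, 42.36]}},
    {"text": "hej fra København",
     "coordinates": {"type": "Point", "coordinates": [12.57, 55.68]}},
    {"text": "no geotag here", "coordinates": None},
]
kept = list(us_geotagged(sample))
print([t["text"] for t in kept])
```

Writing the filter as a generator means tweets can be streamed through one at a time, which matters when the input runs to billions of records.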

We couldn’t help ourselves – this dataset was just too cool not to visualize. And because Alan is a wizard, you can try this out for yourself at http://twitter-research.ccs.neu.edu/language/index.html. Once this thing hit Twitter, people found lots of fantastic examples, and I’ve included some of my personal favorites below.

Try it out for yourself – but be warned – it’s pretty darn addictive.

States

Individual states show up very nicely.

Typing in “mississippi” will show both the river and the state.

Even countries work nicely.

And (full disclosure), the title for this post was inspired by this tweet, although I like my own little tweak.

Published by

Sune Lehmann

I’m an Associate Professor at the Department of Applied Mathematics and Computer Science, at the Technical University of Denmark.
