This is the page that accompanies my Network Factory Lecture, due the afternoon of June 11th 2013.
Pre-workshop work: Installing Python & relevant modules
Before we can even get started with interacting with the web, we’ll need to get your system in shape.
The first task is to get Python up and running on your system. Important: We use Python 2.7, so don’t install python 3.X – it will only cause you pain and suffering. Also – on Windows make sure you install 32 bit Python, even if you’re on a 64 bit system (since NumPy only works with 32 bit).
- If you’re on Linux, you’re in luck. Just use apt-get or similar to install what you need.
- If you’re on Windows or Mac, a good option is Enthought’s free distribution. This one includes numpy, scipy, and matplotlib.
- [advanced option] On Mac, you can also consider SciPy Superpac, MacPorts or Homebrew. (I personally love MacPorts, but I think that’s now an outdated option).
- [advanced option] On Windows, you can also install standard Python and then the required packages via easy_install or pip. pip is preferred.
The second task is to install the twitter module, which interacts with Twitter’s RESTful API in a way that is intuitive and easy to use (see https://github.com/sixohsix/twitter/ for more details). It’s a wrapper for the Twitter API that mimics the public API semantics almost one-to-one. Like most other Python packages, you install it with pip/easy_install, by typing pip install twitter in a terminal.
Thirdly, you also need NetworkX – that also works with pip/easy_install.
Finally, we’ll be using Python heavily in the lecture (you need some kind of programming skill to access data from the web). To check your skill level, I recommend working through the following exercises:
- Create a list a that contains the numbers from 1 to 990, incremented by one, using the range function.
- Show that you understand slicing in Python by extracting a list b with the numbers from 42 to 79 from the list created above.
- Using def, define a function that takes as input a number x and outputs the number multiplied by itself plus eight f(x) = x(x+8). Apply this function to every element of the list b using a for loop.
- Write the output of your function to a text file with one number per line.
- Show that you know about strings by typing in and understanding everything in the example at http://learnpythonthehardway.org/book/ex6.html. If you feel this is too complex, try completing exercises 0-5 first.
- Learn about JSON by reading the Wikipedia page. Why is JSON superior to XML? (… or why not?)
- Use the json module (http://docs.python.org/library/json.html). First use urllib2 (http://docs.python.org/howto/urllib2.html) to download this file, then load the json as a python object and use pprint to make it look good when written to the terminal.
These should be easy, and if they’re not, work through the tutorial/book/course
which will give you a better chance at following all the cool stuff we’ll be doing in Python.
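As a small taste of the JSON exercise, here is a minimal sketch of the load-and-pretty-print step. The JSON string is made up and stands in for the downloaded file; the code is written to run on both Python 2.7 and Python 3.

```python
import json
from pprint import pprint

# A made-up JSON document standing in for the file you download.
raw = '{"user": "suneman", "hashtags": ["netsci13"], "retweets": 3}'

data = json.loads(raw)   # JSON text -> Python dict / list / string / number
pprint(data)             # readable dump to the terminal

# Round trip: Python object -> JSON text again.
as_text = json.dumps(data, indent=1, sort_keys=True)
```

The same json.loads call works on text read from a web response, which is the part the exercise leaves to you.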
0) If you don’t have one, open a Twitter account (and follow @suneman for great updates!).
1) Create a Twitter application and access the Twitter API from Python. Go to https://dev.twitter.com/apps and create one (you’ll also need the ordinary Twitter account to get all this started). In order to get OAuth working, you’ll need the following:
- Consumer key
- Consumer secret
- Access token
- Access token secret
When you create the app, you have to fill in some basic info, such as app name and purpose. I urge you to be creative here 🙂 The various keys, tokens, and secrets will be displayed once you finish creating the app. Write them down in a file somewhere where you can find them again.
Now you’re ready to log on to Twitter using Python. Try adding something like the following to a .py file and run it using IPython:
import twitter
import json
import networkx as nx

CONSUMER_KEY = '[your consumer key]'
CONSUMER_SECRET = '[your consumer secret]'
OAUTH_TOKEN = '[your token]'
OAUTH_TOKEN_SECRET = '[your token secret]'

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(domain='api.twitter.com',
                              api_version='1.1',
                              auth=auth)
That should get you logged in. Now you’re ready to interact with the Twitter API.
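If you would rather not paste the keys into the script itself, one option is to keep them in the file you wrote them down in and read them at start-up. The sketch below assumes a JSON file with four keys named consumer_key, consumer_secret, oauth_token and oauth_token_secret; the file layout and the helper name load_credentials are illustrative choices, not something the twitter module requires.

```python
import json


def load_credentials(path):
    """Read the four OAuth values from a JSON file and check none is missing."""
    required = ['consumer_key', 'consumer_secret',
                'oauth_token', 'oauth_token_secret']
    with open(path) as handle:
        creds = json.load(handle)
    missing = [name for name in required if name not in creds]
    if missing:
        raise ValueError('missing credentials: ' + ', '.join(missing))
    return creds
```

The returned dictionary can then supply the four arguments to twitter.oauth.OAuth in the same order as in the snippet above: token, token secret, consumer key, consumer secret.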
2) Taking the API for a spin: get all tweets from a specific person. You can figure out how to do all kinds of stuff from Python; for example, it’s easy to get all of my (@suneman) tweets. Just try running the line
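A minimal sketch of such a call, assuming the twitter module's attribute-style mapping of the statuses/user_timeline endpoint. The wrapper name get_user_tweets is hypothetical, and twitter_api is the object created in the login snippet. A single request returns at most 200 tweets, so fetching a full timeline would need several requests.

```python
def get_user_tweets(twitter_api, screen_name, count=200):
    """Fetch up to `count` recent tweets from one account (one API request)."""
    return twitter_api.statuses.user_timeline(screen_name=screen_name,
                                              count=count)

# Usage, once you are logged in:
#   tweets = get_user_tweets(twitter_api, 'suneman')
```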
Feel free to substitute any other screen name.
3) Get all tweets corresponding to a hashtag – e.g. #netsci13. (Pro tip: There are many more results if you just look for the string “netsci13”.) To get the first 100 results, simply type
search_results = twitter_api.search.tweets(q='#netsci13', count=100)
If you want more than 100 tweets, you’ll have to iterate through pages of results. Here you’ll just have to have faith in the process. Try something like:
q = '#netsci13'
count = 100
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# Iterate through up to 5 more batches of results
for dummy in range(5):
    print "Status length", len(statuses)
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # no next_results entry means there are no more pages
        break
    # Turn the '?key=value&key=value' string into keyword arguments
    kwargs = dict([kv.split('=') for kv in next_results[1:].split("&")])
    kwargs[unicode("q")] = unicode("%23netsci13")
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']
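The least obvious step in the loop is turning the next_results string into keyword arguments. The sketch below shows an alternative way to do that parsing with the standard library's parse_qsl, which also decodes percent-escapes such as %23 back into #. The helper name next_page_kwargs and the example string are made up for illustration.

```python
try:
    from urllib.parse import parse_qsl   # Python 3
except ImportError:
    from urlparse import parse_qsl       # Python 2.7


def next_page_kwargs(next_results):
    """Turn a string like '?max_id=1&q=%23tag' into a dict of keyword args."""
    return dict(parse_qsl(next_results.lstrip('?')))


# A made-up example in the same shape as the next_results field.
example = '?max_id=123&q=%23netsci13&count=100&include_entities=1'
kwargs = next_page_kwargs(example)
```

Because q comes back decoded, resetting it by hand should not be needed with this variant, though I have not checked that against the live API.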
After that bit of magic, we can access each tweet by indexing into statuses; e.g. the tweet at index 11 is simply
print json.dumps(statuses[11], indent=1)
4) Parse the statuses list to create a mention network. We loop through statuses, recording the network structure in a NetworkX DiGraph structure. One way to do this is:
G = nx.DiGraph()
for status in statuses:
    for mention in status['entities']['user_mentions']:
        if G.has_edge(status['user']['screen_name'], mention['screen_name']):
            G[status['user']['screen_name']][mention['screen_name']]['weight'] += 1
        else:
            G.add_edge(status['user']['screen_name'], mention['screen_name'], weight=1)
And now you have a nice graph! Keep this one for analysis (see exercise 7).
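To see what the loop is counting without touching the API, here is a dependency-free sketch of the same tally run on three made-up tweets. It uses a Counter keyed by (author, mentioned user) instead of a DiGraph; the function name mention_edges and the sample data are illustrative only.

```python
from collections import Counter


def mention_edges(statuses):
    """Count (author, mentioned user) pairs across a list of tweet dicts."""
    edges = Counter()
    for status in statuses:
        author = status['user']['screen_name']
        for mention in status['entities']['user_mentions']:
            edges[(author, mention['screen_name'])] += 1
    return edges


# Made-up tweets containing only the fields the loop reads.
sample = [
    {'user': {'screen_name': 'alice'},
     'entities': {'user_mentions': [{'screen_name': 'bob'}]}},
    {'user': {'screen_name': 'alice'},
     'entities': {'user_mentions': [{'screen_name': 'bob'},
                                    {'screen_name': 'carol'}]}},
    {'user': {'screen_name': 'bob'},
     'entities': {'user_mentions': []}},
]
edges = mention_edges(sample)
```

Such a tally can be loaded into NetworkX afterwards with G.add_weighted_edges_from((u, v, w) for (u, v), w in edges.items()), which should give the same weighted graph as the loop above.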
5) Export an edgelist of the mention network. Ok – if you did the pre-class exercises, you’ll know how to do this one. No help – but come and talk to us if you’re lost.
7) Analyze the network using the tools you’ve acquired in the workshop so far (e.g. from “Network Literacy” and “Advanced Network Theory”) and answer the following question: Who was the most important network scientist at NetSci this year? (Hint: Best student award (for this class) goes to the first student who can find a metric where the answer is @suneman).
Can you tell anything else interesting from the graph?
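As a starting point, one simple notion of importance is the number of mentions an account receives, i.e. its weighted in-degree. The sketch below computes that ranking from plain (source, target, weight) triples; the function name rank_by_mentions and the sample edges are made up, and this is only one of many metrics you could try.

```python
from collections import defaultdict


def rank_by_mentions(weighted_edges):
    """Rank accounts by total mentions received (weighted in-degree)."""
    received = defaultdict(int)
    for source, target, weight in weighted_edges:
        received[target] += weight
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(received.items(), key=lambda item: (-item[1], item[0]))


# Made-up edges: (who mentions, who is mentioned, how many times).
sample_edges = [('alice', 'bob', 2), ('carol', 'bob', 1),
                ('bob', 'alice', 1), ('alice', 'carol', 1)]
ranking = rank_by_mentions(sample_edges)
```

With the graph G from exercise 4, the triples can be produced by ((u, v, d['weight']) for u, v, d in G.edges(data=True)). NetworkX also ships other centrality measures, such as nx.pagerank and nx.betweenness_centrality, that are worth comparing against this one.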
8) Bonus exercise. Create the retweet network and work through exercises 5) – 7). What are the main differences between the mention network and the retweet network?
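A hint for getting started: in the version 1.1 API, a retweet carries the original tweet under the retweeted_status key. The sketch below, run on made-up data, tallies (retweeter, original author) pairs on that assumption; the function name retweet_edges and the edge direction are my own choices, and you may prefer the opposite direction.

```python
from collections import Counter


def retweet_edges(statuses):
    """Count (retweeter, original author) pairs; skip tweets that are not retweets."""
    edges = Counter()
    for status in statuses:
        original = status.get('retweeted_status')
        if original is None:
            continue
        edges[(status['user']['screen_name'],
               original['user']['screen_name'])] += 1
    return edges


# Made-up tweets: one retweet and one ordinary tweet.
sample = [
    {'user': {'screen_name': 'alice'},
     'retweeted_status': {'user': {'screen_name': 'bob'}}},
    {'user': {'screen_name': 'carol'}},
]
rt_edges = retweet_edges(sample)
```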
Note: I learned how to do all this using the excellent Mining the Social Web book from O’Reilly. If you want to learn much more about Twitter, Facebook, LinkedIn, Google+, etc, I highly recommend that you buy the book!