In this section we will discuss the process of accessing JSON information as served via Web services. We will do so in the particular case of the Twitter API, and more specifically we will look at tweets that mention the "user" HanoverCollege
. The steps are roughly the following:
You need to store these pieces of information in a location that can be accessed by your scripts, but that is otherwise not visible to the world. A good way to do that is to create a JSON file in your work directory, that your scripts can access. We will call ours keys.json
, and this is how it would look:
{
"twitter": {
"key": "Your key here",
"secret": "Your secret here"
}
}
From your script, you need to first read your access information and then authenticate yourself with the application in question:
#!/usr/bin/python3
# Loading libraries needed for authentication and requests
from requests_oauthlib import OAuth2Session
from oauthlib.oauth2 import BackendApplicationClient
import json
# In order to use this script, you must:
# - Have a Twitter account and create an app
# - Store in keys.json a property called "twitter" whose value is an
# object with two keys, "key" and "secret"
with open('keys.json', 'r') as f:
keys = json.loads(f.read())['twitter']
twitter_key = keys['key']
twitter_secret = keys['secret']
# We authenticate ourselves with the above credentials
# We will demystify this process later
client = BackendApplicationClient(client_id=twitter_key)
oauth = OAuth2Session(client=client)
token = oauth.fetch_token(token_url='https://api.twitter.com/oauth2/token',
client_id=twitter_key,
client_secret=twitter_secret)
# At this point the "oauth" object can be used to make requests
At the end of the above lines, the "oauth" object represents an authenticated client that you can use to make further requests. Since many of the requests will have a URL in common, you should create some variables for that purpose:
# Base url needed for all subsequent queries
base_url = 'https://api.twitter.com/1.1/'
# Particular page requested. The specific query string will be appended to that.
page = 'search/tweets.json'
# Depending on the query we are interested in, we append the necessary string
# As you read through the twitter API, you'll find more possibilities
req_url = base_url + page + '?q=%40HanoverCollege&tweet_mode=extended'
We then perform a request, and process the result as JSON:
# We perform a request. Contains standard HTTP information
response = oauth.get(req_url)
# Read the query results
results = json.loads(response.content.decode('utf-8'))
Typically the "results" dictionary will contain some metadata that tells you information about the request, as well as the actual results. This is a standard Python dictionary that you can examine. It is very useful at this point to be using Python interactively.
You now have a dictionary that contains the important information you want to access. Let us look at its keys:
>>> results.keys()
dict_keys(['search_metadata', 'statuses'])
The search_metadata
entry contains information about the search performed. It sounds like statuses
contains what we need.
>>> type(results['statuses'])
<class 'list'>
Ok so that is a list. We could access the first element and take a look at it:
type(results['statuses'][0]) # It's a dictionary
results['statuses'][0].keys() # Its keys
Ok that is a dictionary, and we can look at its keys, and it seems there are a number of possibilities there. For example we can access the tweet's text via:
results['statuses'][0]['full_text']
Or we can see if the currently authorized user has retweeted or favorited it:
results['statuses'][0]['retweeted']
results['statuses'][0]['favorited']
The possibilities are endless. Most of the times you would want to iterate over the statuses list:
for status in results['statuses']:
dosomething(status)
Or you can operate on the list of statuses via map
and filter
(see the docs). Or even better, use list comprehensions:
texts = [tweet['text'] for tweet in results['statuses']]
many_mentions = [tweet['text'] for tweet in results['statuses']
if len(tweet['entities']['user_mentions']) > 2]