Using a REST API in Python

A great deal of intelligence data is available through REST APIs. Much of the data is available in simple JSON, CSV, or XML documents. In order to make sense of this data, we need to be able to parse these various kinds of serialization formats. We'll focus on JSON because it's widely used. Sadly, it's not universal.

A REST protocol is essentially HTTP. It leverages POST, GET, PUT, and DELETE requests to implement the four essential operations in the life of persistent data: Create, Retrieve, Update, and Delete (CRUD).
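
To make the verb-to-operation mapping concrete, here's a minimal sketch (using a purely hypothetical URL that is not part of the Coinbase API) of how urllib.request can issue any of these verbs by setting the method on a Request object:

import urllib.request

# A hypothetical resource URL, for illustration only
request = urllib.request.Request(
    "http://www.example.com/api/v1/thing/42",
    method="DELETE")     # or "POST", "GET", "PUT"
# with urllib.request.urlopen(request) as response:
#     print(response.status)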

We'll look at currency conversion as a simple web API. This can both help us bribe our information sources and provide important information on the overall state of a nation's economy. We can measure national economies against each other, as well as against non-national cryptocurrencies such as Bitcoin.

We'll get exchange and currency information from http://www.coinbase.com. There are a lot of similar services; this one seems reasonably complete. They seem to have up-to-date currency information that we can report to HQ as part of an overall intelligence assessment.

Their API documentation is available at https://coinbase.com/api/doc. This tells us what URLs to use, what data to provide with the URL, and what kind of response to expect.

Getting simple REST data

We can get the currency exchange data either with the http.client or urllib.request module. This won't be new to us; we already grabbed data using both libraries. The responses from this website will be in the JSON notation. For more information, see http://www.json.org/.

To parse a JSON document, we'll need to import the json module from the standard library. The response that we get from urllib is a sequence of bytes. We'll need to decode these bytes to get a string. We can then use the json.loads() function to build Python objects from that string. Here's how it looks:

import urllib.request
import json

# The list-of-currencies query described in the Coinbase API documentation
query_currencies = "http://www.coinbase.com/api/v1/currencies/"
with urllib.request.urlopen(query_currencies) as document:
    print(document.info().items())   # the response headers
    currencies = json.loads(document.read().decode("utf-8"))  # bytes -> str -> Python objects
    print(currencies)

We imported the two libraries that we need: urllib.request to get the data and json to parse the response.

The currency query (/api/v1/currencies/) is described in the API documentation on the Coinbase website. When we make this request, the resulting document will have all of the currencies they know about.

We printed document.info().items(); this is the collection of headers that came back with the response. Sometimes, these are interesting. In this case, they don't tell us too much that we don't already know. What's important is that the Content-Type header has an application/json; charset=utf-8 value. This tells us how to decode the bytes.

We read the resulting document (document.read()) and then converted the bytes to characters. The Content-Type header says that the characters were encoded using utf-8, so we'll use utf-8 to decode the bytes and recover the original sequence of characters. Once we have the characters, we can use json.loads() to create a Python object from the characters.

This will get us a list of currencies we can work with. The response object looks like this:

[['Afghan Afghani (AFN)', 'AFN'], ['Albanian Lek (ALL)', 'ALL'], 
['Algerian Dinar (DZD)', 'DZD'], … ]

It is a list of lists that provides the names of 161 currencies.

In the next section, we'll look at ways to work with a list-of-tuple structure. Working with a list of lists is going to be very similar to working with a list of tuples.
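
As a tiny preview (assuming the currencies list built in the previous example), here's one way we might peek at the first few entries; the for statement is covered properly in the next section:

for name, code in currencies[:3]:
    print(code, name)   # AFN Afghan Afghani (AFN), and so on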

To make this more flexible, we need to turn the header items() list into a dictionary. From this, we can get the Content-Type value string from the dictionary. This string can be partitioned on ; to locate the charset=utf-8 substring. This string can subsequently be partitioned on the = character to locate the utf-8 encoding information. This would be slightly better than assuming a utf-8 encoding. The first step, creating a dictionary from the headers, has to wait until the Organizing collections of data section. First, we'll look at getting other information using the REST protocol.
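
Here's a minimal sketch of how that might eventually look, jumping slightly ahead of the dictionary material; the encoding_from() function name is our own invention, not part of any library:

def encoding_from(response, default="utf-8"):
    # Build a dictionary from the (name, value) header pairs
    headers = dict(response.info().items())
    content_type = headers.get("Content-Type", "")     # 'application/json; charset=utf-8'
    _, _, charset = content_type.partition(";")        # ' charset=utf-8'
    _, _, encoding = charset.strip().partition("=")    # 'utf-8'
    return encoding or default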

Using more complex RESTful queries

Once we have a list of currencies, we can request spot conversion rates. This involves a somewhat more complex URL. We need to provide a currency code to get the current bitcoin exchange rate for that specific currency.

While it's not perfectly clear from the API documentation, the RFCs for the web state that we should encode the query string as part of our processing. In this specific situation, it doesn't seem possible for the query string to contain any characters that require encoding.

We're going to be fussy though and encode the query string properly using the urllib module. Encoding will be essential for a number of examples in Chapter 4, Drops, Hideouts, Meetups, and Lairs.

Query string encoding is done using the urllib.parse module. It looks like this:

    # currency holds a three-letter code such as "EUR"
    scheme_netloc_path = "https://coinbase.com/api/v1/prices/spot_rate"
    form = {"currency": currency}
    query = urllib.parse.urlencode(form)   # 'currency=EUR'

The scheme_netloc_path variable has a portion of the URL. It has the scheme (https), network location (coinbase.com), and path (/api/v1/prices/spot_rate). This fragment of the URL doesn't have the query string; we'll encode that separately because it has dynamic information that changes from request to request.

Technically, a query string is a bunch of parameters that have been encoded so that certain reserved characters such as ? and # don't cause any confusion to the web server. Pragmatically, the query string used here is very simple with only a single parameter.

To handle query strings in a general-purpose way, we defined an HTML form using a dictionary and assigned it to the form variable. This dictionary is a model of a form on an HTML web page with a single input field. We modeled an input field named currency whose value is a currency code such as EUR.

The urllib.parse.urlencode() function encodes all the fields of the form into a tidy representation with any reserved characters handled properly. In this case, there's only one field, and no reserved characters are used by the field name or the field value.

We can play with this in interactive Python:

>>> import urllib.parse
>>> form= {"currency": "EUR"}
>>> urllib.parse.urlencode(form)
'currency=EUR'

The preceding code shows how we built a form object as a dictionary and then encoded it to create a valid URL-encoded query string. Because the data is so simple, the encoding is equally simple.

Here's an example with a more complex piece of data in the form:

>>> form['currency']= "Something with # or ?"
>>> urllib.parse.urlencode(form)
'currency=Something+with+%23+or+%3F'

First, we updated the form with different input; we changed the currency value to Something with # or ?. We'll look at dictionary updates in the next section. The updated value has reserved characters in it. When we encoded this form, the result shows how reserved characters are handled by URL encoding.

As we start working with more complex structures, we'll find that the built-in print() function isn't going to do everything we need. The pprint() function in the pprint module does a much nicer job of displaying complex data. We import it to get the pretty-print function:

import pprint

We can use our query template and the encoded data like this:

with urllib.request.urlopen(scheme_netloc_path + "?" + query) as document:
    pprint.pprint(document.info().items())   # the response headers
    spot_rate = json.loads(document.read().decode("utf-8"))
pprint.pprint(spot_rate)

The expression scheme_netloc_path+"?"+query assembled the complete URL from the relatively static portions and the dynamic query string. We've used a with statement to be sure that all of the network resources are properly released when we're done. We used the pprint() function to show the headers, which tell us the content type. The headers also include three cookies, which we're studiously ignoring for these examples.

When we print the spot_rate value, we see that the Python object looks like this:

{'currency': 'USD', 'amount': '496.02'}

Or like this:

{'currency': 'EUR', 'amount': '361.56'}

These are Python dictionary objects. We'll need to learn more about dictionaries to be able to work with these responses. Stay tuned for the Using a Python dictionary mapping section.
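
As a quick preview, assuming the spot_rate dictionary from the previous example, we can pick out individual values by name:

print(spot_rate['currency'], spot_rate['amount'])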

Saving our data via JSON

What if we want to save the data we downloaded? This is something at which JSON excels. We can use the json module to serialize objects into a string and write that string to a file.
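
As a small illustration of the serialization half of that, the json.dumps() function turns a Python object into a JSON string (recent versions of Python preserve the key order shown):

>>> import json
>>> json.dumps({'currency': 'USD', 'amount': '496.02'})
'{"currency": "USD", "amount": "496.02"}'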

Here's how we can save our spot currency rate data in a JSON document. First, we need to turn our spot_rate example from the Using more complex RESTful queries section into a function. Here's how it might look:

def get_spot_rate(currency):
    # Build the query URL for the given three-letter currency code
    scheme_netloc_path = "https://coinbase.com/api/v1/prices/spot_rate"
    form = {"currency": currency}
    query = urllib.parse.urlencode(form)

    # Fetch the response and decode the JSON document into a dictionary
    with urllib.request.urlopen(scheme_netloc_path + "?" + query) as document:
        spot_rate = json.loads(document.read().decode("utf-8"))
    return spot_rate

This function requires the currency code as an argument. Given the currency code, it creates a tiny input form and encodes this to create the query string. In this case, we saved that string in the query variable.

We created the URL from a template and the encoded data. This URL was used as a request to get a currency spot rate. We read the entire response and decoded the bytes into a string. Once we had the string, we loaded a Python dictionary object from it. We returned this dictionary from the get_spot_rate() function. We can now use this function to get some spot-rate dictionary objects:

rates = [
    get_spot_rate("USD"), get_spot_rate("GBP"),
    get_spot_rate("EUR") ]

This statement built a list-of-dictionary structure from our three spot-rate dictionaries. It assigned the collection to the rates variable. Once we have this, we can serialize it and create a file that has some useful exchange-rate information.

Here's how we use JSON to save a Python object to a file:

with open("rate.json", "w") as save:
    json.dump(rates, save)

We opened a file to write something and used this as a processing context to be assured that the file will be properly closed when we're done. We then used the json.dump() function to dump our rates object to this file.

What's important about this is that JSON works most simply when we encode one object to a file. In this case, we built a list of individual objects and encoded that list into the file. As we can't easily perform any sort of partial or incremental encoding of objects into a JSON file, we built a list with everything in it. Except in cases of huge mountains of data, this technique of building and dumping a list works very nicely.
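
When we want the data back later, the json.load() function reverses the process; here's a minimal sketch, assuming the rate.json file created above:

with open("rate.json") as source:
    rates = json.load(source)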