Detecting and preventing tampering

We can use steganography to check whether our message has been tampered with. If we can't find our digital watermark properly encoded, we know that our picture was altered. This is one way to detect tampering. A more robust technique is to use hash totals. There are a number of hash algorithms used to produce a summary or signature of a sequence of bytes. We send the message and the hash code separately. If the received message doesn't match the hash code, we know something went wrong.

One common use case for hashes is to confirm a proper download of a file. After downloading a file, we should compare the hash of the file we got with a separately published hash value; if the hash values don't match, something's wrong with the file, and we can delete it before opening it.

While it seems like encryption would prevent tampering, it requires careful management of the encryption keys. Encryption is no panacea. It's possible to employ a good encryption algorithm but lose control of the keys, rendering the encryption useless. Someone with unauthorized access to the key can rewrite the file and no one would know.

Using hash totals to validate a file

Python has a number of hash algorithms available in the hashlib module. Software downloads are often provided with MD5 hashes of the software package. We can compute an MD5 digest of a file using hashlib, as shown in the following code:

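import hashlib
# Create a digest object for the named algorithm.
md5 = hashlib.new("md5")
# Open the file in binary mode so that we hash the raw bytes.
with open( "LHD_warship.jpg", "rb" ) as some_file:
    md5.update( some_file.read() )
# Show the digest as a string of hexadecimal digits.
print( md5.hexdigest() )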

We've created an MD5 digest object using the hashlib.new() function; we named the algorithm to be used. We opened the file in binary mode, so we read bytes rather than text. We provided the entire file to the digest object's update() method. For really large files, we might want to read the file in blocks rather than read the entire file into memory all at once; update() can be called repeatedly, and the result is the same as hashing all of the bytes in one call. Finally, we printed the hex version of the digest.
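The block-by-block approach mentioned above can be sketched as follows. This is a minimal illustration, not the only way to do it; the function name md5_of_file and the block_size default of 64 KB are assumptions made for this example.

```python
import hashlib

def md5_of_file(path, block_size=65536):
    """Return the hex MD5 digest of the file at path, reading it in blocks.

    The name and the default block size are illustrative choices.
    """
    digest = hashlib.new("md5")
    with open(path, "rb") as some_file:
        # iter() with a sentinel calls read() repeatedly until it
        # returns b"", which signals the end of the file.
        for block in iter(lambda: some_file.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Calling md5_of_file("LHD_warship.jpg") should give the same digest as reading the whole file in one call, while holding only one block in memory at a time.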

This will provide a hexadecimal string version of the MD5 digest, as follows:

0032e5b0d9dd6e3a878a611b49807d24

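Comparing this digest with one the sender published through a separate, trusted channel lets us confirm that the file was not altered on its journey through the Internet from sender to receiver. Note that MD5 is no longer considered secure against deliberately crafted collisions, so it is best treated as a check against corruption; hashlib also provides stronger algorithms, such as SHA-256, for cases where someone might try to substitute a forged file.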
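The download check described earlier, comparing a file's digest with a separately published value, might look like the following sketch. The function name file_matches and its parameters are hypothetical; the algorithm argument is passed straight to hashlib.new(), so it could name a stronger algorithm such as "sha256" when the publisher provides that kind of hash.

```python
import hashlib

def file_matches(path, published_hex, algorithm="md5"):
    """Return True if the file's digest equals the published hex digest.

    file_matches is an illustrative helper, not part of hashlib.
    """
    with open(path, "rb") as some_file:
        actual = hashlib.new(algorithm, some_file.read()).hexdigest()
    # Published digests are sometimes upper case or carry stray
    # whitespace, so normalize before comparing.
    return actual == published_hex.strip().lower()
```

If file_matches() returns False, the file we received is not the file that was published, and we can delete it before opening it.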

Using a key with a digest

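We can provide considerably more security by adding a key to a message digest. This doesn't encrypt the message, and it doesn't encrypt the digest either; instead, the secret key is mixed into the hash computation, so only someone who knows the key can produce or verify a matching digest. Anyone who alters the message in transit cannot compute a valid new digest without the key.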

The hmac module in the Python standard library handles this for us, as shown in the following code:

import hmac
with open( "LHD_warship.jpg", "rb" ) as some_file:
    # Python 3.8 and later require digestmod; "md5" was the default in earlier versions.
    keyed= hmac.new( b"Agent Garbo", some_file.read(), digestmod="md5" )
print( keyed.hexdigest() )

In this example, we've created an HMAC digest object and also passed the message content to that digest object. The hmac.new() function accepts the key (as a string of bytes), the body of a message, and the name of the hash algorithm to use as digestmod.

The hex digest from this HMAC digest object depends on both the original message and the key we provided; changing either one produces a different digest. Here's the output:

42212d077cc5232f3f2da007d35a726c

As HQ knows our key, they can recompute the digest and confirm that a message comes from us and arrived unaltered.

Similarly, HQ must use the same shared key when sending us a message. We can then use that key when we read the message to confirm that it was sent to us by HQ.
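The receiving side of this exchange can be sketched as follows. The helper names sign() and is_authentic() are made up for this illustration, and the key is the sample key used above. The comparison uses hmac.compare_digest(), which the standard library provides for comparing digests in a way that resists timing attacks.

```python
import hmac

SHARED_KEY = b"Agent Garbo"  # sample key; a real key must be kept secret

def sign(message, key=SHARED_KEY):
    """Return the hex HMAC-MD5 digest of message (bytes) under key."""
    return hmac.new(key, message, digestmod="md5").hexdigest()

def is_authentic(message, received_hex, key=SHARED_KEY):
    """Recompute the digest and compare it with the one that was sent."""
    expected = sign(message, key)
    return hmac.compare_digest(expected, received_hex)
```

The sender transmits the message along with sign(message); the receiver calls is_authentic(message, received_hex) and rejects the message if the result is False. A changed message or a different key will both cause the check to fail.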