Table of Contents for
Python Web Penetration Testing Cookbook


Python Web Penetration Testing Cookbook by Dave Mound. Published by Packt Publishing, 2015
  1. Cover
  2. Table of Contents
  3. Python Web Penetration Testing Cookbook
  4. Python Web Penetration Testing Cookbook
  5. Credits
  6. About the Authors
  7. About the Reviewers
  8. www.PacktPub.com
  9. Disclaimer
  10. Preface
  11. What you need for this book
  12. Who this book is for
  13. Sections
  14. Conventions
  15. Reader feedback
  16. Customer support
  17. 1. Gathering Open Source Intelligence
  18. Gathering information using the Shodan API
  19. Scripting a Google+ API search
  20. Downloading profile pictures using the Google+ API
  21. Harvesting additional results from the Google+ API using pagination
  22. Getting screenshots of websites with QtWebKit
  23. Screenshots based on a port list
  24. Spidering websites
  25. 2. Enumeration
  26. Performing a ping sweep with Scapy
  27. Scanning with Scapy
  28. Checking username validity
  29. Brute forcing usernames
  30. Enumerating files
  31. Brute forcing passwords
  32. Generating e-mail addresses from names
  33. Finding e-mail addresses from web pages
  34. Finding comments in source code
  35. 3. Vulnerability Identification
  36. Automated URL-based Directory Traversal
  37. Automated URL-based Cross-site scripting
  38. Automated parameter-based Cross-site scripting
  39. Automated fuzzing
  40. jQuery checking
  41. Header-based Cross-site scripting
  42. Shellshock checking
  43. 4. SQL Injection
  44. Checking jitter
  45. Identifying URL-based SQLi
  46. Exploiting Boolean SQLi
  47. Exploiting Blind SQL Injection
  48. Encoding payloads
  49. 5. Web Header Manipulation
  50. Testing HTTP methods
  51. Fingerprinting servers through HTTP headers
  52. Testing for insecure headers
  53. Brute forcing login through the Authorization header
  54. Testing for clickjacking vulnerabilities
  55. Identifying alternative sites by spoofing user agents
  56. Testing for insecure cookie flags
  57. Session fixation through a cookie injection
  58. 6. Image Analysis and Manipulation
  59. Hiding a message using LSB steganography
  60. Extracting messages hidden in LSB
  61. Hiding text in images
  62. Extracting text from images
  63. Enabling command and control using steganography
  64. 7. Encryption and Encoding
  65. Generating an MD5 hash
  66. Generating an SHA 1/128/256 hash
  67. Implementing SHA and MD5 hashes together
  68. Implementing SHA in a real-world scenario
  69. Generating a Bcrypt hash
  70. Cracking an MD5 hash
  71. Encoding with Base64
  72. Encoding with ROT13
  73. Cracking a substitution cipher
  74. Cracking the Atbash cipher
  75. Attacking one-time pad reuse
  76. Predicting a linear congruential generator
  77. Identifying hashes
  78. 8. Payloads and Shells
  79. Extracting data through HTTP requests
  80. Creating an HTTP C2
  81. Creating an FTP C2
  82. Creating a Twitter C2
  83. Creating a simple Netcat shell
  84. 9. Reporting
  85. Converting Nmap XML to CSV
  86. Extracting links from a URL to Maltego
  87. Extracting e-mails to Maltego
  88. Parsing Sslscan into CSV
  89. Generating graphs using plot.ly
  90. Index

Identifying alternative sites by spoofing user agents

Some websites restrict access or display different content based on the browser or device being used to view them. For example, a website may show a mobile-oriented theme to users browsing from an iPhone, or display a warning to users with an old and vulnerable version of Internet Explorer. These alternative versions can be a good place to find vulnerabilities, because they may have been tested less rigorously, or even forgotten about, by the developers.

How to do it…

In this recipe, we will show you how to spoof your user agent so that you appear to the website to be browsing from a different device, in an attempt to uncover alternative content:

import requests
import hashlib

user_agents = { 'Chrome on Windows 8.1' : 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36',
'Safari on iOS' : 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4',
'IE6 on Windows XP' : 'Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)',
'Googlebot' : 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' }

responses = {}
for name, agent in user_agents.items():
  headers = {'User-Agent' : agent}
  req = requests.get('http://packtpub.com', headers=headers)
  responses[name] = req

md5s = {}
for name, response in responses.items():
  md5s[name] = hashlib.md5(response.text.encode('utf-8')).hexdigest()

for name, md5 in md5s.items():
    if name != 'Chrome on Windows 8.1':
        if md5 != md5s['Chrome on Windows 8.1']:
            print(name, 'differs from baseline')
        else:
            print('No alternative site found via User-Agent spoofing:', md5)
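
Save the script under any name you like (spoof_agents.py, say; the filename here is just an example) and run it with Python; it fetches the page once per user agent and reports any response that differs from the baseline.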

How it works…

We first set up a dictionary of user agent strings, with a friendly name as each key:

user_agents = { 'Chrome on Windows 8.1' : 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36',
'Safari on iOS' : 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4',
'IE6 on Windows XP' : 'Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)',
'Googlebot' : 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' }

There are four user agents here: Chrome on Windows 8.1, Safari on iOS, Internet Explorer 6 on Windows XP, and finally, Googlebot. This gives a range of browsers and devices for which you might reasonably expect a site to serve different content. The final user agent in the list, Googlebot, is the crawler that Google sends out when spidering pages for its search engine.
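
As a quick sanity check (not part of the original recipe), you can confirm that requests really sends the header you set by querying an echo service; httpbin.org's /user-agent endpoint simply returns the User-Agent header it received:

import requests

# httpbin.org/user-agent echoes back the received User-Agent header,
# so the spoofed string should appear in place of the default
# python-requests value. Reuses the user_agents dictionary from above.
spoofed = {'User-Agent': user_agents['Googlebot']}
print(requests.get('http://httpbin.org/user-agent', headers=spoofed).json())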

The next part loops through each of the user agents and sets the User-Agent header in the request:

responses = {}
for name, agent in user_agents.items():
  headers = {'User-Agent' : agent}

The next section sends the HTTP request, using the familiar requests library, and stores each response in the responses dictionary, using the user-friendly name as the key:

  req = requests.get('http://packtpub.com', headers=headers)
  responses[name] = req

The next part of the code creates an md5s dictionary and then iterates through the responses, taking the text of each response. From this, it generates an MD5 hash of the response content and stores it in the md5s dictionary:

md5s = {}
for name, response in responses.items():
  md5s[name] = hashlib.md5(response.text.encode('utf-8')).hexdigest()

The final part of the code iterates through the md5s dictionary and compares each item to the baseline request, which in this recipe is Chrome on Windows 8.1:

for name, md5 in md5s.items():
    if name != 'Chrome on Windows 8.1':
        if md5 != md5s['Chrome on Windows 8.1']:
            print(name, 'differs from baseline')
        else:
            print('No alternative site found via User-Agent spoofing:', md5)

We hash the response text rather than storing it whole so that the resulting dictionary stays small, reducing the memory footprint. You could compare each response directly by its content, but that would be slower and use more memory.
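
If a response does differ and you want to see where, a minimal sketch using the standard library's difflib can narrow down the changed lines (this assumes the responses dictionary from the script above is still in scope, and the two keys compared are just examples):

import difflib

# Compare one spoofed response against the baseline, line by line.
baseline = responses['Chrome on Windows 8.1'].text.splitlines()
candidate = responses['Safari on iOS'].text.splitlines()

# unified_diff yields only the changed lines plus a little context,
# which keeps the output readable even for large pages.
for line in difflib.unified_diff(baseline, candidate, lineterm='', n=1):
    print(line)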

This script will print the user agent's friendly name whenever the response from the web server differs from the Chrome on Windows 8.1 baseline response.


See also

This recipe relies on being able to manipulate the headers of HTTP requests. Check out the Header-based Cross-site scripting and Shellshock checking recipes in Chapter 3, Vulnerability Identification, for more examples of data that can be passed in headers.
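
The same headers dictionary used in this recipe can carry any header. As a rough illustration (the payloads below are representative of those recipes rather than copied from them, and the target URL is a placeholder):

import requests

# Arbitrary headers travel exactly like a spoofed User-Agent.
headers = {
    'User-Agent': '() { :;}; echo vulnerable',  # Shellshock-style probe
    'Referer': '<script>alert(1)</script>',     # header-based XSS probe
}

# Replace with a host you are authorized to test.
response = requests.get('http://example.com', headers=headers)
print(response.status_code)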