Table of Contents for
Python Web Penetration Testing Cookbook


Python Web Penetration Testing Cookbook by Dave Mound, published by Packt Publishing, 2015
  1. Cover
  2. Table of Contents
  3. Python Web Penetration Testing Cookbook
  4. Python Web Penetration Testing Cookbook
  5. Credits
  6. About the Authors
  7. About the Reviewers
  8. www.PacktPub.com
  9. Disclaimer
  10. Preface
  11. What you need for this book
  12. Who this book is for
  13. Sections
  14. Conventions
  15. Reader feedback
  16. Customer support
  17. 1. Gathering Open Source Intelligence
  18. Gathering information using the Shodan API
  19. Scripting a Google+ API search
  20. Downloading profile pictures using the Google+ API
  21. Harvesting additional results from the Google+ API using pagination
  22. Getting screenshots of websites with QtWebKit
  23. Screenshots based on a port list
  24. Spidering websites
  25. 2. Enumeration
  26. Performing a ping sweep with Scapy
  27. Scanning with Scapy
  28. Checking username validity
  29. Brute forcing usernames
  30. Enumerating files
  31. Brute forcing passwords
  32. Generating e-mail addresses from names
  33. Finding e-mail addresses from web pages
  34. Finding comments in source code
  35. 3. Vulnerability Identification
  36. Automated URL-based Directory Traversal
  37. Automated URL-based Cross-site scripting
  38. Automated parameter-based Cross-site scripting
  39. Automated fuzzing
  40. jQuery checking
  41. Header-based Cross-site scripting
  42. Shellshock checking
  43. 4. SQL Injection
  44. Checking jitter
  45. Identifying URL-based SQLi
  46. Exploiting Boolean SQLi
  47. Exploiting Blind SQL Injection
  48. Encoding payloads
  49. 5. Web Header Manipulation
  50. Testing HTTP methods
  51. Fingerprinting servers through HTTP headers
  52. Testing for insecure headers
  53. Brute forcing login through the Authorization header
  54. Testing for clickjacking vulnerabilities
  55. Identifying alternative sites by spoofing user agents
  56. Testing for insecure cookie flags
  57. Session fixation through a cookie injection
  58. 6. Image Analysis and Manipulation
  59. Hiding a message using LSB steganography
  60. Extracting messages hidden in LSB
  61. Hiding text in images
  62. Extracting text from images
  63. Enabling command and control using steganography
  64. 7. Encryption and Encoding
  65. Generating an MD5 hash
  66. Generating an SHA 1/128/256 hash
  67. Implementing SHA and MD5 hashes together
  68. Implementing SHA in a real-world scenario
  69. Generating a Bcrypt hash
  70. Cracking an MD5 hash
  71. Encoding with Base64
  72. Encoding with ROT13
  73. Cracking a substitution cipher
  74. Cracking the Atbash cipher
  75. Attacking one-time pad reuse
  76. Predicting a linear congruential generator
  77. Identifying hashes
  78. 8. Payloads and Shells
  79. Extracting data through HTTP requests
  80. Creating an HTTP C2
  81. Creating an FTP C2
  82. Creating a Twitter C2
  83. Creating a simple Netcat shell
  84. 9. Reporting
  85. Converting Nmap XML to CSV
  86. Extracting links from a URL to Maltego
  87. Extracting e-mails to Maltego
  88. Parsing Sslscan into CSV
  89. Generating graphs using plot.ly
  90. Index

Spidering websites

Many tools provide the ability to map out websites, but you are often limited in the style of output or the location in which the results are provided. This baseline spidering script allows you to map out websites in short order, with the ability to alter it as you please.

Getting ready

In order for this script to work, you'll need the BeautifulSoup library. You can install it with apt-get install python-bs4 or, alternatively, with pip install beautifulsoup4. It's as easy as that.
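
If you want to double-check that the library is in place before running the recipe, a quick import test does the job (a minimal sketch; it only imports bs4 and prints its version):

# quick sanity check that BeautifulSoup 4 is installed
try:
    import bs4
    print "bs4 version:", bs4.__version__
except ImportError:
    print "bs4 is missing - install it with pip install beautifulsoup4"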

How to do it…

This is the script that we will be using:

import urllib2
from bs4 import BeautifulSoup
import sys

# Lists for links found on the target page (urls) and on each of those pages (urls2)
urls = []
urls2 = []

# The target URL is taken from the command line
tarurl = sys.argv[1]

# First pass: fetch the target page and collect every link on it
url = urllib2.urlopen(tarurl).read()
soup = BeautifulSoup(url)
for line in soup.find_all('a'):
    newline = line.get('href')
    try:
        if newline[:4] == "http":
            if tarurl in newline:
                urls.append(str(newline))
        elif newline[:1] == "/":
            combline = tarurl + newline
            urls.append(str(combline))
    except:
        pass

# Second pass: fetch each link found in the first pass and collect its links
for uurl in urls:
    url = urllib2.urlopen(uurl).read()
    soup = BeautifulSoup(url)
    for line in soup.find_all('a'):
        newline = line.get('href')
        try:
            if newline[:4] == "http":
                if tarurl in newline:
                    urls2.append(str(newline))
            elif newline[:1] == "/":
                combline = tarurl + newline
                urls2.append(str(combline))
        except:
            pass

# Turn the results into a set to remove duplicates, then print each URL
urls3 = set(urls2)
for value in urls3:
    print value

How it works…

We first import the necessary libraries and create two empty lists called urls and urls2. These allow us to run through the spidering process twice. Next, we take the target URL as an argument appended to the script on the command line. It is run like this:

$ python spider.py http://www.packtpub.com

We then open the provided URL and pass the response to BeautifulSoup:

url = urllib2.urlopen(tarurl).read() 
soup = BeautifulSoup(url) 

BeautifulSoup parses the content and allows us to pull out only the parts that we want:

for line in soup.find_all('a'):
    newline = line.get('href')

We then pull all of the content that is marked with an a tag in HTML and grab the href attribute within each tag. This allows us to grab all the URLs listed in the page.
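
To illustrate what this step gives us, the following standalone snippet (a minimal sketch using made-up HTML, not part of the recipe) shows find_all('a') and get('href') in action:

from bs4 import BeautifulSoup

# Made-up page fragment, purely for illustration
html = '<a href="/about">About</a><a href="http://example.com/blog">Blog</a><a>no href</a>'
soup = BeautifulSoup(html)

for line in soup.find_all('a'):
    # get('href') returns the attribute value, or None when the tag has no href
    print line.get('href')

# Prints: /about, http://example.com/blog, None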

The next section handles relative and absolute links. If a link is relative, it starts with a slash to indicate that the page is hosted locally on the web server. If a link is absolute, it contains the full address including the domain. With the following code, we ensure that we can, as external users, open all the links we find and list them as absolute links:

if newline[:4] == "http":
    if tarurl in newline:
        urls.append(str(newline))
elif newline[:1] == "/":
    combline = tarurl + newline
    urls.append(str(combline))
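
If you want more robust handling of relative links (for example, hrefs such as page.html or ../index.html that do not start with a slash), the standard library's urlparse module can be swapped in. This is an alternative sketch, not the recipe's original approach, and the helper name make_absolute is just for illustration:

from urlparse import urljoin, urlparse

def make_absolute(tarurl, newline):
    # Resolve any relative href against the target URL
    absolute = urljoin(tarurl, newline)
    # Only keep links that stay on the target's domain
    if urlparse(absolute).netloc == urlparse(tarurl).netloc:
        return absolute
    return None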

We then repeat the process once more for each page we found, iterating through every element of the original urls list:

for uurl in urls:

Other than a change in the referenced lists and variables, the code remains the same.

Finally, for ease of output, we take the full list of URLs we have collected and turn it into a set. This removes duplicates from the list and allows us to output it neatly. We iterate through the values in the set and output them one by one.
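
If you would rather print the deduplicated links in a stable order, sorting the set first is an easy tweak (a small sketch of the same output step):

urls3 = set(urls2)
for value in sorted(urls3):
    print value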

There's more…

This tool can be tied in with any of the functionality shown earlier and later in this book. It can be tied to the Getting screenshots of websites with QtWebKit recipe to allow you to take screenshots of every page. You can tie it to the e-mail address finder in Chapter 2, Enumeration, to gather e-mail addresses from every page, or you can find another use for this simple technique to map web pages.

The script can easily be changed to add levels of depth, going from the current two links deep to any value set by a system argument. The output can be changed to add in URLs present on each page, or to turn it into a CSV to allow you to map vulnerabilities to pages for easy notation. A sketch of both tweaks follows.
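
One possible way to make both of those changes is sketched below; the extra depth argument, the get_links helper, and the CSV layout are assumptions for illustration, not part of the original recipe:

import csv
import sys
import urllib2
from bs4 import BeautifulSoup

def get_links(tarurl, page):
    # Collect absolute, on-site links from a single page
    found = []
    try:
        soup = BeautifulSoup(urllib2.urlopen(page).read())
    except Exception:
        return found
    for line in soup.find_all('a'):
        newline = line.get('href')
        if not newline:
            continue
        if newline[:4] == "http" and tarurl in newline:
            found.append(str(newline))
        elif newline[:1] == "/":
            found.append(str(tarurl + newline))
    return found

if __name__ == "__main__":
    tarurl = sys.argv[1]
    depth = int(sys.argv[2])      # how many links deep to spider
    current = [tarurl]
    rows = []                     # each row: depth level, source page, link found
    for level in range(1, depth + 1):
        nextlevel = []
        for page in current:
            for link in get_links(tarurl, page):
                rows.append([level, page, link])
                nextlevel.append(link)
        current = set(nextlevel)  # deduplicate before the next level
    with open("spider.csv", "wb") as out:   # "wb" because this follows the recipe's Python 2 style
        writer = csv.writer(out)
        writer.writerow(["depth", "source", "link"])
        writer.writerows(rows)

You could then run it with something like python spider.py http://www.packtpub.com 3 and open spider.csv in a spreadsheet for notation.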