Table of Contents for
Python Web Penetration Testing Cookbook

Python Web Penetration Testing Cookbook by Dave Mound. Published by Packt Publishing, 2015
  1. Cover
  2. Table of Contents
  3. Python Web Penetration Testing Cookbook
  4. Python Web Penetration Testing Cookbook
  5. Credits
  6. About the Authors
  7. About the Reviewers
  8. www.PacktPub.com
  9. Disclaimer
  10. Preface
  11. What you need for this book
  12. Who this book is for
  13. Sections
  14. Conventions
  15. Reader feedback
  16. Customer support
  17. 1. Gathering Open Source Intelligence
  18. Gathering information using the Shodan API
  19. Scripting a Google+ API search
  20. Downloading profile pictures using the Google+ API
  21. Harvesting additional results from the Google+ API using pagination
  22. Getting screenshots of websites with QtWebKit
  23. Screenshots based on a port list
  24. Spidering websites
  25. 2. Enumeration
  26. Performing a ping sweep with Scapy
  27. Scanning with Scapy
  28. Checking username validity
  29. Brute forcing usernames
  30. Enumerating files
  31. Brute forcing passwords
  32. Generating e-mail addresses from names
  33. Finding e-mail addresses from web pages
  34. Finding comments in source code
  35. 3. Vulnerability Identification
  36. Automated URL-based Directory Traversal
  37. Automated URL-based Cross-site scripting
  38. Automated parameter-based Cross-site scripting
  39. Automated fuzzing
  40. jQuery checking
  41. Header-based Cross-site scripting
  42. Shellshock checking
  43. 4. SQL Injection
  44. Checking jitter
  45. Identifying URL-based SQLi
  46. Exploiting Boolean SQLi
  47. Exploiting Blind SQL Injection
  48. Encoding payloads
  49. 5. Web Header Manipulation
  50. Testing HTTP methods
  51. Fingerprinting servers through HTTP headers
  52. Testing for insecure headers
  53. Brute forcing login through the Authorization header
  54. Testing for clickjacking vulnerabilities
  55. Identifying alternative sites by spoofing user agents
  56. Testing for insecure cookie flags
  57. Session fixation through a cookie injection
  58. 6. Image Analysis and Manipulation
  59. Hiding a message using LSB steganography
  60. Extracting messages hidden in LSB
  61. Hiding text in images
  62. Extracting text from images
  63. Enabling command and control using steganography
  64. 7. Encryption and Encoding
  65. Generating an MD5 hash
  66. Generating an SHA 1/128/256 hash
  67. Implementing SHA and MD5 hashes together
  68. Implementing SHA in a real-world scenario
  69. Generating a Bcrypt hash
  70. Cracking an MD5 hash
  71. Encoding with Base64
  72. Encoding with ROT13
  73. Cracking a substitution cipher
  74. Cracking the Atbash cipher
  75. Attacking one-time pad reuse
  76. Predicting a linear congruential generator
  77. Identifying hashes
  78. 8. Payloads and Shells
  79. Extracting data through HTTP requests
  80. Creating an HTTP C2
  81. Creating an FTP C2
  82. Creating a Twitter C2
  83. Creating a simple Netcat shell
  84. 9. Reporting
  85. Converting Nmap XML to CSV
  86. Extracting links from a URL to Maltego
  87. Extracting e-mails to Maltego
  88. Parsing Sslscan into CSV
  89. Generating graphs using plot.ly
  90. Index

Finding comments in source code

A common security issue is, ironically, caused by good programming practice. During the development phase of web applications, developers comment their code. This is very useful at that stage, as it helps with understanding the code and leaves useful reminders in place. However, when the web application is ready to be deployed to a production environment, it is best practice to remove all of these comments, as they may prove useful to an attacker.

This recipe uses a combination of Requests and BeautifulSoup to search a URL for comments, find the links on that page, and then search each of the linked URLs for comments too. The technique of following links from a page and analysing the resulting URLs is known as spidering.

How to do it…

The following script will scrape a URL for comments and links in the source code. It will then perform limited spidering and search the linked URLs for comments:

import re
import sys

import requests
from bs4 import BeautifulSoup

if len(sys.argv) != 2:
    print "usage: %s targeturl" % (sys.argv[0])
    sys.exit(0)

urls = []

# Fetch the target page and print any HTML comments found in its source
tarurl = sys.argv[1]
url = requests.get(tarurl)
comments = re.findall('<!--(.*)-->', url.text)
print "Comments on page: " + tarurl
for comment in comments:
    print comment

# Collect absolute links that stay on the target, and relative links,
# which are turned into absolute URLs by prefixing the target URL
soup = BeautifulSoup(url.text, "html.parser")  # explicit built-in parser
for line in soup.find_all('a'):
    newline = line.get('href')
    try:
        if newline[:4] == "http":
            if tarurl in newline:
                urls.append(str(newline))
        elif newline[:1] == "/":
            combline = tarurl + newline
            urls.append(str(combline))
    except:
        # <a> tags without an href give newline = None, which raises here
        print "failed"

# Visit each collected URL and print any comments found there
for uurl in urls:
    print "Comments on page: " + uurl
    url = requests.get(uurl)
    comments = re.findall('<!--(.*)-->', url.text)
    for comment in comments:
        print comment
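
Assuming the script has been saved as, say, comment_spider.py (the recipe does not give the file a name), it would be run with a single target URL as its argument:

python comment_spider.py http://www.example.org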

How it works…

After the initial import of the necessary modules and setting up of variables, the script first gets the source code of the target URL.

You may have noticed that for BeautifulSoup, we have the following line:

from bs4 import BeautifulSoup

This is so that when we use BeautifulSoup, we just have to type BeautifulSoup instead of bs4.BeautifulSoup.
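
For comparison, here is a minimal sketch of the two equivalent import styles, reusing the variable names from the recipe; only the second form is used in the script:

# Importing the package and qualifying the class name on every use:
import bs4
soup = bs4.BeautifulSoup(url.text, "html.parser")

# Importing the class directly, as the recipe does:
from bs4 import BeautifulSoup
soup = BeautifulSoup(url.text, "html.parser")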

It then searches for all instances of HTML comments and prints them out:

url = requests.get(tarurl)
comments = re.findall('<!--(.*)-->', url.text)
print "Comments on page: " + tarurl
for comment in comments:
    print comment
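
Note that . in a regular expression does not match newlines by default, so this pattern only finds comments that open and close on the same line. If you also wanted to capture multi-line comments, one possible variation (an addition, not part of the original recipe) is to pass the re.DOTALL flag and make the match non-greedy so that it stops at the first closing tag:

# Variation: let '.' match newlines and stop at the first '-->' encountered
comments = re.findall('<!--(.*?)-->', url.text, re.DOTALL)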

The script then uses BeautifulSoup to scrape the source code for any instances of absolute links (starting with http) and relative links (starting with /):

if newline[:4] == "http":
    if tarurl in newline:
        urls.append(str(newline))
elif newline[:1] == "/":
    combline = tarurl + newline
    urls.append(str(combline))

Any <a> tag without an href attribute causes line.get('href') to return None, which the try...except block catches so that the script can continue. Once the script has collated a list of URLs linked to from the page, it searches each of those pages for HTML comments in the same way.

There's more…

This recipe shows a basic example of comment scraping and spidering. It is possible to add more intelligence to this recipe to suit your needs. For instance, you may want to account for relative links that start with . or .. to denote the current and parent directories.
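
As a rough sketch of that idea, urljoin from the standard library can resolve any style of relative link against the page it was found on; this is an illustration layered on top of the recipe, not code from the book:

from urlparse import urljoin  # Python 2 name; urllib.parse.urljoin in Python 3

def resolve_link(base_url, href):
    # Resolves ./page.html, ../page.html, page.html and /page.html against
    # the URL of the page that linked to them, normalising . and .. segments
    return urljoin(base_url, href)

Inside the link-gathering loop, each href could then be passed through resolve_link(tarurl, newline) before being appended to urls.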

You can also add more control to the spidering part. You could extract the domain from the supplied target URL and create a filter that stops the script from scraping links on domains external to the target. This is especially useful for professional engagements, where you need to adhere to a defined scope of targets.
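
A minimal sketch of such a filter, assuming that the hostname alone defines the scope (again an illustration rather than code from the book), might compare the netloc of each candidate link against that of the target:

from urlparse import urlparse  # Python 2 name; urllib.parse.urlparse in Python 3

def in_scope(target_url, candidate_url):
    # Keep a link only if it points at the same host as the supplied target,
    # so the spider never wanders onto out-of-scope domains
    return urlparse(candidate_url).netloc == urlparse(target_url).netloc

Each discovered link would then only be appended to urls when in_scope(tarurl, newline) returns True.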