Table of Contents for
Python: Penetration Testing for Developers


Python: Penetration Testing for Developers by Dave Mound. Published by Packt Publishing, 2016
  1. Cover
  2. Table of Contents
  3. Python: Penetration Testing for Developers
  4. Python: Penetration Testing for Developers
  5. Python: Penetration Testing for Developers
  6. Credits
  7. Preface
  8. What you need for this learning path
  9. Who this learning path is for
  10. Reader feedback
  11. Customer support
  12. 1. Module 1
  13. 1. Understanding the Penetration Testing Methodology
  14. Understanding what penetration testing is not
  15. Assessment methodologies
  16. The penetration testing execution standard
  17. Penetration testing tools
  18. Summary
  19. 2. The Basics of Python Scripting
  20. Python – the good and the bad
  21. A Python interactive interpreter versus a script
  22. Environmental variables and PATH
  23. Understanding dynamically typed languages
  24. The first Python script
  25. Developing scripts and identifying errors
  26. Python formatting
  27. Python variables
  28. Operators
  29. Compound statements
  30. Functions
  31. The Python style guide
  32. Arguments and options
  33. Your first assessor script
  34. Summary
  35. 3. Identifying Targets with Nmap, Scapy, and Python
  36. Understanding Nmap
  37. Nmap libraries for Python
  38. The Scapy library for Python
  39. Summary
  40. 4. Executing Credential Attacks with Python
  41. Identifying the target
  42. Creating targeted usernames
  43. Testing for users using SMTP VRFY
  44. Summary
  45. 5. Exploiting Services with Python
  46. Understanding the chaining of exploits
  47. Automating the exploit train with Python
  48. Summary
  49. 6. Assessing Web Applications with Python
  50. Identifying hidden files and directories with Python
  51. Credential attacks with Burp Suite
  52. Using twill to walk through the source
  53. Understanding when to use Python for web assessments
  54. Summary
  55. 7. Cracking the Perimeter with Python
  56. Understanding the link between accounts and services
  57. Cracking inboxes with Burp Suite
  58. Identifying the attack path
  59. Gaining access through websites
  60. Summary
  61. 8. Exploit Development with Python, Metasploit, and Immunity
  62. Understanding the Windows memory structure
  63. Understanding memory addresses and endianness
  64. Understanding the manipulation of the stack
  65. Understanding immunity
  66. Understanding basic buffer overflow
  67. Writing a basic buffer overflow exploit
  68. Understanding stack adjustments
  69. Understanding the purpose of local exploits
  70. Understanding other exploit scripts
  71. Reversing Metasploit modules
  72. Understanding protection mechanisms
  73. Summary
  74. 9. Automating Reports and Tasks with Python
  75. Understanding how to create a Python class
  76. Summary
  77. 10. Adding Permanency to Python Tools
  78. Understanding the difference between multithreading and multiprocessing
  79. Building industry-standard tools
  80. Summary
  81. 2. Module 2
  82. 1. Python with Penetration Testing and Networking
  83. Approaches to pentesting
  84. Introducing Python scripting
  85. Understanding the tests and tools you'll need
  86. Learning the common testing platforms with Python
  87. Network sockets
  88. Server socket methods
  89. Client socket methods
  90. General socket methods
  91. Moving on to the practical
  92. Summary
  93. 2. Scanning Pentesting
  94. What are the services running on the target machine?
  95. Summary
  96. 3. Sniffing and Penetration Testing
  97. Implementing a network sniffer using Python
  98. Learning about packet crafting
  99. Introducing ARP spoofing and implementing it using Python
  100. Testing the security system using custom packet crafting and injection
  101. Summary
  102. 4. Wireless Pentesting
  103. Wireless attacks
  104. Summary
  105. 5. Foot Printing of a Web Server and a Web Application
  106. Introducing information gathering
  107. Information gathering of a website from SmartWhois by the parser BeautifulSoup
  108. Banner grabbing of a website
  109. Hardening of a web server
  110. Summary
  111. 6. Client-side and DDoS Attacks
  112. Tampering with the client-side parameter with Python
  113. Effects of parameter tampering on business
  114. Introducing DoS and DDoS
  115. Summary
  116. 7. Pentesting of SQLI and XSS
  117. Types of SQL injections
  118. Understanding the SQL injection attack by a Python script
  119. Learning about Cross-Site scripting
  120. Summary
  121. 3. Module 3
  122. 1. Gathering Open Source Intelligence
  123. Gathering information using the Shodan API
  124. Scripting a Google+ API search
  125. Downloading profile pictures using the Google+ API
  126. Harvesting additional results from the Google+ API using pagination
  127. Getting screenshots of websites with QtWebKit
  128. Screenshots based on a port list
  129. Spidering websites
  130. 2. Enumeration
  131. Performing a ping sweep with Scapy
  132. Scanning with Scapy
  133. Checking username validity
  134. Brute forcing usernames
  135. Enumerating files
  136. Brute forcing passwords
  137. Generating e-mail addresses from names
  138. Finding e-mail addresses from web pages
  139. Finding comments in source code
  140. 3. Vulnerability Identification
  141. Automated URL-based Directory Traversal
  142. Automated URL-based Cross-site scripting
  143. Automated parameter-based Cross-site scripting
  144. Automated fuzzing
  145. jQuery checking
  146. Header-based Cross-site scripting
  147. Shellshock checking
  148. 4. SQL Injection
  149. Checking jitter
  150. Identifying URL-based SQLi
  151. Exploiting Boolean SQLi
  152. Exploiting Blind SQL Injection
  153. Encoding payloads
  154. 5. Web Header Manipulation
  155. Testing HTTP methods
  156. Fingerprinting servers through HTTP headers
  157. Testing for insecure headers
  158. Brute forcing login through the Authorization header
  159. Testing for clickjacking vulnerabilities
  160. Identifying alternative sites by spoofing user agents
  161. Testing for insecure cookie flags
  162. Session fixation through a cookie injection
  163. 6. Image Analysis and Manipulation
  164. Hiding a message using LSB steganography
  165. Extracting messages hidden in LSB
  166. Hiding text in images
  167. Extracting text from images
  168. Enabling command and control using steganography
  169. 7. Encryption and Encoding
  170. Generating an MD5 hash
  171. Generating an SHA 1/128/256 hash
  172. Implementing SHA and MD5 hashes together
  173. Implementing SHA in a real-world scenario
  174. Generating a Bcrypt hash
  175. Cracking an MD5 hash
  176. Encoding with Base64
  177. Encoding with ROT13
  178. Cracking a substitution cipher
  179. Cracking the Atbash cipher
  180. Attacking one-time pad reuse
  181. Predicting a linear congruential generator
  182. Identifying hashes
  183. 8. Payloads and Shells
  184. Extracting data through HTTP requests
  185. Creating an HTTP C2
  186. Creating an FTP C2
  187. Creating a Twitter C2
  188. Creating a simple Netcat shell
  189. 9. Reporting
  190. Converting Nmap XML to CSV
  191. Extracting links from a URL to Maltego
  192. Extracting e-mails to Maltego
  193. Parsing Sslscan into CSV
  194. Generating graphs using plot.ly
  195. A. Bibliography
  196. Index

Chapter 9. Automating Reports and Tasks with Python

In previous chapters, we covered a good amount of information highlighting where Python can help optimize technical fieldwork. We even showed methods by which Python can be used to automate follow-on tasks from one process to another. Each of these helps you spend your time on priority tasks. This matters because three things potentially limit the successful completion of a penetration test: the time an assessor has to complete the assessment, the limits of the scope of the penetration test, and the skill of the assessor. In this chapter, we are going to show you how to automate tasks such as parsing eXtensible Markup Language (XML) to generate reports from tool data.

Understanding how to parse XML files for reports

We are going to use nmap XMLs as an example to show how you can parse data into a usable format. Our end goal will be to place the data in a Python dictionary of unique results. We can then use that data to build structured outputs that we find useful. To begin, we need an XML file that can be parsed and reviewed. Run an nmap scan of your localhost with the nmap -oX test 127.0.0.1 command.

This will produce a file that highlights the two open ports using XML markup, as shown here:

[Screenshot: XML output of the localhost nmap scan]

With an actual XML file, we can review the components of the data structure. Understanding how an XML file is structured will better prepare you to write the code that reads it. The descriptions here are based on how the etree library classifies the components of an XML file. The etree library handles XML data conceptually like a tree, with relevant branches, subbranches, and even twigs. In computer science terms, we call this a parent-child relationship.

Using the etree library, you are going to load the data into variables. These variables will hold composite pieces of data within themselves. These are referred to as elements, which can be further dissected to find useful information. For example, if you load the root of an XML nmap structure into a variable and then print it, you will see the reference and a tag that describes the element and the data within it, as seen in the following screenshot:
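As a minimal sketch of this step, the following loads a root element with Python's xml.etree.ElementTree and prints it; the short inline XML string and its attribute values are a hypothetical stand-in for a real nmap output file:

```python
import xml.etree.ElementTree as etree

# Trimmed, hypothetical stand-in for a real nmap XML output file
sample = """<nmaprun scanner="nmap" version="6.47">
  <host>
    <address addr="127.0.0.1" addrtype="ipv4"/>
  </host>
</nmaprun>"""

root = etree.fromstring(sample)
# Printing the element shows its tag and an object reference,
# similar to what the screenshot depicts
print(root)         # e.g. <Element 'nmaprun' at 0x...>
print(root.tag)     # nmaprun
print(root.attrib)  # the attributes as a dictionary
```

For a file on disk, etree.parse('test').getroot() would return the same kind of element.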

[Screenshot: printed root element of the nmap XML, showing its reference and tag]

Note

Additional details related to the etree library can be found at https://docs.python.org/2/library/xml.etree.elementtree.html.

Each element can have a parent-child relationship with other nodes and even sub-children nodes, known as grandchildren. Each node holds the information that we are trying to parse. A node typically has a tag, which is the description of the data it holds, and an attribute, which is the actual data. To better highlight how this information is presented in XML, we have captured an element of the nmap XML, the hostname's node, and a single resulting child, as seen here:

[Screenshot: the hostnames element of the nmap XML and a single child node]

As you look at an XML file, you may notice that you can have multiple nodes within an element. For example, a host may have a number of different hostnames for the same Internet Protocol (IP) address due to multiple references. As such, to iterate over all the nodes of an element, you need to use a for loop to capture all the possible data components. Keep in mind that a parser's output is only as good as the data samples it was built against.
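The iteration just described can be sketched as follows; the inline host fragment with two hostname entries is hypothetical:

```python
import xml.etree.ElementTree as etree

# Hypothetical host with two hostname entries for the same IP address
sample = """<host>
  <hostnames>
    <hostname name="localhost" type="user"/>
    <hostname name="kali.local" type="PTR"/>
  </hostnames>
</host>"""

host = etree.fromstring(sample)
names = []
for hostnames in host.iter('hostnames'):
    # The for loop captures every hostname node, not just the first
    for hostname in hostnames.iter('hostname'):
        names.append(hostname.get('name'))  # the attribute holds the data

print(names)  # ['localhost', 'kali.local']
```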

This means that you should take multiple sample XML files to get a better cross-section of information. The point is to get the majority of the possible data combinations. Even with samples that should cover the majority of issues that you will run into, there will be examples that are not accounted for. So, do not get discouraged if your script breaks in the middle of its use. Trace the errors and determine what needs to be adjusted.

For our tests, we are going to run multiple nmap scans against our Kali instance and output the details to XML files.

Tip

Python has a fantastic library, called libnmap, that can be used to run and schedule scans and even help parse output files to generate reports. More details on this can be found at https://libnmap.readthedocs.org/en/latest/. We could use this library to parse the output and generate a report, but it works only for nmap. If you want to parse XML output from other tools into a more manageable format, this library will not help you.

When we are getting ready to write a parser, the first stage is to map the file that we are going to parse and take notes on the likely ways our script will need to interact with the output. After mapping the file, we place several print statements throughout the script to show which element it stopped at or broke on during processing. To better understand each element, you should load the example XMLs into a tool that allows proper XML viewing. Notepad++ works very well, provided you have the XML Tools plugin installed.

Once you have loaded the file into Notepad++, you should collapse the XML tree down to its root. The following screenshot shows that the root of this tree is nmaprun:

[Screenshot: XML tree collapsed to the nmaprun root]

After you expand it once, you get a number of subnodes, which can be further expanded and broken down.

[Screenshot: subnodes revealed by expanding the nmaprun element]

From these details, we see that we have to load the XML file into the handler and then walk through the host element. This scan covered a single host, so there is only one host element; even so, we should iterate through it with a for loop so that we capture the additional hosts that future scans will contain.
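That loop can be sketched as follows; the two-host XML fragment is a hypothetical example showing why the for loop still pays off when a scan contains more than one host:

```python
import xml.etree.ElementTree as etree

# Hypothetical scan with two hosts; a single-host file works the same way
sample = """<nmaprun>
  <host><address addr="127.0.0.1" addrtype="ipv4"/></host>
  <host><address addr="192.168.1.10" addrtype="ipv4"/></host>
</nmaprun>"""

root = etree.fromstring(sample)
addresses = []
for host in root.findall('host'):  # works for one host or many
    address = host.find('address')
    addresses.append(address.get('addr'))

print(addresses)  # ['127.0.0.1', '192.168.1.10']
```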

When the host element is expanded, we can find that there are nodes for the address, hostnames, ports, and the time. The nodes we are interested in would be the address, hostnames, and ports. Both the hostnames and ports nodes are expandable, which means that they probably need to be iterated as well.

Tip

You can iterate through any node with a for loop, even if there is only one entry. This ensures that you capture all the information in the child nodes and prevents the parser from breaking.

This screenshot highlights the details of the expanded XML tree, with the details that we care about:

[Screenshot: expanded XML tree showing the address, hostnames, and ports nodes]

For the address, we can see there are different address types, as highlighted by the addrtype tag. In nmap XML outputs, you will find the ipv4, ipv6, and mac addresses. If you want different address types in your output, you can get them by pulling the data with simple if-then statements and then loading it into the appropriate variables. If you just want an address to be loaded into a variable regardless of the type, you will have to create an order of precedence.
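A sketch of that order of precedence, using a hypothetical host that reports both a MAC and an IPv4 address:

```python
import xml.etree.ElementTree as etree

# Hypothetical host that reports both a MAC and an IPv4 address
sample = """<host>
  <address addr="00:0c:29:aa:bb:cc" addrtype="mac"/>
  <address addr="192.168.1.10" addrtype="ipv4"/>
</host>"""

host = etree.fromstring(sample)
ipv4 = ipv6 = mac = None
for address in host.iter('address'):
    addrtype = address.get('addrtype')
    if addrtype == 'ipv4':
        ipv4 = address.get('addr')
    elif addrtype == 'ipv6':
        ipv6 = address.get('addr')
    elif addrtype == 'mac':
        mac = address.get('addr')

# Order of precedence when a single value is wanted: ipv4, then ipv6, then mac
chosen = ipv4 or ipv6 or mac
print(chosen)  # 192.168.1.10
```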

The nmap tool may or may not find a hostname for each target scanned. This depends on how the scanner attempted to retrieve the information. For example, if Domain Name Service (DNS) requests were enabled or the scan was against the localhost, a hostname may have been identified. Other instances of scans may not identify an actual hostname. We have to build our script to take into consideration the different outputs that may be provided depending on the scan. Our localhost scan, as seen in the following screenshot, did provide a hostname, so we have information that we can extract in this example:

[Screenshot: hostname node returned by the localhost scan]

Thus, we have determined that we are going to load the hostnames and addresses into variables. We are going to look at the ports element to identify the parent and child node data we are going to extract. The XML nodes in this area of the tree have a large amount of data since they have to be represented by numerous tags and attributes, as shown in this screenshot:

[Screenshot: the ports element with its numerous tags and attributes]

While looking at the details of these nodes, we should consider what components we would like to extract. We know that we will have to iterate over all the ports, and we can uniquely identify each port by the portid tag, which represents the port number, but we also have to consider what data is useful to us as assessors. The protocol of the port, such as Transmission Control Protocol (TCP) or User Datagram Protocol (UDP), is useful. The state of the port, whether it is open, closed, filtered, or open|filtered, is also important. Finally, the name of the service that may have been identified would be good to catalogue in a report.
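A sketch of extracting those fields from a ports element; the fragment is hypothetical and modeled on the nmap XML layout, and a production parser should also handle a missing service node:

```python
import xml.etree.ElementTree as etree

# Hypothetical ports element modeled on the nmap XML layout
sample = """<ports>
  <port protocol="tcp" portid="22">
    <state state="open"/>
    <service name="ssh"/>
  </port>
  <port protocol="tcp" portid="80">
    <state state="open"/>
    <service name="http"/>
  </port>
</ports>"""

ports = etree.fromstring(sample)
results = []
for port in ports.iter('port'):
    # NOTE: a real parser should check that the service node exists,
    # since scans without service detection may omit it
    results.append({
        'port': port.get('portid'),
        'protocol': port.get('protocol'),
        'state': port.find('state').get('state'),
        'service': port.find('service').get('name'),
    })

print(results)
```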

Tip

Remember that a service name may be inaccurate, depending on the type of scan. If there is no service detection, nmap uses the defaults described in Linux's /etc/services file for those ports. So, if you are generating reports for a client as part of a footprinting exercise, make sure that you enable some form of service detection. Otherwise, the data that you provide could be considered inaccurate.

After reviewing the XML file, we have determined that in addition to the addresses and hostnames, we are also going to capture every port number, the protocol, the service attached to it, and the state. With these details, we can consider how we want to format our report. As the previous images have shown, data from the nmap XMLs is not narrative in format, so a spreadsheet will potentially be more useful than a Microsoft Word document.

Therefore, we have to consider the manner in which the data will be represented in the report: a line per host or a line per port. There are benefits and trade-offs for each of these representations. A line-per-host representation makes composite information easy to represent, but if we want to filter our data, we can only filter on unique information about the host or on port groups, not on individual ports.

To make this more useful, each line in the spreadsheet will represent a port, which means that the particulars of each port can be represented on a line. This can help our clients filter on each item that we extract from the XML, including the hostname, address, port, service name, protocol, and port state. The following screenshot shows what we will be working towards:
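The one-line-per-port output can be sketched with Python's csv module; the parsed results here are hypothetical, and io.StringIO stands in for a real report file:

```python
import csv
import io

# Hypothetical parsed results: one dictionary per open port
results = [
    {'hostname': 'localhost', 'address': '127.0.0.1', 'port': '22',
     'service': 'ssh', 'protocol': 'tcp', 'state': 'open'},
    {'hostname': 'localhost', 'address': '127.0.0.1', 'port': '80',
     'service': 'http', 'protocol': 'tcp', 'state': 'open'},
]

fields = ['hostname', 'address', 'port', 'service', 'protocol', 'state']
output = io.StringIO()  # swap in open('report.csv', 'w') for a real file
writer = csv.DictWriter(output, fieldnames=fields)
writer.writeheader()
writer.writerows(results)  # one spreadsheet line per port

print(output.getvalue())
```

Clients can then open the CSV in a spreadsheet application and filter on any column.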

[Screenshot: the target spreadsheet, with one port per line]

Since we are writing a parser and a report generator, it would be good to create two separate classes to handle this information. The added benefit is that the XML parser can be instantiated, which means that we can run the parser against more than one XML file and then combine each iteration into holistic and unique results. This is extremely beneficial for us, since we typically run more than one nmap scan during an engagement, and combining results and eliminating duplicates can be a rather laborious process. Again, this is an ideal example in which scripting can make our lives easier.
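One possible two-class sketch of this design; the class names are hypothetical, and a set is used to eliminate duplicate results across scans:

```python
import xml.etree.ElementTree as etree


class NmapParser(object):
    """Parses one nmap XML document into (address, port, protocol) tuples."""

    def __init__(self, xml_string):
        self.root = etree.fromstring(xml_string)

    def results(self):
        found = set()
        for host in self.root.findall('host'):
            addr = host.find('address').get('addr')
            for port in host.iter('port'):
                found.add((addr, port.get('portid'), port.get('protocol')))
        return found


class ReportGenerator(object):
    """Combines results from several parsers, eliminating duplicates."""

    def __init__(self):
        self.unique = set()

    def add(self, parser):
        self.unique |= parser.results()  # set union drops duplicates


# Two hypothetical, overlapping scans of the same host
scan1 = """<nmaprun><host><address addr="127.0.0.1" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="22"><state state="open"/></port></ports>
</host></nmaprun>"""
scan2 = """<nmaprun><host><address addr="127.0.0.1" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="22"><state state="open"/></port>
<port protocol="tcp" portid="80"><state state="open"/></port></ports>
</host></nmaprun>"""

report = ReportGenerator()
report.add(NmapParser(scan1))
report.add(NmapParser(scan2))
print(sorted(report.unique))  # the duplicate port 22 entry collapses to one
```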