Table of Contents for
Mastering Wireshark 2


Mastering Wireshark 2 by Andrew Crouthamel, published by Packt Publishing, 2018
  1. Mastering Wireshark 2
  2. Title Page
  3. Copyright and Credits
  4. Mastering Wireshark 2
  5. Packt Upsell
  6. Why subscribe?
  7. PacktPub.com
  8. Contributor
  9. About the author
  10. Packt is searching for authors like you
  11. Table of Contents
  12. Preface
  13. Who this book is for
  14. What this book covers
  15. To get the most out of this book
  16. Download the color images
  17. Conventions used
  18. Get in touch
  19. Reviews
  20. Installing Wireshark 2
  21. Installation and setup
  22. Installing Wireshark on Windows
  23. Installing Wireshark on macOS
  24. Installing Wireshark on Linux
  25. Summary
  26. Getting Started with Wireshark
  27. What's new in Wireshark 2?
  28. Capturing traffic
  29. How to capture traffic
  30. Saving and exporting packets
  31. Annotating and printing packets
  32. Remote capture setup
  33. Prerequisites
  34. Remote capture usage
  35. Summary
  36. Filtering Traffic
  37. Berkeley Packet Filter (BPF) syntax
  38. Capturing filters
  39. Displaying filters
  40. Following streams
  41. Advanced filtering
  42. Summary
  43. Customizing Wireshark
  44. Preferences
  45. Appearance
  46. Layout
  47. Columns
  48. Fonts and colors
  49. Capture
  50. Filter buttons
  51. Name resolution
  52. Protocols
  53. Statistics
  54. Advanced
  55. Profiles
  56. Colorizing traffic
  57. Examples of colorizing traffic
  58. Example 1
  59. Example 2
  60. Summary
  61. Statistics
  62. TCP/IP overview
  63. Time values and summaries
  64. Trace file statistics
  65. Resolved addresses
  66. Protocol hierarchy
  67. Conversations
  68. Endpoints
  69. Packet lengths
  70. I/O graph
  71. Load distribution
  72. DNS statistics
  73. Flow graph
  74. Expert system usage
  75. Summary
  76. Introductory Analysis
  77. DNS analysis
  78. An example for DNS request failure
  79. ARP analysis
  80. An example for ARP request failure
  81. IPv4 and IPv6 analysis
  82. ICMP analysis
  83. Using traceroute
  84. Summary
  85. Network Protocol Analysis
  86. UDP analysis
  87. TCP analysis I
  88. TCP analysis II
  89. Graph I/O rates and TCP trends
  90. Throughput
  91. I/O graph
  92. Summary
  93. Application Protocol Analysis I
  94. DHCP analysis
  95. HTTP analysis I
  96. HTTP analysis II
  97. FTP analysis
  98. Summary
  99. Application Protocol Analysis II
  100. Email analysis
  101. POP and SMTP
  102. 802.11 analysis
  103. VoIP analysis
  104. VoIP playback
  105. Summary
  106. Command-Line Tools
  107. Running Wireshark from a command line
  108. Running tshark
  109. Running tcpdump
  110. Running dumpcap
  111. Summary
  112. A Troubleshooting Scenario
  113. Wireshark plugins
  114. Lua programming
  115. Determining where to capture
  116. Capturing scenario traffic
  117. Diagnosing scenario traffic
  118. Summary
  119. Other Books You May Enjoy
  120. Leave a review - let other readers know what you think

HTTP analysis I

In this section, we'll take a look at how HTTP works: some of the status codes used within HTTP and what's inside an HTTP packet, the source and destination information and some of the related options, and how servers and clients interact over a connection between them.

What we'll do is start another packet capture and open up a website. In this example, I opened up a web page, http://www.npr.org/, which happens to be an unencrypted website. It uses plain HTTP by default, so the communication is not hidden behind TLS encryption. This way, we can take a look at what actually happens within the HTTP headers.
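If you want to keep a capture like this small, you can apply a BPF capture filter so that only web and DNS traffic is recorded. As a rough sketch (the interface name and output filename here are placeholders, not taken from this example), a tshark capture might look like this:

    tshark -i eth0 -f "tcp port 80 or udp port 53" -w npr.pcapng

The -f option takes the same BPF syntax covered in the Filtering Traffic chapter, and -w writes the raw packets to a file you can open in Wireshark afterward.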

If we scroll down, we can see we have the www.npr.org DNS resolution, our answer, and the beginning of the SYN, SYN-ACK, ACK three-way handshake for the TCP connection:

We will also see some Akamai DNS resolutions as well, and that's because www.npr.org is actually hosted on Akamai servers. Akamai is a content delivery network distributed around the world, so it's very quick to respond. Hence, the client has to resolve some of these additional server names as it goes along.
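If the DNS chatter makes the packet list hard to read, a display filter along these lines (the domain strings are just the ones from this particular example) pulls out only the name lookups involved:

    dns.qry.name contains "npr" or dns.qry.name contains "akamai"

You can type that straight into the filter bar; clearing it brings the full capture back whenever you need it.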

We can see that we have an initial TCP request to the actual server, and then my system asks for /sections/news, because I was opening up the news section on http://www.npr.org/:

If we scroll down, we'll see that there is some HTTP along with a lot of TCP reassembled-segment packets. Mostly we see TCP, and very little is labeled HTTP. Why is that? It's because we have TCP reassembly enabled in the preferences. This is something you'll probably want to turn off if you're doing HTTP analysis.

Go to Edit | Preferences... | Protocols | TCP and turn off Allow subdissector to reassemble TCP streams:

If you turn that off, you can see that we now get some insight into HTTP: the Info column actually shows the commands going back and forth in the HTTP traffic. Those packets now show up properly as HTTP in the Protocol column, and the Info column will mark packets as a continuation of HTTP as the server transmits all of the website data to my client. You can see that the window size is being put to use as well: we have a nice, big window size, and then a run of packets that we then acknowledge:
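You can get the same effect from the command line. As a minimal sketch, assuming the capture was saved as npr.pcapng (a placeholder name), tshark can override that preference for a single run and show only the HTTP packets:

    tshark -r npr.pcapng -o tcp.desegment_tcp_streams:FALSE -Y http

The -o option sets the same Allow subdissector to reassemble TCP streams preference you just changed in the GUI, and -Y applies a display filter.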

If we take a look at the HTTP here, my system, 77.160, made a GET request. HTTP has two primary methods that we use: GET and POST. A GET request retrieves information, while a POST request sends information. So, when you fill in a form on a website, or change web settings in a profile, you're sending data to the server and telling it to change something on the server: you do that with POST. With GET, we are asking for information. In this example, I am getting /sections/news, and I'm requesting it over HTTP version 1.1. There is a newer version of HTTP, which has recently come into use, and it's based on Google's SPDY protocol, which they had previously created.
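If you want to separate the two methods in a busy trace, the request method is exposed as a display filter field. Purely as an illustration, either of these can be typed into the filter bar (or passed to tshark with -Y) to show only one direction of the exchange:

    http.request.method == "GET"
    http.request.method == "POST"

Each line is a separate filter; use whichever one matches the traffic you're interested in.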

If you want to learn more about the SPDY protocol and what it was, you can take a look at it on Wikipedia or on https://www.chromium.org/; they have a page on it as well. It was really an experimental protocol to speed up traffic, and it has since been deprecated in favor of HTTP/2, which has become a standard. The ideas within SPDY have been merged into the HTTP/2 standard.

What it did was basically optimize the HTTP header information and the communications so that it could achieve up to a 50% speed increase when loading a website: that's very powerful and impressive.

So, when you see these GET requests, you'll most likely be seeing version 1.1. Some really old clients, say a really old program on a very old system, may still ask for 1.0, and you may now also see 2.0 requests. Probably about a third of the major websites out there support HTTP/2, and that share will of course only increase over time. So, we are asking for /sections/news, and it's implied that we are asking for the index.html page inside that folder; in other words, we're asking for a folder structure. By default, the HTTP server will look for index.html, index.htm, or a couple of other default files in some other format. It's the responsibility of the server to serve up that core page that shows up first.
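If you're curious which versions your own traffic is using, tshark can list the request line details for you. This is only a sketch, again using the placeholder filename npr.pcapng:

    tshark -r npr.pcapng -Y http.request -T fields -e http.host -e http.request.uri -e http.request.version

That prints one line per request with the host, the path being asked for (such as /sections/news), and the HTTP version string.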

From the server, we can see that we have a TCP acknowledgment of that GET request, and then the server responds with, in effect: "OK, that sounds good. I will send that to you, because I've been able to find that page." Conversely, if you request a page that's incorrect, you'll get an error message.

In HTTP, responses carry status codes, which are different numbers with different meanings. If you'd like to learn about HTTP in more depth, take a look at the RFC on the IETF website; you're looking for RFC 2616, which covers HTTP/1.1. Remember I said there's a newer version, HTTP/2, so of course that one has its own, higher-numbered RFC. When you look through the standard, you'll see the blocks of codes that are available and the details of each code; if you look to the left, you will see the status code number. Codes in the 200 and 300 ranges are fine. A 200 OK means "I found the file, no problem", 201 means "OK, I created it", and 202 means "accepted". These are all good things. A code in the 300 range is a redirect, telling you the file has moved somewhere else. A code in the 400 or 500 range is an error. The 400 range holds client errors, with the server saying things such as: "I can't find the file"; "You're not allowed to get there"; "It's forbidden"; "Your method is not allowed"; or "I'm rejecting your request". A 500-range error is a server error, meaning there's a problem on the server's side. It is very common to see a 404 error when you try to request a web page that does not exist. You'll see it all over the internet, and everyone's used to it by now, but a 404 means "I cannot find the file". The server says, "I don't know what you're requesting; it's not where you say it is", and it sends back a 404 error.
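A quick way to hunt for these in a trace is to filter on the response code field. For example (illustrative only), either of these display filters will surface problems:

    http.response.code == 404
    http.response.code >= 400

The first shows only the not-found responses; the second catches every client- and server-side error in the capture.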

If we look in the packet details at the HTTP header information, you can see that we have a Server header, Apache in this case; it may also say something such as nginx or some other server that's running, though Apache is still one of the most common. It'll tell you what the server is running, for instance which version of PHP or Python, if it exposes that. It will also tell you the content type: is it an HTML page, an XML document, or some other kind of content? Sometimes you can have a content encoding as well. Some pages and some servers allow compression, so they'll compress with gzip, which is like creating a ZIP file of the page the server is sending back: the response is smaller, uses fewer packets, and is quicker to send to the client. It takes a little bit of processing power on the server and client sides to do that, but it's usually beneficial. It also tells us the content length.
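tshark can pull those headers out across the whole capture, which is handy when you want to see what every server in a trace is running. A rough example, with the same placeholder filename:

    tshark -r npr.pcapng -Y http.response -T fields -E header=y -e http.server -e http.content_type -e http.content_encoding -e http.content_length

Each response becomes one row showing the Server, Content-Type, Content-Encoding (gzip, if compression is in use), and Content-Length values.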

As we've seen with the other protocols we've gone through, they almost always tell you how long the content is so that you can validate whether you've received everything. We also have an Expires header and a Cache-Control header. These tell the client how long to save the page: when your client receives the page, it will cache it for a period of time based on these values, so that it can refer to its local copy if it goes back there again.
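If you want to check how a server sets its caching policy, the Cache-Control header is also available as a display filter field. As a small illustration (placeholder filename again):

    tshark -r npr.pcapng -Y http.response -T fields -e http.cache_control -e http.date -e http.last_modified

This lists the Cache-Control value alongside the Date and Last-Modified headers, which together indicate how long a client is expected to keep its cached copy.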

So, if you're constantly going back to the same web page, your browser will load it from your local cache rather than constantly poll the server for it and use up unnecessary bandwidth and resources on the server. If you don't wish to use the cache, that's when you refresh the page: you can usually press Ctrl + F5, which forces your browser to discard that page from its cache and request a fresh copy. If we expand the packet details, we will see Line-based text data; it will actually show us the web page itself, as it's sent to us:
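You can see that same page content from the command line as well by following the TCP stream that carried it. A minimal sketch, assuming the conversation of interest is stream 0 in the placeholder file npr.pcapng:

    tshark -r npr.pcapng -q -z follow,tcp,ascii,0

The -z follow statistic prints both sides of the conversation in order, headers first and then the HTML itself, much like the Line-based text data view here.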

In the next section, we'll dive into HTTP a little bit more, talk about some more problems, and take a look at how you can decrypt TLS-encrypted HTTP data—HTTPS in Wireshark.