The status code of the requested page is shown in field 9 of the log. A 404 status represents the page not found error on the server; I am sure we have all seen that in our browsers at some stage. It may indicate a misconfigured link on your site, or it may simply be produced by a browser searching for the icon image to display in tabbed browsers for the page. You can also identify potential threats to your site from requests that look for standard pages that may give access to additional information on PHP-driven sites, such as WordPress.
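To see why the status lands in field 9 and the requested page in field 7, it can help to number the fields of a single log line the way awk splits them. The following is a minimal sketch; the log entry is a made-up line in the Apache combined log format, not one taken from a real server, and the variable names are only for illustration:

```shell
# Hypothetical log entry in the combined log format (an assumption, not real data)
line='192.168.0.10 - - [10/Sep/2014:00:16:17 +0100] "GET /favicon.ico HTTP/1.1" 404 504 "-" "Mozilla/5.0"'

# Print every whitespace-separated field with its number, as awk sees it
numbered=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$numbered"
```

Because awk splits on whitespace by default, the quoted request is spread over fields 6 to 8, which is what places the page in field 7 and the status in field 9.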
First, we can print just the status of each request:
$ awk '{ print $9 } ' access.log
We can now extend the code a little and print only the 404 errors:
$ awk ' ( $9 ~ /404/ ) { print $9 } ' access.log
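The ~ operator performs a regular expression match, so /404/ matches any value in field 9 that contains 404 anywhere. On well-formed lines this makes little difference, but if a request line is missing a part, the fields shift and a byte count such as 14042 can end up in field 9. A numeric comparison is stricter. The sketch below assumes a small made-up sample file standing in for access.log:

```shell
# Made-up sample data standing in for access.log (an assumption for illustration)
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [10/Sep/2014:00:16:17 +0100] "GET /missing.html HTTP/1.1" 404 504 "-" "Mozilla/5.0"
10.0.0.2 - - [10/Sep/2014:00:17:02 +0100] "GET /index.html" 200 14042 "-" "-"
EOF

# Regex match: also catches the second line, where field 9 holds the byte count 14042
regex_hits=$(awk '$9 ~ /404/ { print $9 }' "$sample")

# Numeric comparison: only an exact 404 in field 9 matches
exact_hits=$(awk '$9 == 404 { print $9 }' "$sample")

printf 'regex:\n%s\nexact:\n%s\n' "$regex_hits" "$exact_hits"
rm -f "$sample"
```

Whether the stricter test is worth it depends on how clean the log is; for a quick look, the regular expression form is usually adequate.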
We can extend this a little further by printing both the status code and the page that was being accessed. This requires us to print field 9 and field 7, as shown in the following code:
$ awk ' ( $9 ~ /404/ ) { print $9, $7 } ' access.log
Many of these failed requests will be duplicates. To summarize the records, we can pipe the output through the sort and uniq commands:
$ awk ' ( $9 ~ /404/ ) { print $9, $7 } ' access.log | sort | uniq
The uniq command only removes adjacent duplicate lines, so the data must be pre-sorted; hence, we use the sort command to prepare the data. The sort -u option produces the same result in a single command.
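Removing duplicates discards how often each missing page was requested. One possible extension is to let uniq count the duplicates with its -c option and then sort numerically so that the most frequent requests appear first. This is a sketch only, run against a made-up sample file in place of access.log:

```shell
# Made-up sample data standing in for access.log (an assumption for illustration)
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [10/Sep/2014:00:16:17 +0100] "GET /favicon.ico HTTP/1.1" 404 504 "-" "Mozilla/5.0"
10.0.0.2 - - [10/Sep/2014:00:17:02 +0100] "GET /wp-login.php HTTP/1.1" 404 504 "-" "Mozilla/5.0"
10.0.0.3 - - [10/Sep/2014:00:18:45 +0100] "GET /favicon.ico HTTP/1.1" 404 504 "-" "Mozilla/5.0"
10.0.0.1 - - [10/Sep/2014:00:19:10 +0100] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
EOF

# Count each distinct "status page" pair, then list the most frequent first
counts=$(awk ' ( $9 ~ /404/ ) { print $9, $7 } ' "$sample" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
rm -f "$sample"
```

Each output line starts with the count, followed by the status code and the page, so pages that are requested repeatedly stand out at the top of the list.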