lynx -traversal URL produces a number of files in the working directory, including reject.dat, which contains all the links found on the website. sort -u builds a duplicate-free list from it. We then iterate through each link and check the header response using curl -I. If the first line of the header contains HTTP/ and either OK or 200, the link is valid. If it is not, the link is rechecked and tested for a 301 (moved permanently) reply. If that test also fails, the broken link is printed on the screen.
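The loop described above can be sketched as follows. This is a minimal illustration, not the recipe's exact script: it assumes the site has already been crawled with lynx -traversal (so reject.dat exists in the working directory), and the helper name is_ok is our own.

```shell
#!/bin/bash
# Sketch of the link-checking loop. Assumes a prior crawl, e.g.:
#   lynx -traversal http://example.com
# which leaves reject.dat (every URL lynx encountered) in the current directory.

# is_ok: succeed if the first header line looks like a valid response,
# i.e. it contains "HTTP/" and either "OK" or "200" (the check described above).
is_ok() {
  echo "$1" | grep -q "HTTP/" && echo "$1" | grep -qE "OK|200"
}

check_links() {
  # De-duplicate the URL list, then probe each URL's headers with curl -I.
  sort -u reject.dat | while read -r url; do
    first_line=$(curl -sI "$url" | head -n 1)
    if ! is_ok "$first_line"; then
      # Recheck once, additionally accepting a 301 (moved permanently) reply.
      first_line=$(curl -sI "$url" | head -n 1)
      echo "$first_line" | grep -q "301" || echo "BROKEN: $url"
    fi
  done
}

# Only run the loop when a crawl has actually been performed.
[ -f reject.dat ] && check_links
```

Note that curl -sI fetches only the headers (a HEAD request), which keeps the check fast since response bodies are never downloaded.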
From its name, it might seem that reject.dat should contain a list of URLs that were broken or unreachable. This is not the case: lynx simply records every URL it encounters there.
Also note that lynx generates a file called traverse.errors, which contains all the URLs that could not be browsed. However, lynx only adds URLs that returned HTTP 404 (not found), so other errors (for instance, HTTP 403 Forbidden) would be lost. This is why we check the statuses manually.