A redirector is an external process that rewrites URIs from client requests. For example, although a user requests the page http://www.example.com/page1.html, a redirector can change the request to something else, such as http://www.example.com/page2.html. Squid fetches the new URI automatically, as though the client originally requested it. If the response is cachable, Squid stores it under the new URI.
The redirector feature allows you to implement a number of interesting things with Squid. Many sites use redirectors for access control, removing advertisements, local mirrors, or even working around browser bugs.
One of the nice things about using a redirector for access control is that you can send the user to a page that explains exactly why her request is denied. You may also find that a redirector offers more flexibility than Squid’s built-in access controls. As you’ll see shortly, however, a redirector doesn’t have access to the full spectrum of information contained in a client’s request.
Many people use a redirector to filter out web page advertisements. In most cases, this involves changing a request for a GIF or JPEG advertisement image into a request for a small, blank image, located on a local server. Thus, the advertisement just “disappears” and doesn’t interfere with the page layout.
In essence, a redirector is just a program that reads a URI and other information from its input and writes a new URI to its output. Perl and Python are popular languages for redirectors, although some authors use compiled languages such as C for better performance.
The Squid source code doesn’t come with any redirector programs. As an administrator, you are responsible for writing your own or downloading one written by someone else. The first part of this chapter describes the interface between Squid and a redirector process. I also provide a couple of simple redirector examples in Perl. If you’re interested in using someone else’s redirector, rather than programming your own, skip ahead to Section 11.5.
A redirector receives data from Squid on stdin one line at a time. Each line contains the following four tokens separated by whitespace:
Request-URI
Client IP address and fully qualified domain name
User’s name, via either RFC 1413 ident or proxy authentication
HTTP request method
For example:
http://www.example.com/page1.html 192.168.2.3/user.host.name jabroni GET
The Request-URI is taken from the client’s request, including query terms, if any. Fragment identifier components (e.g., the # character and subsequent text) are removed, however.
The second token contains the client IP address and, optionally, its fully qualified domain name (FQDN). The FQDN is set only if you enable the log_fqdn directive or use a srcdomain ACL element. Even then, the FQDN may be unknown because the client’s network administrators didn’t properly set up the reverse pointer zones in their DNS. If Squid doesn’t know the client’s FQDN, it places a hyphen (-) in the field. For example:
http://www.example.com/page1.html 192.168.2.3/- jabroni GET
The client ident field is set if Squid knows the name of the user behind the request. This happens if you use proxy authentication, ident ACL elements, or enable ident_lookup_access. Remember, however, that the ident_lookup_access directive doesn’t cause Squid to delay request processing. In other words, if you enable that directive, but don’t use the access controls, Squid may not yet know the username when writing to the redirector process. If Squid doesn’t know the username, it writes a hyphen (-) in the field. For example:
http://www.example.com/page1.html 192.168.2.3/- - GET
Squid reads back one token from the redirector process: a URI. If Squid reads a blank line, the original URI remains unchanged.
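The interface described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code that ships with Squid: the names Request, parse_request, rewrite, and run are hypothetical, and the rewrite hook deliberately returns an empty string so that every request is left unchanged.

```python
import sys
from typing import Iterable, NamedTuple, Optional, TextIO


class Request(NamedTuple):
    uri: str
    client_ip: str
    client_fqdn: Optional[str]  # None when Squid wrote "-"
    ident: Optional[str]        # None when Squid wrote "-"
    method: str


def parse_request(line: str) -> Optional[Request]:
    """Split one input line into its four tokens; return None if malformed."""
    tokens = line.split()
    if len(tokens) != 4:
        return None
    uri, client, ident, method = tokens
    # The second token has the form "ip/fqdn", with "-" for an unknown FQDN.
    ip, _, fqdn = client.partition("/")
    return Request(
        uri=uri,
        client_ip=ip,
        client_fqdn=None if fqdn in ("", "-") else fqdn,
        ident=None if ident == "-" else ident,
        method=method,
    )


def rewrite(request: Request) -> str:
    """Hypothetical policy hook: return a new URI, or "" to keep the original."""
    return ""


def run(lines: Iterable[str], out: TextIO) -> None:
    """Answer every input line with exactly one output line until input ends."""
    for line in lines:
        request = parse_request(line)
        reply = rewrite(request) if request is not None else ""
        # Flush after each reply so Squid is not left waiting on a buffer.
        print(reply, file=out, flush=True)
```

To use the sketch as a real redirector, call run(sys.stdin, sys.stdout) at the bottom of the script and replace rewrite with your own policy.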
A redirector program should never exit until end-of-file occurs on stdin. If the process does exit prematurely, Squid writes a warning to cache.log:
WARNING: redirector #2 (FD 18) exited
If 50% of the redirector processes exit prematurely, Squid aborts with a fatal error message.
If the Request-URI contains whitespace, and the uri_whitespace directive is set to allow, any whitespace in the URI is passed to the redirector. A redirector with a simple parser may become confused in this case. You have two options for handling whitespace in URIs when using a redirector.
One option is to set the uri_whitespace directive to anything except allow. The default setting, strip, is probably a good choice in most situations because Squid simply removes the whitespace from the URI when it parses the HTTP request. See Appendix A for information on the other values for this directive.
If that isn’t an option, you need to make sure the redirector’s parser is smart enough to detect the extra tokens. For example, if it finds more than four tokens in the line received from Squid, it can assume that the last three are the IP address, ident, and request method. Everything before the third-to-last token comprises the Request-URI.
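As a rough illustration of the second option, the following Python sketch peels the last three tokens off the right-hand side of the line and treats everything before them as the Request-URI. The function name split_request_line is hypothetical.

```python
from typing import Optional, Tuple


def split_request_line(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (uri, client, ident, method), keeping whitespace inside the URI."""
    # rsplit with maxsplit=3 splits at most three times, starting from the
    # right, so any whitespace left of the third-to-last token stays in the URI.
    parts = line.strip().rsplit(None, 3)
    if len(parts) != 4:
        return None  # fewer than four tokens: treat the line as malformed
    uri, client, ident, method = parts
    return uri, client, ident, method
```

For a line such as "http://www.example.com/my page.html 192.168.2.3/- - GET", the first element of the result is the full URI, embedded space included.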
When a redirector changes the client’s URI, the client normally doesn’t know that Squid decided to fetch a different resource. This is, in all likelihood, a gross violation of the HTTP RFC. If you want to be nicer, and remain compliant, there is a little trick that makes Squid return an HTTP redirect message. Simply have the redirector insert 301:, 302:, 303:, or 307: before the new URI.
For example, if a redirector writes this line on its stdout:
301:http://www.example.com/page2.html
Squid sends a response like this back to the client:
HTTP/1.0 301 Moved Permanently
Server: squid/2.5.STABLE4
Date: Mon, 29 Sep 2003 04:06:23 GMT
Content-Length: 0
Location: http://www.example.com/page2.html
X-Cache: MISS from zoidberg
Proxy-Connection: close
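A redirector written in Python could build such a reply with a small helper like the one below. This is a sketch only: the name redirect_reply is hypothetical, and the check simply mirrors the four status codes listed above.

```python
# Status codes that the text above says may prefix the new URI.
REDIRECT_CODES = {301, 302, 303, 307}


def redirect_reply(status: int, uri: str) -> str:
    """Format a reply line that asks Squid to send an HTTP redirect."""
    if status not in REDIRECT_CODES:
        raise ValueError(f"unsupported redirect status: {status}")
    return f"{status}:{uri}"
```

The redirector then writes the returned string on its stdout as its answer for that request.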
Example 11-1 is a very simple redirector written in Perl. Its purpose is to send HTTP requests for the squid-cache.org site to a local mirror site in Australia. If the requested URI looks like it is for www.squid-cache.org or one of its mirror sites, this script outputs a new URI with the hostname set to www1.au.squid-cache.org.
A common problem first-time redirector writers encounter is buffered I/O. Note that here I make sure stdout is unbuffered.
#!/usr/bin/perl -wl
$|=1; # don't buffer the output
while (<>) {
    ($uri,$client,$ident,$method) = ( );
    ($uri,$client,$ident,$method) = split;
    next unless ($uri =~ m,^http://.*\.squid-cache\.org(\S*),);
    $uri = "http://www1.au.squid-cache.org$1";
} continue {
    print "$uri";
}
Example 11-2 is another, somewhat more complicated, example. Here I make a feeble attempt to deny requests when the URI contains “bad words.” This script demonstrates an alternative way to parse the input fields. If I don’t get all five required fields, the redirector returns a blank line, leaving the request unchanged.
This example also gives preferential treatment to some users. If the ident string is equal to “TheBoss,” or the request comes from the 192.168.4.0 subnet, the request is passed through. Finally, I use the 301: trick to make Squid return an HTTP redirect to the client. Note, this program is neither efficient nor smart enough to correctly deny so-called bad requests.
#!/usr/bin/perl -wl
$|=1; # don't buffer the output
$DENIED = "http://www.example.com/denied.html";
&load_word_list( );
while (<>) {
    unless (m,(\S+) (\S+)/(\S+) (\S+) (\S+),) {
        $uri = '';
        next;
    }
    $uri = $1;
    $ipaddr = $2;
    #$fqdn = $3;
    $ident = $4;
    #$method = $5;
    next if ($ident eq 'TheBoss');
    next if ($ipaddr =~ /^192\.168\.4\./);
    $uri = "301:$DENIED" if &word_match($uri);
} continue {
    print "$uri";
}
sub load_word_list {
    @words = qw(sex drugs rock roll);
}
sub word_match {
    my $uri = shift;
    foreach $w (@words) { return 1 if ($uri =~ /$w/); }
    return 0;
}
For more ideas about writing your own redirector, I recommend reading the source code for the redirectors mentioned in Section 11.5.
A redirector can take an arbitrarily long time to return its answer. For example, it may need to make a database query, search through long lists of regular expressions, or make some complex computations. Squid uses a pool of redirector processes so that they can all work in parallel. While one is busy, Squid hands a new request off to another.
For each new request, Squid examines the pool of redirector processes in order. It submits the request to the first idle process. If your request rate is very low, the first redirector may be able to handle all requests itself.
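The effect of this first-idle policy can be illustrated with a toy model in Python. It is not Squid’s code: the functions first_idle and simulate, the fixed service time, and the decision to ignore requests that arrive while every process is busy are all simplifying assumptions.

```python
from typing import List, Optional


def first_idle(busy: List[bool]) -> Optional[int]:
    """Return the index of the first idle process, or None if all are busy."""
    for index, is_busy in enumerate(busy):
        if not is_busy:
            return index
    return None


def simulate(pool_size: int, arrivals: List[float], service_time: float) -> List[int]:
    """Count the requests each process handles for the given arrival times."""
    free_at = [0.0] * pool_size  # time at which each process is idle again
    handled = [0] * pool_size
    for now in arrivals:
        busy = [now < t for t in free_at]
        index = first_idle(busy)
        if index is None:
            continue  # the toy model ignores requests that would be queued
        free_at[index] = now + service_time
        handled[index] += 1
    return handled
```

With widely spaced arrivals, such as simulate(3, [0.0, 1.0, 2.0, 3.0], 0.5), the first process handles everything. The later processes only receive work when requests arrive faster than the earlier ones can answer them.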
You can control the size of the redirector pool with the redirect_children directive. The default value is five processes. Note that Squid doesn’t dynamically increase or decrease the size of the pool depending on the load. Thus, it is a good idea to be a little liberal. If all redirectors are busy, Squid queues pending requests. If the queue becomes too large (bigger than twice the pool size), Squid exits with a fatal error message:
FATAL: Too many queued redirector requests
In this case, you need to increase the size of the redirector pool or change something so that the redirectors can process requests faster. You can use the cache manager’s redirector page to find out if you have too few or too many redirectors running. For example:
% squidclient mgr:redirector
...
Redirector Statistics:
program: /usr/local/squid/bin/myredir
number running: 5 of 5
requests sent: 147
replies received: 142
queue length: 2
avg service time: 953.83 msec
#   FD   PID     # Requests  Flags  Time   Offset  Request
1   10   35200   46          AB     0.902  0       http://...
2   11   35201   29          AB     0.401  0       http://...
3   12   35202   25          AB     1.009  1       cache_o...
4   14   35203   25          AB     0.555  0       http://...
5   15   35204   21          AB     0.222  0       http://...
If, as in this example, you see that the last redirector has almost as many requests as the second to last, you should probably increase the size of the redirector pool. If, on the other hand, you see many redirectors with no requests, you can probably decrease the pool size.
The following five squid.conf directives control the behavior of redirectors in Squid.
The redirect_program directive specifies the command line for the redirector program. For example:
redirect_program /usr/local/squid/bin/my_redirector -xyz
Note, the redirector program must be executable by the Squid user ID. If, for some reason, Squid can’t execute the redirector, you should see an error message in cache.log.[1] For example:
ipcCreate: /usr/local/squid/bin/my_redirector: (13) Permission denied
Due to the way Squid works, the main Squid process may be unaware of problems executing the redirector program. Squid doesn’t detect the error until it tries to write a request and read a response. It then prints:
WARNING: redirector #1 (FD 6) exited
Thus, if you see such a message for the first request sent to Squid, check cache.log closely for other errors, and make sure the program is executable by Squid.
The redirect_children directive specifies how many redirector processes Squid should start. For example:
redirect_children 20
Squid warns you (via cache.log) when all redirectors are simultaneously busy:
WARNING: All redirector processes are busy.
WARNING: 1 pending requests queued.
If you see this warning, you should increase the number of child processes and restart (or reconfigure) Squid. If the queue size becomes twice the number of redirectors, Squid aborts with a fatal message.
Don’t attempt to disable Squid’s use of the redirectors by setting redirect_children to 0. Instead, simply remove the redirect_program line from squid.conf.
Squid normally updates a request’s Host header when using a redirector. That is, if the redirector returns a new URI with a different hostname, Squid puts the new hostname in the Host header. If you use Squid as a surrogate (see Chapter 15), you might want to disable this behavior by setting the redirect_rewrites_host_header directive to off:
redirect_rewrites_host_header off
Squid normally sends every request through a redirector. However, you can use the redirector_access rules to send certain requests through selectively. The syntax is identical to http_access:
redirector_access allow|deny [!]ACLname ...
For example:
acl Foo src 192.168.1.0/24
acl All src 0/0
redirector_access deny Foo
redirector_access allow All
In this case, Squid skips the redirector for any request that matches the Foo ACL.
If you enable the redirector_bypass directive, Squid bypasses the redirectors when all of them are busy. Normally, Squid queues pending requests until a redirector process becomes available. If this queue grows too large, Squid exits with a fatal error message. Enabling this directive ensures that Squid never reaches that state.
The tradeoff, of course, is that some user requests may not be redirected when the load is high. If that’s all right with you, simply enable the directive with this line:
redirector_bypass on
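The interaction between the pool, the queue, and redirector_bypass can be summarized with a small decision function in Python. It is a model of the behavior described in this chapter, not Squid’s implementation: the names Action and next_action are hypothetical, and the fatal threshold uses the “bigger than twice the pool size” wording from earlier.

```python
from enum import Enum


class Action(Enum):
    DISPATCH = "dispatch"  # hand the request to an idle redirector
    QUEUE = "queue"        # wait for a redirector to become available
    BYPASS = "bypass"      # skip redirection for this request
    FATAL = "fatal"        # "Too many queued redirector requests"


def next_action(pool_size: int, busy: int, queued: int, bypass: bool) -> Action:
    """Decide what happens to one new request under the modeled rules."""
    if busy < pool_size:
        return Action.DISPATCH
    if bypass:
        # With redirector_bypass on, nothing is queued, so the queue never grows.
        return Action.BYPASS
    if queued + 1 > 2 * pool_size:
        return Action.FATAL
    return Action.QUEUE
```

In this model, a pool of five busy redirectors with ten requests already queued yields FATAL for the next request when bypass is off and BYPASS when it is on.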
As I already mentioned, the Squid source code doesn’t include any redirectors. However, you can find a number of useful third-party redirectors linked from the Related Software page on http://www.squid-cache.org. Here are some of the more popular offerings:
Squirm comes from Chris Foote. It is written in C and distributed as source code under the GNU General Public License (GPL). Squirm’s features include:
Being very fast with minimal memory usage
Full regular expression pattern matching and replacement
Ability to apply different redirection lists to different client groups
Interactive mode for testing on the command line
Fail-safe mode passes requests through unchanged in the event that configuration files contain errors
Writing debugging, errors, and more to various log files
http://www.linofee.org/~elkner/webtools/jesred/
Jesred comes from Jens Elkner. It is written in C, based on Squirm, and also released under the GNU GPL. Its features include:
Being faster than Squirm, with slightly more memory usage
Ability to reread its configuration files while running
Full regular expression pattern matching and replacement
Fail-safe mode passes requests through unchanged in the event that configuration files contain errors
Optionally logging rewritten requests to a log file
squidGuard comes from Pål Baltzersen and Lars Erik Håland at Tele Danmark InterNordia. It is released under the GNU GPL. The authors also make sure squidGuard compiles easily on modern Unix systems. Their site contains a lot of good documentation. Here are some of squidGuard’s features:
Highly configurable; you can apply different rules to different groups of clients or users and at different times or days
URI substitution, not just replacement, à la sed
printf-like substitutions allow passing parameters to CGI scripts for customized messages
Support for the 301/302/303/307 HTTP redirect status code feature for redirectors
Selective logging for rewrite rule sets
At the squidGuard site, you can also find a blacklist of more than 100,000 sites categorized as porn, aggressive, drugs, hacking, ads, and more.
http://www.adzapper.sourceforge.net
AdZapper is a popular redirector because it specifically targets removal of advertisements from HTML pages. It is a Perl script written by Cameron Simpson. AdZapper can block banners (images), pop-up windows, flash animations, page counters, and web bugs. The script includes a list of regular expressions that match URIs known to contain ads, pop-ups, etc. Cameron updates the script periodically with new patterns. You can also maintain your own list of patterns.
Write a redirector that never changes the requested URI and configure Squid to use it.
While running tail -f cache.log, kill Squid’s redirector processes one by one until something interesting happens.
Download and install one of the redirectors mentioned in the previous section.
[1] This message appears only in cache.log, and not on stdout, if you use the -d option, or in syslog, if you use the -s option.