Squid: The Definitive Guide
by Duane Wessels. Published by O'Reilly Media, Inc., 2004

Chapter 11. Redirectors

A redirector is an external process that rewrites URIs from client requests. For example, although a user requests the page http://www.example.com/page1.html, a redirector can change the request to something else, such as http://www.example.com/page2.html. Squid fetches the new URI automatically, as though the client originally requested it. If the response is cachable, Squid stores it under the new URI.

The redirector feature allows you to implement a number of interesting things with Squid. Many sites use redirectors for access controls, removing advertisements, local mirrors, or even working around browser bugs.

One of the nice things about using a redirector for access control is that you can send the user to a page that explains exactly why her request is denied. You may also find that a redirector offers more flexibility than Squid’s built-in access controls. As you’ll see shortly, however, a redirector doesn’t have access to the full spectrum of information contained in a client’s request.

Many people use a redirector to filter out web page advertisements. In most cases, this involves changing a request for a GIF or JPEG advertisement image into a request for a small, blank image, located on a local server. Thus, the advertisement just “disappears” and doesn’t interfere with the page layout.
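
The ad-filtering idea can be sketched in Python as a function that maps a URI to its replacement. This is only an illustration: the blank-image address and both patterns are hypothetical, and a real filter would need a much longer, maintained list.

```python
import re

# Hypothetical values for illustration; neither comes from a real site list.
BLANK_IMAGE = "http://localhost/blank.gif"
AD_PATTERNS = [
    re.compile(r"^http://ads\.[^/]+/.*\.(gif|jpe?g)$", re.IGNORECASE),
    re.compile(r"/banners?/.*\.(gif|jpe?g)$", re.IGNORECASE),
]


def rewrite_ad(uri):
    """Return the blank-image URI for ad images, or the URI unchanged."""
    for pattern in AD_PATTERNS:
        if pattern.search(uri):
            return BLANK_IMAGE
    return uri
```

Requests that match no pattern come back unchanged, so ordinary pages are unaffected.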

So in essence, a redirector is really just a program that reads a URI and other information from its input and writes a new URI on its output. Perl and Python are popular languages for redirectors, although some authors use compiled languages such as C for better performance.

The Squid source code doesn’t come with any redirector programs. As an administrator, you are responsible for writing your own or downloading one written by someone else. The first part of this chapter describes the interface between Squid and a redirector process. I also provide a couple of simple redirector examples in Perl. If you’re interested in using someone else’s redirector, rather than programming your own, skip ahead to Section 11.3.

The Redirector Interface

A redirector receives data from Squid on stdin one line at a time. Each line contains the following four tokens separated by whitespace:

  • Request-URI

  • Client IP address and fully qualified domain name

  • User’s name, via either RFC 1413 ident or proxy authentication

  • HTTP request method

For example:

http://www.example.com/page1.html 192.168.2.3/user.host.name jabroni GET

The Request-URI is taken from the client’s request, including query terms, if any. Fragment identifier components (e.g., the # character and subsequent text) are removed, however.

The second token contains the client IP address and, optionally, its fully qualified domain name (FQDN). The FQDN is set only if you enable the log_fqdn directive or use a srcdomain ACL element. Even then, the FQDN may be unknown because the client’s network administrators didn’t properly set up the reverse pointer zones in their DNS. If Squid doesn’t know the client’s FQDN, it places a hyphen (-) in the field. For example:

http://www.example.com/page1.html 192.168.2.3/- jabroni GET

The client ident field is set if Squid knows the name of the user behind the request. This happens if you use proxy authentication, ident ACL elements, or enable ident_lookup_access. Remember, however, that the ident_lookup_access directive doesn’t cause Squid to delay request processing. In other words, if you enable that directive, but don’t use the access controls, Squid may not yet know the username when writing to the redirector process. If Squid doesn’t know the username, it places a hyphen (-) in the field. For example:

http://www.example.com/page1.html 192.168.2.3/- - GET

Squid reads back one token from the redirector process: a URI. If Squid reads a blank line, the original URI remains unchanged.
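
As a rough sketch of the input side of this interface, the following Python function splits one line into the fields described above. The names Request and parse_request are made up for this illustration, and the function assumes the URI itself contains no whitespace.

```python
from collections import namedtuple

# Field names are illustrative; Squid itself only sends whitespace-separated tokens.
Request = namedtuple("Request", "uri ip fqdn ident method")


def parse_request(line):
    """Split one input line from Squid into its fields.

    Returns None when the line doesn't have exactly four tokens.
    A "-" for the FQDN or ident means Squid didn't know the value.
    """
    tokens = line.split()
    if len(tokens) != 4:
        return None
    uri, client, ident, method = tokens
    # The second token has the form "address/fqdn".
    ip, _, fqdn = client.partition("/")
    return Request(uri, ip, fqdn or "-", ident, method)
```

For the first sample line shown earlier, parse_request returns a Request whose ip is 192.168.2.3, whose fqdn is user.host.name, and whose ident is jabroni.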

A redirector program should never exit until end-of-file occurs on stdin. If the process does exit prematurely, Squid writes a warning to cache.log:

WARNING: redirector #2 (FD 18) exited

If 50% of the redirector processes exit prematurely, Squid aborts with a fatal error message.

Handling URIs That Contain Whitespace

If the Request-URI contains whitespace, and the uri_whitespace directive is set to allow, any whitespace in the URI is passed to the redirector. A redirector with a simple parser may become confused in this case. You have two options for handling whitespace in URIs when using a redirector.

One option is to set the uri_whitespace directive to anything except allow. The default setting, strip, is probably a good choice in most situations because Squid simply removes the whitespace from the URI when it parses the HTTP request. See Appendix A for information on the other values for this directive.

If that isn’t an option, you need to make sure the redirector’s parser is smart enough to detect the extra tokens. For example, if it finds more than four tokens in the line received from Squid, it can assume that the last three are the IP address, ident, and request method. Everything before the third-to-last token comprises the Request-URI.
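
One way to implement this rule, sketched here in Python, is to split the line from the right so that only the last three tokens are peeled off. The function name is hypothetical.

```python
def split_request_line(line):
    """Split a line from Squid, keeping whitespace inside the URI.

    The last three whitespace-separated tokens are taken to be the
    client address, ident, and method; the rest is the Request-URI.
    Returns None if the line has fewer than four tokens.
    """
    # rsplit with a limit of 3 leaves everything to the left in one piece.
    parts = line.strip().rsplit(None, 3)
    if len(parts) != 4:
        return None
    uri, client, ident, method = parts
    return uri, client, ident, method
```

With this approach, a URI such as http://www.example.com/my page.html stays intact as the first element of the result.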

Generating HTTP Redirect Messages

When a redirector changes the client’s URI, the client normally doesn’t know that Squid decided to fetch a different resource. This is, in all likelihood, a gross violation of the HTTP RFC. If you want to be nicer, and remain compliant, there is a little trick that makes Squid return an HTTP redirect message. Simply have the redirector insert 301:, 302:, 303:, or 307: before the new URI.

For example, if a redirector writes this line on its stdout:

301:http://www.example.com/page2.html

Squid sends a response like this back to the client:

HTTP/1.0 301 Moved Permanently
Server: squid/2.5.STABLE4
Date: Mon, 29 Sep 2003 04:06:23 GMT
Content-Length: 0
Location: http://www.example.com/page2.html
X-Cache: MISS from zoidberg
Proxy-Connection: close
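
A redirector might build its reply with a small helper like the one below. This is a Python sketch with a hypothetical function name; it accepts only the four status codes listed above.

```python
REDIRECT_CODES = (301, 302, 303, 307)


def redirect_reply(uri, status=None):
    """Format the line a redirector writes back to Squid.

    With no status, Squid fetches the new URI itself. With one of the
    supported codes, Squid returns an HTTP redirect to the client.
    """
    if status is None:
        return uri
    if status not in REDIRECT_CODES:
        raise ValueError("unsupported redirect status: %r" % (status,))
    return "%d:%s" % (status, uri)
```

Rejecting other codes up front keeps a typo in the redirector from turning into a malformed URI on Squid’s side.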

Some Sample Redirectors

Example 11-1 is a very simple redirector written in Perl. Its purpose is to send HTTP requests for the squid-cache.org site to a local mirror site in Australia. If the requested URI looks like it is for www.squid-cache.org or one of its mirror sites, this script outputs a new URI with the hostname set to www1.au.squid-cache.org.

A common problem first-time redirector writers encounter is buffered I/O. Note that here I make sure stdout is unbuffered.

Example 11-1. A simple redirector in Perl
#!/usr/bin/perl -wl
$|=1;   # don't buffer the output
while (<>) {
        ($uri,$client,$ident,$method) = ( );
        ($uri,$client,$ident,$method) = split;
        next unless ($uri =~ m,^http://.*\.squid-cache\.org(\S*),);
        $uri = "http://www1.au.squid-cache.org$1";
} continue {
        print "$uri";
}
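
For comparison, here is a rough Python counterpart to the Perl script above. It is a sketch rather than a drop-in replacement: the function names are made up, and the explicit flush after every reply plays the role of the unbuffered-output setting in the Perl version.

```python
import re

MIRROR = "http://www1.au.squid-cache.org"
SQUID_SITE = re.compile(r"^http://.*\.squid-cache\.org(\S*)")


def rewrite(line):
    """Return the reply for one input line: a new URI or the original."""
    fields = line.split()
    if not fields:
        return ""               # blank reply leaves the request unchanged
    match = SQUID_SITE.match(fields[0])
    return MIRROR + match.group(1) if match else fields[0]


def serve(infile, outfile):
    """Answer each request line, flushing so Squid never waits on a buffer."""
    for line in infile:
        outfile.write(rewrite(line) + "\n")
        outfile.flush()
```

To use it as a redirector, a script would import sys and call serve(sys.stdin, sys.stdout) at the bottom; the loop then ends on its own when end-of-file occurs on stdin.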

Example 11-2 is another, somewhat more complicated, example. Here I make a feeble attempt to deny requests when the URI contains “bad words.” This script demonstrates an alternative way to parse the input fields. If I don’t get all five required fields, the redirector returns a blank line, leaving the request unchanged.

This example also gives preferential treatment to some users. If the ident string is equal to “TheBoss,” or the request comes from the 192.168.4.0 subnet, the request is passed through. Finally, I use the 301: trick to make Squid return an HTTP redirect to the client. Note that this program is neither efficient nor smart enough to correctly deny so-called bad requests.

Example 11-2. A slightly less simple redirector in Perl
#!/usr/bin/perl -wl
$|=1;   # don't buffer the output

$DENIED = "http://www.example.com/denied.html";
&load_word_list( );

while (<>) {
        unless (m,(\S+) (\S+)/(\S+) (\S+) (\S+),) {
                $uri = '';
                next;
        }
        $uri = $1;
        $ipaddr = $2;
        #$fqdn = $3;
        $ident = $4;
        #$method = $5;
        next if ($ident eq 'TheBoss');
        next if ($ipaddr =~ /^192\.168\.4\./);
        $uri = "301:$DENIED" if &word_match($uri);
} continue {
        print "$uri";
}

sub load_word_list {
        @words = qw(sex drugs rock roll);
}

sub word_match {
        my $uri = shift;
        foreach $w (@words) { return 1 if ($uri =~ /$w/); }
        return 0;
}

For more ideas about writing your own redirector, I recommend reading the source code for the redirectors mentioned in Section 11.5.

The Redirector Pool

A redirector can take an arbitrarily long time to return its answer. For example, it may need to make a database query, search through long lists of regular expressions, or make some complex computations. Squid uses a pool of redirector processes so that they can all work in parallel. While one is busy, Squid hands a new request off to another.

For each new request, Squid examines the pool of redirector processes in order. It submits the request to the first idle process. If your request rate is very low, the first redirector may be able to handle all requests itself.

You can control the size of the redirector pool with the redirect_children directive. The default value is five processes. Note that Squid doesn’t dynamically increase or decrease the size of the pool depending on the load. Thus, it is a good idea to be a little liberal. If all redirectors are busy, Squid queues pending requests. If the queue becomes too large (bigger than twice the pool size), Squid exits with a fatal error message:

FATAL: Too many queued redirector requests

In this case, you need to increase the size of the redirector pool or change something so that the redirectors can process requests faster. You can use the cache manager’s redirector page to find out if you have too few, or too many redirectors running. For example:

% squidclient mgr:redirector
...
Redirector Statistics:
program: /usr/local/squid/bin/myredir
number running: 5 of 5
requests sent: 147
replies received: 142
queue length: 2
avg service time: 953.83 msec

      #      FD     PID  # Requests     Flags      Time  Offset Request
      1      10   35200          46     AB        0.902       0 http://...
      2      11   35201          29     AB        0.401       0 http://...
      3      12   35202          25     AB        1.009       1 cache_o...
      4      14   35203          25     AB        0.555       0 http://...
      5      15   35204          21     AB        0.222       0 http://...

If, as in this example, you see that the last redirector has almost as many requests as the second to last, you should probably increase the size of the redirector pool. If, on the other hand, you see many redirectors with no requests, you can probably decrease the pool size.
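
This rule of thumb can be written down as a small Python function. The thresholds are assumptions made for the sketch: “almost as many” is taken to mean at least 80 percent, and “many redirectors with no requests” is taken to mean more than half of the pool.

```python
def pool_size_hint(request_counts, near=0.8):
    """Suggest a pool-size change from per-process request counts.

    request_counts lists the "# Requests" column in pool order.
    near is an assumed threshold for "almost as many".
    """
    if len(request_counts) < 2:
        return "unknown"
    # Assumption: "many" idle redirectors means more than half the pool.
    idle = sum(1 for count in request_counts if count == 0)
    if idle > len(request_counts) // 2:
        return "decrease"
    last, previous = request_counts[-1], request_counts[-2]
    if previous > 0 and last >= near * previous:
        return "increase"
    return "ok"
```

Fed the counts from the output above (46, 29, 25, 25, 21), the function suggests increasing the pool, because the last process handled 21 requests against 25 for the one before it.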

Configuring Squid

The following five squid.conf directives control the behavior of redirectors in Squid.

redirect_program

The redirect_program directive specifies the command line for the redirector program. For example:

redirect_program /usr/local/squid/bin/my_redirector -xyz

Note, the redirector program must be executable by the Squid user ID. If, for some reason, Squid can’t execute the redirector, you should see an error message in cache.log.[1] For example:

ipcCreate: /usr/local/squid/bin/my_redirector: (13) Permission denied

Due to the way Squid works, the main Squid process may be unaware of problems executing the redirector program. Squid doesn’t detect the error until it tries to write a request and read a response. It then prints:

WARNING: redirector #1 (FD 6) exited

Thus, if you see such a message for the first request sent to Squid, check cache.log closely for other errors, and make sure the program is executable by Squid.
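
A quick pre-flight check can catch the permission problem before Squid starts. The Python sketch below uses a hypothetical function name and tests the program named in a redirect_program value against the user running the check, so it is only meaningful when run as the Squid user ID.

```python
import os


def can_execute(command_line):
    """Check whether the program named in a redirect_program value is an
    executable file for the user running this check."""
    tokens = command_line.split()
    if not tokens:
        return False
    program = tokens[0]         # the remaining tokens are arguments
    return os.path.isfile(program) and os.access(program, os.X_OK)
```

For the example above, the call would be can_execute("/usr/local/squid/bin/my_redirector -xyz").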

redirect_children

The redirect_children directive specifies how many redirector processes Squid should start. For example:

redirect_children 20

Squid warns you (via cache.log) when all redirectors are simultaneously busy:

WARNING: All redirector processes are busy.
WARNING: 1 pending requests queued.

If you see this warning, you should increase the number of child processes and restart (or reconfigure) Squid. If the queue size becomes twice the number of redirectors, Squid aborts with a fatal message.

Don’t attempt to disable Squid’s use of the redirectors by setting redirect_children to 0. Instead, simply remove the redirect_program line from squid.conf.

redirect_rewrites_host_header

Squid normally updates a request’s Host header when using a redirector. That is, if the redirector returns a new URI with a different hostname, Squid puts the new hostname in the Host header. If you use Squid as a surrogate (see Chapter 15), you might want to disable this behavior by setting the redirect_rewrites_host_header directive to off:

redirect_rewrites_host_header off

redirector_access

Squid normally sends every request through a redirector. However, you can use the redirector_access rules to send only certain requests through a redirector. The syntax is identical to http_access:

redirector_access allow|deny [!]ACLname ...

For example:

acl Foo src 192.168.1.0/24
acl All src 0/0
redirector_access deny Foo
redirector_access allow All

In this case, Squid skips the redirector for any request that matches the Foo ACL.

redirector_bypass

If you enable the redirector_bypass directive, Squid bypasses the redirectors when all of them are busy. Normally, Squid queues pending requests until a redirector process becomes available. If this queue grows too large, Squid exits with a fatal error message. Enabling this directive ensures that Squid never reaches that state.

The tradeoff, of course, is that some user requests may not be redirected when the load is high. If that’s all right with you, simply enable the directive with this line:

redirector_bypass on
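
To make the interaction between the pool size, the queue, and redirector_bypass concrete, here is a toy Python model of the rules described in this chapter. It is not Squid’s actual code: the class and its names are invented, processes never become idle again, and the fatal condition is assumed to be a queue longer than twice the pool size.

```python
class RedirectorPoolModel:
    """Toy model of the dispatch rules described in this chapter."""

    def __init__(self, children=5, bypass=False):
        self.busy = [False] * children
        self.bypass = bypass
        self.queue = []

    def submit(self, uri):
        """Return what happens to one new request."""
        for index, is_busy in enumerate(self.busy):
            if not is_busy:                 # first idle process wins
                self.busy[index] = True
                return "sent to redirector #%d" % (index + 1)
        if self.bypass:
            return "bypassed"               # request goes on unredirected
        self.queue.append(uri)
        # Assumed limit: more than twice the pool size is fatal.
        if len(self.queue) > 2 * len(self.busy):
            raise RuntimeError("Too many queued redirector requests")
        return "queued"
```

With two children and bypass off, the model accepts two requests, queues the next four, and raises the error on the one after that. With bypass on, the third request is simply passed along without being redirected.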

Popular Redirectors

As I already mentioned, the Squid source code doesn’t include any redirectors. However, you can find a number of useful third-party redirectors linked from the Related Software page on http://www.squid-cache.org. Here are some of the more popular offerings:

Squirm

http://squirm.foote.com.au/

Squirm comes from Chris Foote. It is written in C and distributed as source code under the GNU General Public License (GPL). Squirm’s features include:

  • Being very fast with minimal memory usage

  • Full regular expression pattern matching and replacement

  • Ability to apply different redirection lists to different client groups

  • Interactive mode for testing on the command line

  • Fail-safe mode passes requests through unchanged in the event that configuration files contain errors

  • Writing debugging, errors, and more to various log files

Jesred

http://www.linofee.org/~elkner/webtools/jesred/

Jesred comes from Jens Elkner. It is written in C, based on Squirm, and also released under the GNU GPL. Its features include:

  • Being faster than Squirm, with slightly more memory usage

  • Ability to reread its configuration files while running

  • Full regular expression pattern matching and replacement

  • Fail-safe mode passes requests through unchanged in the event that configuration files contain errors

  • Optionally logging rewritten requests to a log file

squidGuard

http://www.squidguard.org/

squidGuard comes from Pål Baltzersen and Lars Erik Håland at Tele Danmark InterNordia. It is released under the GNU GPL. The authors also make sure squidGuard compiles easily on modern Unix systems. Their site contains a lot of good documentation. Here are some of squidGuard’s features:

  • Highly configurable; you can apply different rules to different groups of clients or users and at different times or days

  • URI substitution, not just replacement, à la sed

  • printf-like substitutions allow passing parameters to CGI scripts for customized messages

  • Support for the 301/302/303/307 HTTP redirect status code feature for redirectors

  • Selective logging for rewrite rule sets

At the squidGuard site, you can also find a blacklist of more than 100,000 sites categorized as porn, aggressive, drugs, hacking, ads, and more.

AdZapper

http://www.adzapper.sourceforge.net

AdZapper is a popular redirector because it specifically targets removal of advertisements from HTML pages. It is a Perl script written by Cameron Simpson. AdZapper can block banners (images), pop-up windows, flash animations, page counters, and web bugs. The script includes a list of regular expressions that match URIs known to contain ads, pop-ups, etc. Cameron updates the script periodically with new patterns. You can also maintain your own list of patterns.

Exercises

  • Write a redirector that never changes the requested URI and configure Squid to use it.

  • While running tail -f cache.log, kill Squid’s redirector processes one by one until something interesting happens.

  • Download and install one of the redirectors mentioned in the previous section.



[1] This message appears only in cache.log. It isn’t written to stdout if you use the -d option, or to syslog if you use the -s option.