Squid: The Definitive Guide by Duane Wessels. Published by O'Reilly Media, Inc., 2004.

Chapter 15. Server Accelerator Mode

Throughout most of this book, I’ve been talking about Squid as a client-side caching proxy. However, with just a few special squid.conf settings, Squid is able to function as an origin server accelerator as well. In this mode, it accepts normal HTTP requests and forwards cache misses to the real origin server (or backend server). In the parlance of RFC 3040, Squid is operating as a surrogate. This configuration is similar to what I talked about in Chapter 9. The primary difference is that, as a surrogate, Squid accepts requests for one, or maybe a few, origin server(s), rather than any and all origins. HTTP interception isn’t required for server acceleration.

As the name implies, server acceleration is generally used as a technique to improve the performance of slow, or heavily loaded, backend servers. It works well because origin servers tend to have a relatively small hot set. Most likely, the objects responsible for 90% of origin server traffic can fit entirely in memory. Depending on your particular backend server software and configuration, Squid may be able to serve requests much faster.

Security is another good reason to consider Squid as a surrogate. Think of Squid as a dedicated firewall in front of your origin server. The Squid source code is too large to be trusted as completely secure. However, you may sleep better with Squid protecting your backend server. It is simply a cache, so it doesn’t permanently store the source of your data. If the Squid box is attacked or compromised, you won’t lose any data. You may find it easier to secure a system running Squid than the system running your backend server application(s).

You might also be interested in server acceleration to implement load balancing. If your origin server runs on expensive boxes, you can save money by deploying Squid on a number of cheaper boxes. By placing Squid at a number of different locations, you can even build your own content delivery network (CDN).

Overview

Assuming that you already have an origin server in place, you need to move it to a different IP address or TCP port. For example, you can (1) install Squid on a separate machine, (2) give the origin server a new IP address, and (3) give Squid the origin server’s old IP address. In the interest of security, you can use non-globally routable addresses (i.e., from RFC 1918) on the link between Squid and the backend server. See Figure 15-1.

Figure 15-1. How to replace your origin server with Squid

Another option is to configure Squid for HTTP interception, as described in Chapter 9. For example, you can configure the origin server’s nearest router or switch to intercept HTTP requests and divert them to Squid.

If you don’t have the resources to put Squid on a dedicated system, you can run it alongside the HTTP server. However, both applications can’t share the same IP address and port number. You need to make the backend server bind to a different address (e.g., 127.0.0.1) or move it to another port number. It might seem easiest to change the port number, but I recommend changing the IP address instead.

Changing the port number can be problematic. For example, when the backend server generates an error message, it may expose the “wrong” port. Even worse, if the server generates an HTTP redirect, it typically appends the nonstandard port number to the Location URI:

HTTP/1.1 301 Moved Permanently
Date: Mon, 29 Sep 2003 03:36:13 GMT
Server: Apache/1.3.26 (Unix)
Location: http://www.squid-cache.org:81/Doc/

If a client receives this response, it makes a connection to the nonstandard port (81), thus bypassing the server accelerator. If you must run Squid on the same host as your backend server, it is better to tell the backend server to listen on the loopback address (127.0.0.1). With Apache, you’d do it like this:

BindAddress 127.0.0.1
ServerName www.squid-cache.org

Once you’ve decided how to relocate your origin server, the next step is to configure Squid.

Configuring Squid

Technically, a single configuration file directive is all it takes to change Squid from a caching proxy into a surrogate. Unfortunately, life is never quite that simple. Due to the myriad of ways that different organizations design their web services, Squid has a number of directives to worry about.

http_port

Most likely, Squid is acting as a surrogate for your HTTP server on port 80. Use the http_port directive to make Squid listen on that port:

http_port 80

If you want Squid to act as a surrogate and a caching proxy at the same time, list both port numbers:

http_port 80
http_port 3128

You can configure your clients to send their proxy requests to port 80 as well, but I strongly discourage that. By using separate ports, you’ll find it easier to migrate the two services to separate boxes later if it becomes necessary.

https_port

You can configure Squid to terminate encrypted HTTP (SSL and TLS) connections. This feature requires the --enable-ssl option when running ./configure. In this mode, Squid decrypts SSL/TLS connections from clients and forwards unencrypted requests to your backend server. The https_port directive has the following format:

https_port [host:]port cert=certificate.pem [key=key.pem] [version=1-4]
           [cipher=list] [options=list]

The cert and key arguments are pathnames to OpenSSL-compatible certificate and private key files. If you omit the key argument, the OpenSSL library looks for the private key in the certificate file.

The (optional) version argument specifies which SSL and TLS protocol versions to support: 1=automatic, 2=SSLv2 only, 3=SSLv3 only, 4=TLSv1 only.

The (optional) cipher argument is a colon-separated list of ciphers. Squid simply passes it to the SSL_CTX_set_cipher_list() function. For more information, read the ciphers(1) manpage on your system or try running: openssl ciphers.

The (optional) options argument is a colon-separated list of OpenSSL options. Squid simply passes these to the SSL_CTX_set_options() function. For more information, read the SSL_CTX_set_options(3) manpage on your system.

Here are a few example https_port lines:

https_port 443 cert=/usr/local/etc/certs/squid.cert
https_port 443 cert=/usr/local/etc/certs/squid.cert version=2
https_port 443 cert=/usr/local/etc/certs/squid.cert cipher=SHA1
https_port 443 cert=/usr/local/etc/certs/squid.cert options=MICROSOFT_SESS_ID_BUG

httpd_accel_host

This is where you tell Squid the IP address, or hostname, of the backend server. If you use the loopback trick described previously, you write:

httpd_accel_host 127.0.0.1

Squid then prepends this value to partial URIs that get accelerated. It also changes the value of the Host header.[1] For example, if the client makes this request:

GET /index.html HTTP/1.1
Host: squidbook.org

Squid turns it into this request:

GET http://127.0.0.1/index.html HTTP/1.1
Host: 127.0.0.1

As you can see, the request no longer contains any information that indicates the request is for http://squidbook.org. This shouldn’t be a problem as long as the backend server isn’t configured for virtual hosting of multiple domains.

If you want Squid to use the origin server’s hostname, you can put it in the httpd_accel_host directive:

httpd_accel_host squidbook.org

Then the request is as follows:

GET http://squidbook.org/index.html HTTP/1.1
Host: squidbook.org

Another option is to enable the httpd_accel_uses_host_header directive. Squid then inserts the Host header value into the URI for most requests, and the httpd_accel_host value is used only for requests that lack a Host header.
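
The rules for choosing the hostname in the rewritten URI can be summarized in a few lines of Python. This is only a simplified model for illustration, not Squid’s actual code: the function name build_accelerated_uri and its parameters are hypothetical, and it ignores port numbers and the special virtual value.

```python
def build_accelerated_uri(path, host_header, accel_host,
                          uses_host_header=False):
    """Return the full URI for a partial (surrogate) request.

    Simplified model; the name and parameters are hypothetical.
    """
    # The Host header wins only when httpd_accel_uses_host_header is on
    # and the client actually sent one.
    if uses_host_header and host_header:
        host = host_header
    else:
        host = accel_host
    return "http://%s%s" % (host, path)


# httpd_accel_host 127.0.0.1, Host header ignored
print(build_accelerated_uri("/index.html", "squidbook.org", "127.0.0.1"))
# Same request with httpd_accel_uses_host_header on
print(build_accelerated_uri("/index.html", "squidbook.org", "127.0.0.1",
                            uses_host_header=True))
```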

When you use a hostname, Squid goes through the normal steps to look up its IP address. Because you want the hostname to resolve to two different addresses (one for clients connecting to Squid and another for Squid connecting to the backend server), you should also add a static DNS entry to your system’s /etc/hosts file. For example:

127.0.0.1       squidbook.org

You might want to use a redirector instead. For example, you can write a simple Perl program that changes http://squidbook.org/... to http://127.0.0.1/.... See Chapter 11 for the nuts and bolts of redirecting client requests.
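
The text suggests Perl; the same idea is sketched here in Python. It assumes the redirector interface covered in Chapter 11: Squid writes one request per line to the helper’s standard input, with the URI as the first whitespace-separated field, and reads back either a replacement URI or a blank line meaning “no change.” The names OLD_PREFIX, NEW_PREFIX, rewrite, and serve are made up for this example.

```python
OLD_PREFIX = "http://squidbook.org/"   # what clients ask for
NEW_PREFIX = "http://127.0.0.1/"       # where the backend really lives


def rewrite(line):
    """Return the replacement URI, or "" to leave the request alone."""
    fields = line.split()
    if not fields:
        return ""
    uri = fields[0]  # the remaining fields are not needed for this rewrite
    if uri.startswith(OLD_PREFIX):
        return NEW_PREFIX + uri[len(OLD_PREFIX):]
    return ""


def serve(infile, outfile):
    """Answer one line per request, flushing so Squid is never kept waiting."""
    for line in infile:
        outfile.write(rewrite(line) + "\n")
        outfile.flush()
```

To run it as a helper, add import sys and a call to serve(sys.stdin, sys.stdout) at the end of the script, then point the redirect_program directive at it.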

The httpd_accel_host directive has a special value. If you set it to virtual, Squid inserts the origin server’s IP address into the URI when the Host header is missing. This feature is useful only when using HTTP interception, however.

httpd_accel_port

This directive tells Squid the port number of the backend server. It is 80 by default. You won’t need to change this unless the backend server is running on a different port. Here is an example:

httpd_accel_port 8080

If you are accelerating origin servers on multiple ports, you can use the value 0. In this case, Squid takes the port number from the Host header.

httpd_accel_uses_host_header

This directive controls how Squid determines the hostname it inserts into accelerated URIs. If enabled, the request’s Host header value takes precedence over httpd_accel_host.

The httpd_accel_uses_host_header directive goes hand in hand with virtual domain hosting on the backend server. You can leave it disabled if the backend server is handling only one domain. If, on the other hand, you are accelerating multiple origin server names, turn it on:

httpd_accel_uses_host_header on

If you enable httpd_accel_uses_host_header, be sure to install some access controls as described later in this chapter. To understand why, consider this configuration:

httpd_accel_host does.not.exist
httpd_accel_uses_host_header on

Because most requests have a Host header, Squid ignores the httpd_accel_host setting and rarely inserts the bogus http://does.not.exist name into URIs. This essentially turns your surrogate into a caching proxy for anyone smart enough to fake an HTTP request. If I know that you are using Squid as a surrogate without proper access controls, I can send a request like this:

GET /index.html HTTP/1.1
Host: www.mrcranky.com

If you’ve enabled httpd_accel_uses_host_header and don’t have any destination-based access controls, Squid should forward my request to http://www.mrcranky.com. Read Section 15.4 and install access controls to ensure that Squid doesn’t talk to foreign origin servers.

httpd_accel_single_host

Whereas the httpd_accel_uses_host_header directive determines the hostname Squid puts into a URI, this one determines where Squid forwards its cache misses. By default (i.e., with httpd_accel_single_host disabled), Squid forwards surrogate cache misses to the host in the URI. If the URI contains a hostname, Squid performs a DNS lookup to get the backend server’s IP address.

When you enable httpd_accel_single_host, Squid always forwards surrogate cache misses to the host defined by httpd_accel_host. In other words, the contents of the URI and the Host header don’t affect the forwarding decision. Perhaps the best reason to enable this directive is to avoid DNS lookups. Simply set httpd_accel_host to the backend server’s IP address. Another reason to enable it is if you have another device (load balancer, virus scanner, etc.) between Squid and the backend server. You can make Squid forward the request to this other device without changing any aspect of the HTTP request.

Note that enabling both httpd_accel_single_host and httpd_accel_uses_host_header is a dangerous combination that might allow an attacker to poison your cache. Consider this configuration:

httpd_accel_single_host on
httpd_accel_host 172.16.1.1
httpd_accel_uses_host_header on

and this HTTP request:

GET /index.html HTTP/1.0
Host: www.othersite.com

Squid forwards the request to your backend server at 172.16.1.1 but stores the response under the URI http://www.othersite.com/index.html. Since 172.16.1.1 isn’t actually www.othersite.com, Squid now contains a bogus response for that URI. If you enable httpd_accel_with_proxy (next section) or your cache participates in a hierarchy or mesh, it may give out the bad response to unsuspecting users. To prevent such abuse, be sure to read Section 15.4.
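
To see why the combination is dangerous, it helps to separate the two decisions Squid makes: which URI the response is stored under, and which server the cache miss is sent to. The following Python sketch models just those two decisions. It is an illustration with hypothetical names (plan_cache_miss and its arguments), not Squid’s implementation.

```python
def plan_cache_miss(path, host_header, accel_host,
                    single_host, uses_host_header):
    """Return (uri_used_as_cache_key, server_contacted) for a surrogate miss."""
    # httpd_accel_uses_host_header decides which name goes into the URI.
    if uses_host_header and host_header:
        uri_host = host_header
    else:
        uri_host = accel_host
    uri = "http://%s%s" % (uri_host, path)
    # httpd_accel_single_host decides where the miss is actually sent.
    server = accel_host if single_host else uri_host
    return uri, server


uri, server = plan_cache_miss("/index.html", "www.othersite.com",
                              "172.16.1.1", single_host=True,
                              uses_host_header=True)
print(uri)     # the response is stored under the name the client supplied
print(server)  # but it is fetched from your own backend
```

Whenever the hostname in the first value doesn’t actually live on the server named by the second, the cache holds a response under the wrong name.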

Server-side persistent connections may not work if you use the httpd_accel_single_host directive. This is because Squid saves idle connections under the origin server hostname, but the connection-establishment code looks for an idle connection named by the httpd_accel_host value. If the two values are different, Squid fails to locate an appropriate idle connection. The idle connections are closed after the timeout, without being reused. You can avoid this little problem by disabling server-side persistent connections with the server_persistent_connections directive (see Appendix A).

httpd_accel_with_proxy

By default, whenever you enable the httpd_accel_host directive, Squid goes into strict surrogate mode. That is, it refuses proxy HTTP requests and accepts only surrogate requests, as though it were truly an origin server. Squid also disables the ICP port (although not HTCP, if you have it enabled). If you want Squid to accept both surrogate and proxy requests, enable this directive:

httpd_accel_with_proxy on

Gee, That Was Confusing!

Yeah, it was for me too. Let’s look at it another way. The settings that you need to use depend on how many backend boxes you have and how many origin server names you are accelerating. Let’s consider the four separate cases in the following sections.

One Box, One Server Name

This is the simplest sort of configuration. Because you have only one box and one hostname, the Host header values don’t matter much. You should probably use:

httpd_accel_host www.example.com
httpd_accel_single_host on
httpd_accel_uses_host_header off

If you like, you can use an IP address for httpd_accel_host, although it will appear in URIs in your access.log.

One Box, Many Server Names

Because you have many origin server names being virtually hosted on a single box, the Host header becomes important. We want Squid to insert it into the URIs it generates from partial requests. Your configuration should be:

httpd_accel_host www.example.com
httpd_accel_single_host on
httpd_accel_uses_host_header on

In this case, Squid generates the URI based on the Host header. If absent, Squid inserts www.example.com. You can disable httpd_accel_single_host if you prefer. As before, you can use an IP address in httpd_accel_host to avoid DNS lookups.

Many Boxes, One Server Name

This sounds like a load-balancing configuration. One way to accomplish it is to create a DNS name for the backend servers with multiple IP addresses. Squid iterates through all the addresses (a.k.a. round-robin), using the next one for each cache miss. In this situation, the configuration is the same as for the one box/one name case:

httpd_accel_host roundrobin.example.com
httpd_accel_single_host on
httpd_accel_uses_host_header off

The only difference is that the httpd_accel_host name resolves to multiple addresses. It might look like this in a Berkeley Internet Name Daemon (BIND) zone file:

$ORIGIN example.com.
roundrobin      IN      A      192.168.1.2
                IN      A      192.168.1.3
                IN      A      192.168.1.4

With this DNS configuration, Squid uses the next address in the list each time it opens a new connection to roundrobin.example.com. When it gets to the end of the list, it starts over at the top. Note that Squid caches these DNS answers internally according to their TTLs. You aren’t relying on the name server to return the address list in a different order for each DNS query.
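
The rotation described here is easy to picture with a short Python sketch. It only imitates the wrap-around behavior using the three addresses from the zone file; the variable names are invented, and Squid’s internal DNS cache is of course more involved.

```python
from itertools import cycle

# Addresses cached for roundrobin.example.com, in the order received.
addresses = ["192.168.1.2", "192.168.1.3", "192.168.1.4"]

# cycle() walks the list forever, wrapping from the last entry to the first.
next_address = cycle(addresses)

# Simulate five new backend connections.
connections = [next(next_address) for _ in range(5)]
print(connections)
```

The fourth connection goes back to 192.168.1.2, the first address in the list.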

Another option is to use a redirector (see Chapter 11) to select the backend server. You can write a simple script to replace the URI hostname (e.g., roundrobin.example.com) with a different hostname or an IP address. You might even make the redirector smart enough to make its selection based on the current state of the backend servers. Use the following configuration with this approach:

httpd_accel_host roundrobin.example.com
httpd_accel_single_host off
httpd_accel_uses_host_header off
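
A selection function for such a redirector might look like the following Python sketch. Everything specific here is an assumption made for illustration: the BACKENDS list reuses the addresses from the zone file, the healthy set stands in for whatever health check you have, and the function names are invented. It shows only the choice of backend and the hostname replacement, not the helper’s input loop.

```python
from urllib.parse import urlsplit, urlunsplit

BACKENDS = ["192.168.1.2", "192.168.1.3", "192.168.1.4"]


def pick_backend(healthy, request_number):
    """Rotate through the backends that are currently marked healthy."""
    candidates = [b for b in BACKENDS if b in healthy]
    if not candidates:
        return None  # nothing usable; the caller should leave the URI alone
    return candidates[request_number % len(candidates)]


def replace_host(uri, new_host):
    """Swap the hostname in a URI, keeping the path and query intact."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, new_host, parts.path,
                       parts.query, parts.fragment))


# 192.168.1.3 is down, so requests alternate between the other two.
backend = pick_backend({"192.168.1.2", "192.168.1.4"}, request_number=1)
print(replace_host("http://roundrobin.example.com/index.html", backend))
```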

Many Boxes, Many Server Names

In this case, you want to use the Host header. You also want Squid to select the backend server based on the origin server’s name (i.e., a DNS lookup). The configuration is as follows:

httpd_accel_host www.example.com
httpd_accel_single_host off
httpd_accel_uses_host_header on

You might be tempted to set httpd_accel_host to virtual. However, that would be a mistake unless you are using HTTP interception.
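
The four cases boil down to a small lookup table. The Python sketch below simply restates the recommendations from the preceding sections; the table and function names are hypothetical, and the many boxes/one name entry assumes the DNS round-robin approach rather than the redirector variant.

```python
# (backend boxes, server names) -> (httpd_accel_single_host,
#                                   httpd_accel_uses_host_header)
RECOMMENDED = {
    ("one", "one"): ("on", "off"),
    ("one", "many"): ("on", "on"),
    ("many", "one"): ("on", "off"),   # DNS round-robin variant
    ("many", "many"): ("off", "on"),
}


def accel_settings(boxes, names, accel_host):
    """Return the squid.conf lines suggested for the given situation."""
    single_host, uses_host_header = RECOMMENDED[(boxes, names)]
    return [
        "httpd_accel_host %s" % accel_host,
        "httpd_accel_single_host %s" % single_host,
        "httpd_accel_uses_host_header %s" % uses_host_header,
    ]


print("\n".join(accel_settings("many", "many", "www.example.com")))
```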

Access Controls

A typically configured surrogate accepts HTTP requests from the whole Internet. This doesn’t mean, however, that you can forget about access controls. In particular, you’ll want to make sure Squid doesn’t accept requests belonging to foreign origin servers. The exception is when you have httpd_accel_with_proxy enabled.

For a surrogate-only configuration, use one of the destination-based access controls. For example, the dst type accomplishes the task:

acl All src 0/0
acl TheOriginServer dst 192.168.3.2
http_access allow TheOriginServer
http_access deny All

Alternatively, you can use a dstdomain ACL if you prefer:

acl All src 0/0
acl TheOriginServer dstdomain www.squidbook.org
http_access allow TheOriginServer
http_access deny All

Note that enabling httpd_accel_single_host somewhat bypasses the access control rules. This is because the origin server location (i.e., the httpd_accel_host value) is then set after Squid performs the access control checks.

Access controls become really tricky when you combine surrogate and proxy modes in a single instance of Squid. You can no longer simply deny all requests to foreign origin servers. You can, however, make sure that outsiders aren’t allowed to make proxy requests to random origin servers. For example:

acl All src 0/0
acl ProxyUsers src 10.2.0.0/16
acl TheOriginServer dst 192.168.3.2
http_access allow ProxyUsers
http_access allow TheOriginServer
http_access deny All

You can also use the local port number in your access control rules. It doesn’t really protect you from malicious activity, but does ensure, for example, that user-agents send their proxy requests to the appropriate port. This also makes it easier for you to split the service into separate proxy- and surrogate-only systems later. Assuming you configure Squid to listen on ports 80 and 3128, you might use:

acl All src 0/0
acl ProxyPort myport 3128
acl ProxyUsers src 10.2.0.0/16
acl SurrogatePort myport 80
acl TheOriginServer dst 192.168.3.2
http_access allow ProxyUsers ProxyPort
http_access allow TheOriginServer SurrogatePort
http_access deny All
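
If it isn’t obvious how those rules combine, recall that Squid checks http_access lines in order, that all ACLs named on one line must match, and that the first matching line decides the outcome. The Python sketch below models the rules for a request described by three values. It is a toy model with hypothetical names; in particular, dst is compared as a plain string and the request dictionary is an invention of this example.

```python
from ipaddress import ip_address, ip_network

# One test function per ACL in the configuration.
ACLS = {
    "All": lambda req: True,
    "ProxyPort": lambda req: req["myport"] == 3128,
    "ProxyUsers": lambda req: ip_address(req["src"]) in ip_network("10.2.0.0/16"),
    "SurrogatePort": lambda req: req["myport"] == 80,
    "TheOriginServer": lambda req: req["dst"] == "192.168.3.2",
}

# The http_access lines, in configuration order.
HTTP_ACCESS = [
    ("allow", ["ProxyUsers", "ProxyPort"]),
    ("allow", ["TheOriginServer", "SurrogatePort"]),
    ("deny", ["All"]),
]


def http_access(req):
    """Return the action of the first rule whose ACLs all match."""
    for action, acl_names in HTTP_ACCESS:
        if all(ACLS[name](req) for name in acl_names):
            return action
    return "deny"  # not reached here, because the last rule matches everything


outsider = {"src": "203.0.113.9", "myport": 80, "dst": "192.168.3.2"}
print(http_access(outsider))                     # surrogate request
print(http_access(dict(outsider, myport=3128)))  # outsider on the proxy port
```

An outside client is allowed on port 80 when the destination is the origin server, and denied when it tries the proxy port.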

Unfortunately, these access control rules don’t prevent attempts to poison your cache when you enable httpd_accel_single_host, httpd_accel_uses_host_header, and httpd_accel_with_proxy simultaneously. This is because the valid proxy request:

GET http://www.bad.site/ HTTP/1.1
Host: www.bad.site

and the bogus surrogate request:

GET / HTTP/1.1
Host: www.bad.site

have the same access control result but are forwarded to different servers. They have the same access control result because, after Squid rewrites the surrogate request, it has the same URI as the proxy request. However, they don’t get sent to the same place. The surrogate request goes to the server defined by httpd_accel_host because httpd_accel_single_host is enabled.

You can take steps towards solving this problem. Make sure your backend server generates an error for unknown server names (e.g., when the Host header refers to a nonlocal server). Better yet, don’t run Squid as a surrogate and proxy at the same time.

Content Negotiation

Recent versions of Squid support the HTTP/1.1 Vary header. This is good news if your backend server uses content negotiation. It might, for example, send different responses depending on which web browser makes the request (e.g., the User-Agent header), or based on the user’s language preferences (e.g., the Accept-Language header).

When the response for a URI varies on some aspect of the request, the origin (backend) server includes a Vary header. This header contains the list of request headers used to select the variant. These are the selecting headers. When Squid receives a response with a Vary header, it includes the selecting header values when it generates the internal cache key. Thus, a subsequent request with the same values for the selecting headers may generate a cache hit.
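
The idea of folding the selecting headers into the cache key can be sketched in Python as follows. This is a conceptual model only: the cache_key function and the tuple it returns are assumptions of this example and say nothing about the key format Squid really uses.

```python
def cache_key(uri, request_headers, vary):
    """Combine the URI with the values of the selecting headers."""
    # Header names are case-insensitive, so normalize before comparing.
    sent = {name.lower(): value for name, value in request_headers.items()}
    selecting = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (uri, tuple((name, sent.get(name, "")) for name in selecting))


uri = "http://www.example.com/"
english = cache_key(uri, {"Accept-Language": "en"}, "Accept-Language")
german = cache_key(uri, {"Accept-Language": "de"}, "Accept-Language")
again = cache_key(uri, {"accept-language": "en", "User-Agent": "x"},
                  "Accept-Language")

print(english == again)   # same selecting values: candidate for a cache hit
print(english == german)  # different language: a separate cache entry
```

Headers that aren’t listed in Vary, such as User-Agent in this example, don’t affect the key.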

If you use the --enable-x-accelerator-vary option when running ./configure, Squid looks for a response header named X-Accelerator-Vary. Squid treats this header exactly like the Vary header. Because this is an extension header, however, it is ignored by downstream agents. It essentially provides a means for private content negotiation between Squid and your backend server. In order to use it you must also modify your server application to send the header in its responses. I don’t know of any situation in which this header would be useful. If you serve negotiated responses, you probably want to use the standard Vary header so that all agents know what’s going on.

Gotchas

Using Squid as a surrogate may improve your origin server’s security and performance. However, there are some potentially negative side effects as well. Here are a few things to keep in mind.

Logging

When using a surrogate, the origin server’s access log contains only the cache misses from Squid. Furthermore, those log-file entries have Squid’s IP address, rather than the client’s. In other words, Squid’s access.log is where all the good information is now stored.

Recall that, by default, Squid doesn’t use the common log-file format. You should use the emulate_httpd_log directive to make Squid’s access.log look just like Apache’s default log-file format.

Ignoring Reloads

The Reload button found on most browsers generates HTTP requests with the Cache-Control: no-cache directive set. While this is usually desirable for client-side caching proxies, it may ruin the performance of a surrogate. This is especially true if the backend server is heavily loaded. A reload request forces Squid to purge the currently cached response while retrieving the new response from the origin server. If those origin server responses arrive slowly, Squid consumes a larger than normal number of file descriptors and network resources.

To help in this situation, you may want to use one of the refresh_pattern options. When the ignore-reload option is set, Squid pretends that the request doesn’t contain the no-cache directive. The ignore-reload option is generally safe for surrogates, although it does, technically, violate the HTTP protocol.

To make Squid ignore reloads for all requests, use a line like this in squid.conf:

refresh_pattern . 0 20% 4320 ignore-reload

For a somewhat safer alternative, you can use the reload-into-ims option. It causes Squid to validate its cached response when the request contains no-cache. Note, however, that this works only for responses that have cache validators (such as Last-Modified timestamps).
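
The difference between the two options can be summarized with a small decision function. This Python sketch is a rough model with invented names and return strings. The source doesn’t say what reload-into-ims does when the cached response has no validator, so the fallback to a full fetch in that case is an assumption.

```python
def handle_reload(option, has_validator):
    """Decide what to do with a request carrying Cache-Control: no-cache.

    option is None, "ignore-reload", or "reload-into-ims".
    """
    if option == "ignore-reload":
        # Pretend the no-cache directive isn't there.
        return "treat as ordinary request"
    if option == "reload-into-ims" and has_validator:
        # Ask the backend whether the cached copy is still good.
        return "validate cached response"
    # Default behavior; also assumed when there is nothing to validate with.
    return "fetch new response"


print(handle_reload("ignore-reload", has_validator=False))
print(handle_reload("reload-into-ims", has_validator=True))
print(handle_reload(None, has_validator=True))
```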

Uncachable Content

As a surrogate, Squid obeys the standard HTTP headers for caching responses from your backend server. This means, for example, that certain dynamic responses might not be cached. You might want to use the refresh_pattern directive to force caching of these objects. For example:

refresh_pattern \.dhtml$ 60 80% 180

This trick only works for certain types of responses, namely, those without a Last-Modified or Expires header. By default, Squid doesn’t cache such responses. However, using a nonzero minimum time in a refresh_pattern rule instructs Squid to cache the response, and serve it as a cache hit for that amount of time anyway. See Section 7.7 for the details.
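
As a rough illustration of the minimum-time rule, the Python sketch below models only the case described here: a response with neither Last-Modified nor Expires, matched against the example pattern. The function name is hypothetical, the percentage and maximum fields are deliberately ignored, and treating an age exactly equal to the minimum as stale is an assumption.

```python
import re

# Models: refresh_pattern \.dhtml$ 60 80% 180
PATTERN = re.compile(r"\.dhtml$")
MIN_MINUTES = 60


def served_as_hit(uri, age_minutes):
    """True if a response lacking Last-Modified and Expires is still a hit."""
    if not PATTERN.search(uri):
        return False  # this rule doesn't apply to the URI
    return age_minutes < MIN_MINUTES


print(served_as_hit("http://www.example.com/page.dhtml", 30))
print(served_as_hit("http://www.example.com/page.dhtml", 90))
```

The first call reports a hit because the object is younger than the 60-minute minimum; the second doesn’t.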

If your backend server generates other types of uncachable responses, you may not be able to trick Squid into storing them.

Errors

With Squid as a surrogate in front of your origin server, you should be aware that visitors to your site may see an error message from Squid, rather than the origin server itself. In other words, your use of Squid may be “exposed” through certain error messages. For example, Squid returns its own error message when it fails to parse the client’s HTTP request, which could happen if the request is incomplete or is malformed in some way. Squid also returns an error message if it can’t connect to the backend server for some reason.

If your site is consistent and functioning properly, you probably don’t need to worry about Squid’s error messages. Nonetheless, you may want to take a close look at the access.log from time to time and see what sort of errors, if any, your users might be seeing.

Purging Objects

You may find the PURGE method particularly useful when operating a surrogate. Because you have a good understanding of the content being served, you are more likely to know when a cached object must be purged. The technique for purging an object is the same as I mentioned previously. See Section 7.6 for a refresher.

Neighbors

Although I don’t recommend it, you can configure Squid as a surrogate and as part of a mesh or hierarchy. If you choose to take on such an arrangement, note that, by default, Squid forwards cache misses to parents (rather than the backend server). Assuming that isn’t what you really want, be sure to use the cache_peer_access directive so that requests for your backend server don’t go to your neighbors instead.

Exercises

  • Install and configure Squid as a surrogate on the same system where you run an HTTP server.

  • Make a few test requests with squidclient. Pay particular attention to the reply headers and notice how the requests appear in both access logs.

  • Try to poison your own surrogate with fake HTTP requests. It is probably easier with httpd_accel_single_host enabled.

  • Estimate the size of your origin server’s document set. What percentage of the data can fit into 1 GB of memory or disk space?



[1] Technically, the Host header is changed only in requests Squid forwards to the backend server (cache misses).