Log files are the primary sources of persistent information about Squid’s operation. In other words, they provide a record of what Squid has been doing. This includes URIs requested by users, objects that have been saved to disk, and various warnings and errors. When Squid appears to be malfunctioning, you’ll want to check the log files first. By the end of this chapter, you’ll know how to interpret and manage all of Squid’s various log files.
Depending on your configuration, Squid maintains, at most, seven log files. The three primary files are: cache.log, access.log, and store.log. Two optional log files, useragent.log and referer.log, are similar to access.log but contain additional information. I’ll also talk about the swap.state and netdb_state files. These are databases, used by Squid when it restarts.
Note that the filenames, such as access.log, are the default values. You can change most of the log file names with various squid.conf directives.
The following list contains a brief description of each log file:
This log file contains human-oriented, informational messages about Squid’s operation. The filename is defined by the cache_log directive. Under normal conditions, the file grows by about 10-100 KB per day.
This log file contains an entry for every HTTP and (optionally) ICP transaction made by Squid’s clients. The filename is defined by the cache_access_log directive. It grows at a rate of 100-200 bytes per transaction.
This log file contains low-level information about objects that enter and leave the cache. The filename is defined by the cache_store_log directive. It grows at a rate of about 150 bytes per transaction.
This optional log file contains HTTP Referer [1] headers for each client request. You must enable
referer logging with the —enable-referer-log option
when running ./configure. The
filename is defined by the referer_log
directive. It grows at a rate of about 80 bytes per
transaction.
This optional log file contains HTTP User-Agent headers for each client
request. You must enable user-agent logging with the
—enable-useragent-log option when running ./configure. The filename is defined by
the useragent_log directive. It grows at a rate
of about 75 bytes per transaction.
These files contain internal metadata about the objects stored on disk. Squid uses them to reconstruct the cache upon startup. By default, they are located in the cache_dir directories. However, you can change the location with the cache_swap_log directive. They grow at a rate of 100 bytes per cache miss.
This file holds the contents of the Network Measurement Database (see Section 10.5). It is always located in the first cache_dir directory. Its size is determined by the netdb_high value.
If Squid receives an error while writing a log file, it doesn’t silently continue. Instead, it exits with a fatal error message to get your attention. Make sure that you periodically rotate the log files, as described in Section 13.7, to reduce the possibility of filling your disks. For the same reason, I also recommend placing your log files on a partition separate from any of your cache directories.
cache.log contains various messages such as information about Squid’s configuration, warnings about possible performance problems, and serious errors. Here is some sample cache.log output:
2003/09/29 12:09:45| Starting Squid Cache version 2.5.STABLE4 for i386- unknown-freebsd4.8... 2003/09/29 12:09:45| Process ID 18990 2003/09/29 12:09:45| With 1064 file descriptors available 2003/09/29 12:09:45| Performing DNS Tests... 2003/09/29 12:09:45| Successful DNS name lookup tests... 2003/09/29 12:09:45| DNS Socket created at 0.0.0.0, port 1154, FD 5 2003/09/29 12:09:45| Adding nameserver 24.221.192.5 from /etc/resolv.conf 2003/09/29 12:09:45| Adding nameserver 24.221.208.5 from /etc/resolv.conf 2003/09/29 12:09:45| helperOpenServers: Starting 5 'redirector.pl' processes 2003/09/29 12:09:45| Unlinkd pipe opened on FD 15 2003/09/29 12:09:45| Swap maxSize 10240 KB, estimated 787 objects 2003/09/29 12:09:45| Target number of buckets: 39 2003/09/29 12:09:45| Using 8192 Store buckets 2003/09/29 12:09:45| Max Mem size: 8192 KB 2003/09/29 12:09:45| Max Swap size: 10240 KB 2003/09/29 12:09:45| Rebuilding storage in /usr/local/squid/var/cache (CLEAN) 2003/09/29 12:09:45| Using Least Load store dir selection 2003/09/29 12:09:45| Set Current Directory to /usr/local/squid/var/cache 2003/09/29 12:09:45| Loaded Icons. 2003/09/29 12:09:45| Accepting HTTP connections at 0.0.0.0, port 3128, FD 16. 2003/09/29 12:09:45| Accepting ICP messages at 0.0.0.0, port 3130, FD 17. 2003/09/29 12:09:45| WCCP Disabled. 2003/09/29 12:09:45| Ready to serve requests.
Each cache.log entry starts
with a timestamp showing when the message was generated. The very first
entry in this sample reports the Squid version (2.5.STABLE4) and a string identifying the
operating system for which Squid was configured (i386-unknown-freebsd4.8). The process ID
(18990) follows. Many cache.log entries may look cryptic (Target number of buckets: 39). In most cases,
under normal conditions, you can ignore entries you don’t understand. On
the other hand, you may want to look over essential configuration
details such as name-server addresses or HTTP server address. This
sample output ends with a statement that Squid is ready to serve
requests. At this point, Squid can accept HTTP connections from
clients.
Usually, the cache.log file grows slowly. However, an unusual HTTP transaction or similar event may cause Squid to emit a debugging message. If such an event happens often (e.g., a DoS attack, a new virus, or sudden disk failure), the log file may grow quickly. Rotating log files reduces the chance that you’ll run out of disk space.
Major errors and abnormal conditions are likely to be reported in cache.log. I recommend archiving these logs so that it is possible to go back and find the first occurrence of an unusual event. When describing a particular Squid problem on the mailing list or a similar forum, the relevant lines from cache.log may be very useful. You may also want to increase debugging levels for some sections so that others can better understand and fix your problem.
The debug_options directive controls the level of detail for cache.log messages. The default value
(ALL,1) is usually the best choice.
At higher levels, the unimportant messages make it harder to find the
important ones. Refer to Section
16.2 for a thorough description of the
debug_options directive.
Note that debugging at the highest levels (9 or 10) may add thousands of lines for each request, quickly consuming disk space and significantly degrading Squid’s performance.
You can use Squid’s -X command-line option to
enable full debugging for all sections. This mode is particularly
useful if Squid refuses to start, and the debugging levels in
squid.conf are insufficient to diagnose the
problem. This is also a good way to enable full debugging of the
configuration file parser, before it gets to the
debug_options directive. You should never use the
-X when Squid is operating properly.
You can use Squid’s -k debug command-line
option to enable full debugging immediately on a running Squid
process. This command operates as a toggle: the first invocation turns
on full debugging, and the second invocation turns it off. See Chapter 5 for a general discussion about
the -k option.
As I already mentioned, full debugging generates an overwhelming amount of data. This can make Squid, and the operating system, very slow. In extreme cases, you may find your terminal session becomes unresponsive after executing the first squid -k debug command. Locking yourself out while Squid is spitting megabytes of cache.log entries per second is an unpleasant experience. I find the following trick useful to get a compact, five-second debugging snapshot with less risk:
% squid -k debug; sleep 5; squid -k debug
To have Squid send copies of cache.log
messages to the system log, use the -s command-line
option. Only messages with debugging levels 0 and 1 are forwarded.
Level 0 messages are logged with syslog level LOG_WARNING. Level 1 messages use syslog
level LOG_NOTICE. All messages use
the LOCAL4 syslog facility. Here is
one way to configure syslogd so
that these messages are saved:
local4.warning /var/log/squid.log
Using syslog in addition to cache.log is especially handy when you maintain several Squid boxes. You can configure each local syslog daemon to forward these messages to a central host and enjoy a unified view of all caches in one location. For example, you might use this entry in /etc/syslogd.conf:
local4.notice @192.168.45.1
The -d level command-line option instructs Squid to dump cache.log messages to your terminal (i.e.,
stderr). The level argument
specifies the maximum level for messages that are dumped. Note that
you’ll see only messages that would appear in cache.log, subject to the
debug_options setting. For example, if you have
debug_options ALL,1, and run squid -d2, you won’t see any level 2
debugging messages.
The -d level and -N options
are most useful for debugging Squid problems or quickly testing a
change to the configuration file. They allow you to start Squid easily
and see the cache.log messages.
This option may also be useful when Squid starts from cron or a similar facility that
automatically captures a program’s standard error output and reports
it back to the user. For example, you may have a cron job that automatically reconfigures the
running
Squid process:
15 */4 * * * /usr/local/squid/sbin/squid -d1 -k reconfigure
Squid saves key information about HTTP transactions in access.log. This file is line-based, such that each line corresponds to one client request. Squid records the client IP address (or hostname), requested URI, response size, and other information.
Squid records all HTTP accesses in access.log, except for those that disconnect before sending any data. Squid also records all ICP (but not HTCP) transactions unless you disable them with the log_icp_queries directive. Section 13.2.4 describes the other squid.conf directives that affect the access log.
The default access.log format contains 10 fields. Here are some examples, with long lines split and indented:
1066037222.011 126389 9.121.105.207 TCP_MISS/503 1055
GET http://home.gigigaga.com/n8342133/Miho.DAT.019 -
DIRECT/203.187.1.180 -
1066037222.011 19120 12.83.179.11 TCP_MISS/200 359
GET http://ads.x10.com/720x300/Z2FtZ3JlZXRpbmcxLmRhd/7/AMG -
DIRECT/63.211.210.20 text/html
1066037222.011 34173 166.181.33.71 TCP_MISS/200 559
GET http://coursesites.blackboard.com:8081/service/collab/../1010706448190/ -
DIRECT/216.200.107.101 application/octet-stream
1066037222.011 19287 41.51.105.27 TCP_REFRESH_MISS/200 500
GET http://fn.yam.com/include/tsemark/show.js -
DIRECT/210.59.224.59 application/x-javascript
1066037222.011 19395 41.51.105.27 TCP_MISS/304 274
GET http://fnasp.yam.com/image/coin3.gif -
DIRECT/211.72.254.133 -
1066037222.011 19074 30.208.85.76 TCP_CLIENT_REFRESH_MISS/304 197
GET http://ads.icq.com/content/B0/0/..bC6GygEYNeHGjBUin5Azfe68m5hD1jLk$/aol -
DIRECT/64.12.184.121 -
1066037222.011 19048 12.83.179.11 TCP_MISS/200 261
GET http://ads.adsag.com/js.ng/...ne&cat=friendship&subcat=girltalk -
DIRECT/209.225.54.119 application/x-javascript
1066037222.118 106 41.51.105.27 TCP_HIT/200 536
GET http://rcm-images.amazon.com/images/G/01/rcm/privacy.gif -
NONE/- image/gif
1066037222.352 19475 27.34.49.248 TCP_MISS/200 12387
GET http://espanol.geocities.com/lebastias/divulgacion/budismo-tarot.html -
DIRECT/209.1.225.139 text/html
1066037222.352 132 144.157.100.17 TCP_MISS/504 1293
GET http://ar.atwola.com/image/93101912/aol -
NONE/- -Here are the definitions for all fields:
The completion time of the request, expressed as the number of seconds since the Unix epoch (Thu Jan 1 00:00:00 UTC 1970), with millisecond resolution. Squid uses this format, instead of something more human-friendly, to simplify the work of various log file processing programs.
You can use a simple Perl command to convert the Unix timestamps into local time. For example:
perl -pe 's/^\d+\.\d+/localtime($&)/e;' access.log
For HTTP transactions, this field indicates how much time it took to process the request. The timer starts when Squid receives the HTTP request and stops when the response has been fully delivered. The response time is given in milliseconds.
The response time is usually 0 for ICP queries. This is because Squid answers ICP queries very quickly. Furthermore, Squid doesn’t update the process clock between receiving an ICP query and sending the reply.
While time values are reported with millisecond resolution, the precision of those entries is probably about 10 milliseconds. Timing becomes even less precise when Squid is heavily loaded.
This field contains the client’s IP address, or hostname if you enable log_fqdn. For security or privacy reasons, you may want to mask a part of client’s address out using the client_netmask directive. However, that also makes it impossible to group requests coming from the same client.
This field consists of two tokens separated by a slash. The first token, result code,
classifies the protocol and the result of a transaction (e.g.,
TCP_HIT or UDP_DENIED). These are Squid-specific
codes, defined in Section
13.2.1. The codes that begin with TCP_ refer to HTTP requests, while
UDP_ refers to ICP
queries.
The second token is the HTTP response status code (e.g, 200, 304, 404, etc.). The status code normally comes from the origin server. In some cases, however, Squid may be responsible for selecting the status code. These codes, defined by the HTTP RFC, are summarized later in Table 13-1.
This field indicates the number of bytes transferred to the client. Strictly
speaking, it is the number of bytes that Squid told the TCP/IP
stack to send to the client. Thus, it doesn’t include overheads
from TCP/IP headers. Also note that the transfer size is normally
larger than the response’s Content-Length. This value includes the
HTTP response headers, while Content-Length does not.
These properties make the transfer size field useful for
approximate bandwidth usage analysis but not for exact HTTP entity
size calculations. If you need to know a response’s Content-Length, you can find it in the
store.log file.
This field contains the request method. Because Squid clients may use ICP or
HTTP, the request method is either HTTP- or ICP-specific. The most
common HTTP request method is GET. ICP queries are always logged with
ICP_QUERY. See Section 6.1.2.8 for a list
of HTTP methods Squid knows about.
This field contains the URI from the client’s request. The vast majority of logged URIs are actually URLs (i.e., they have hostnames).
Squid uses a special format for certain failures. These are cases when Squid can’t parse the HTTP request or otherwise determine the URI. Instead of a URI/URL, you’ll see a string such as “error:invalid-request.” For example:
1066036250.603 310 192.0.34.70 NONE/400 1203 GET error:invalid-request - NONE/- -
Also in this field look out for whitespace characters in the URI. Depending on your uri_whitespace setting, Squid may print the URI in the log file with whitespace characters. When this happens, the tools that read access.log files may become confused by the extra fields.
When logging, Squid strips all URI characters after the first question mark unless the strip_query_terms directive is disabled.
Squid can determine a user’s identity in two different ways. One is with the RFC 1413 ident protocol; the other is from HTTP authentication headers.
Squid attempts ident lookups based on the ident_lookup_access rules, if any (see Section 6.2). Alternatively, if you use proxy authentication (or regular server authentication in surrogate mode), Squid places the given username in this field. If both methods provide Squid with a username, and you’re using the native access.log format, the HTTP authentication name is logged, and the RFC 1413 name is ignored. The common log file format has separate fields for both names.
The peering information consists of two tokens, separated by a slash. It is relevant only for requests that are cache misses. The first token indicates how the next hop was chosen. The second token is the address of that next hop. The peering codes are listed in Section 13.2.3.
When Squid sends a request to a neighbor cache, the peerhost
address is the neighbor’s hostname. If the request is sent
directly to the origin server, however, Squid writes the origin
server’s IP address or its hostname if
log_ip_on_direct is disabled. The value
NONE/- indicates that Squid
didn’t forward this request to any other servers.
The final field of the default, native access.log is the content type of the
HTTP response. Squid obtains the content type value from the
response’s Content-Type header.
If that header is missing, Squid uses a hyphen (-).
If you enable the log_mime_headers directive, Squid appends two additional fields to each line:
Squid encodes the HTTP request headers and prints them between a pair of square brackets.
The brackets are necessary because Squid doesn’t encode space
characters. The encoding scheme is a little strange. Carriage
return (ASCII 13) and newline (ASCII 10) are printed as \r and \n, respectively. Other non-printable
characters are encoded with the RFC 1738 style, such that Tab
(ASCII 9) becomes %09.
Squid encodes the HTTP response headers and prints them between a pair of square brackets. Note that these are the headers sent to the client, which may be different from headers received from the origin server.
Squid writes to access.log only after the entire response has been sent to the client. This allows Squid to include both request and response information in the log file. However, transactions that take minutes, or even hours, to complete aren’t visible in access.log at the time of the request. When these types of transactions present a performance or policy concern, the access.log may be unable help you. Instead, use the cache manager to view a list of pending transactions (see Section 14.2.1.37).
The following labels may appear in the fourth field of the access.log file in response to HTTP requests:
TCP_HITSquid found a likely fresh copy of the requested resource and sent it immediately to the client.
TCP_MISSSquid didn’t have a cached copy of the requested resource.
TCP_REFRESH_HITSquid found a likely stale copy of the requested resource and sent a validation request to the origin server. The origin server sent a 304 (Not Modified) response, indicating that Squid’s copy is still fresh.
TCP_REF_FAIL_HITSquid found a likely stale copy of the requested resource and sent a validation request to the origin server. However, the origin server failed to respond or sent a response that Squid didn’t understand. In any case, Squid sent the cached (and likely stale) copy to the client.
TCP_REFRESH_MISSSquid found a likely stale copy of the requested resource and sent a validation request to the origin server. The server responded with new content, indicating the cached response was indeed stale.
TCP_CLIENT_REFRESH_MISSSquid found a copy of the requested resource, but the
client’s request included a Cache-Control: no-cache directive. Squid forwarded
the client’s request to the origin server, forcing a cache
validation.
TCP_IMS_HITThe client sent a validation request, and Squid found a more recent, and likely fresh, copy of the requested resource. Squid sent the newer content to the client, without contacting the origin server.
TCP_SWAPFAIL_MISSSquid found a valid copy of the requested resource but failed to load it from disk. Squid then sent the request to the origin server as though it were a cache miss.
TCP_NEGATIVE_HITWhen a request to an origin server results in an HTTP error, Squid may cache the response anyway. Repeated requests for these resources, within a short amount of time, result in negative hits. The negative_ttl directive controls the amount of time these errors may be cached. Also note that errors are cached only in memory and never written to disk. The following HTTP status codes may be negatively cached, subject to additional constraints: 204, 305, 400, 403, 404, 405, 414, 500, 501, 502, 503, 504.
TCP_MEM_HITSquid found a valid copy of the requested resource in the
memory cache and sent it immediately to the client. Note that
this doesn’t accurately represent all responses served from
memory. For example, responses that are cached in memory, but
require validation, are logged with TCP_REFRESH_HIT, TCP_REFRESH_MISS, etc.
TCP_DENIEDThe client’s request was denied, due to either the
http_access or
http_reply_access rules. Note that requests
denied by http_access have NONE/- in the ninth field, whereas
those denied by http_reply_access have a
valid entry.
TCP_OFFLINE_HITWhen offline_mode is enabled, Squid returns cache hits for almost any cached response, without considering its freshness.
TCP_REDIRECTA redirector program told Squid to generate an HTTP
redirect to a new URI (see Section 11.1). Normally,
Squid doesn’t log these redirects. To do so, you must manually
define the LOG_TCP_REDIRECTS
preprocessor directive before compiling Squid.
NONEUnclassified result used for certain errors, such as invalid hostnames.
The following labels may appear in the fourth field of the access.log file in response to ICP queries:
UDP_HITSquid found a likely fresh copy of the requested resource in the cache.
UDP_MISSSquid didn’t find a likely fresh copy of the requested
resource in the cache. If the same object is requested via HTTP,
it would probably be a cache miss. Compare with UDP_MISS_NOFETCH.
UDP_MISS_NOFETCHLike UDP_MISS, except
that this also indicates Squid’s reluctance to handle the
corresponding HTTP request. If you use the -Y
command-line option, Squid returns this, instead of UDP_MISS, while rebuilding its
in-memory indexes at startup.
UDP_DENIEDThe ICP query is denied due to the
icp_access rules. If more than 95% of the
ICP replies to a client are UDP_DENIED, and the client database is
enabled (see Appendix A),
Squid stops sending any ICP replies to the client for an hour.
When this happens you’ll also see a warning in cache.log.
UDP_INVALIDSquid received an invalid query (e.g., truncated message,
invalid protocol version, whitespace in the URI, etc.). Squid
sent an ICP_INVALID reply
back to the client.
Table 13-1 lists the numerical HTTP response codes and reason phrases. Note that Squid and other HTTP agents care only about the numeric value. The reason phrase is purely informational and doesn’t affect the meaning of the response. For each status code, I also provide a reference to the particular section in RFC 2616 that describes it. Note that status codes 0 and 600 are nonstandard values used by Squid, and aren’t mentioned in the RFC.
Code | Reason phrase | RFC 2616 section |
0 | No Response Received (Squid-specific) | N/A |
1xx | Informational | 10.1 |
100 | Continue | 10.1.1 |
101 | Switching Protocols | 10.1.2 |
2xx | Successful | 10.2 |
200 | OK | 10.2.1 |
201 | Created | 10.2.2 |
202 | Accepted | 10.2.3 |
203 | Non-Authoritative Information | 10.2.4 |
204 | No Content | 10.2.5 |
205 | Reset Content | 10.2.6 |
206 | Partial Content | 10.2.7 |
3xx | Redirection | 10.3 |
300 | Multiple Choices | 10.3.1 |
301 | Moved Permanently | 10.3.2 |
302 | Found | 10.3.3 |
303 | See Other | 10.3.4 |
304 | Not Modified | 10.3.5 |
305 | Use Proxy | 10.3.6 |
306 | (Unused) | 10.3.7 |
307 | Temporary Redirect | 10.3.8 |
4xx | Client Error | 10.4 |
400 | Bad Request | 10.4.1 |
401 | Unauthorized | 10.4.2 |
402 | Payment Required | 10.4.3 |
403 | Forbidden | 10.4.4 |
404 | Not Found | 10.4.5 |
405 | Method Not Allowed | 10.4.6 |
406 | Not Acceptable | 10.4.7 |
407 | Proxy Authentication Required | 10.4.8 |
408 | Request Timeout | 10.4.9 |
409 | Conflict | 10.4.10 |
410 | Gone | 10.4.11 |
411 | Length Required | 10.4.12 |
412 | Precondition Failed | 10.4.13 |
413 | Request Entity Too Large | 10.4.14 |
414 | Request-URI Too Long | 10.4.15 |
415 | Unsupported Media Type | 10.4.16 |
416 | Requested Range Not Satisfiable | 10.4.17 |
417 | Expectation Failed | 10.4.18 |
5xx | Server Error | 10.5 |
500 | Internal Server Error | 10.5.1 |
501 | Not Implemented | 10.5.2 |
502 | Bad Gateway | 10.5.3 |
503 | Service Unavailable | 10.5.4 |
504 | Gateway Timeout | 10.5.5 |
505 | HTTP Version Not Supported | 10.5.6 |
6xx | Proxy Error | N/A |
600 | Unparseable Response Headers (Squid-specific) | N/A |
You’ll see status code 0 in the access.log if Squid doesn’t receive any response from the origin server. You’ll see status code 600 if Squid received a response but couldn’t find any HTTP headers. In a small fraction of cases, certain origin servers send only the response body and omit any headers.
The following codes may appear in the ninth field of the access.log. Refer to Section 10.10 for a description of how Squid selects the next-hop for cache misses.
NONEThis indicates that Squid didn’t communicate with any other servers (neighbors, origin) for this request. You’ll see it in association with various types of cache hits, denied requests, cache manager requests, errors, and all ICP queries.
DIRECTSquid forwarded the request directly to the origin server. The second half of the field shows the origin server’s IP address, or hostname if you’ve disabled log_ip_on_direct.
SIBLING_HITSquid sent the request to this sibling cache after the sibling returned an ICP or HTCP hit.
PARENT_HITSquid sent the request to this parent cache after the parent returned an ICP or HTCP hit.
DEFAULT_PARENTSquid selected this parent because it was marked as
default on the
cache_peer line in squid.conf.
FIRST_UP_PARENTSquid forwarded the request to this parent because it is the first parent in the list known to be alive.
FIRST_PARENT_MISSSquid forwarded the request to the parent cache that was
first to respond with an ICP/HTCP miss message. In other words,
for this particular ICP/HTCP query, at this particular time, the
selected parent had the best round-trip time. Note that measured
RTTs may be artificially adjusted by the weight option to the
cache_peer directive.
CLOSEST_PARENT_MISSSquid selected this parent because it reports the lowest RTT to the origin server. This occurs only if both caches have netdb enabled (see Section 10.5), and the origin server (or other servers on its subnet) returns ICMP pings.
CLOSEST_PARENTThis is similar to CLOSEST_PARENT_MISS, except that the
RTT measurements don’t come from the ICP/HTCP reply messages.
Instead, they come from older measurements saved by Squid, such
as the netdb exchange feature.
CLOSEST_DIRECTSquid forwarded the request to the origin server based on netdb measurements. This happens if any of these conditions occur:
The RTT between Squid and the origin server is less than the configured minimum_direct_rtt value.
The measured number of router hops between Squid and the origin server is less than the configured minimum_direct_hops value.
The RTT values returned in ICP/HTCP replies indicate that Squid is closer to the origin server than any of its neighbors.
ROUNDROBIN_PARENTSquid forwarded the request to this parent because the
round-robin option was set,
and it had the lowest usage counter.
CD_PARENT_HITSquid forwarded the request to this parent based on the Cache Digest algorithm (see Section 10.7).
CD_SIBLING_HITSquid forwarded the request to this sibling based on the Cache Digest algorithm.
CARPSquid selected this parent based on the Cache Array Routing Protocol algorithm (see Section 10.9).
ANY_PARENTSquid selected this parent as a last resort because none of the other methods resulted in a viable next-hop.
Note that most of these codes may be preceded by TIMEOUT_ to indicate that a timeout occurred while waiting for ICP/HTCP replies. For example:
1066038165.382 345 193.233.46.21 TCP_MISS/200 2836
GET http://www.caida.org/home/images/home.jpg
TIMEOUT_CLOSEST_DIRECT/213.219.122.19 image/jpegYou can adjust the timeout with the icp_query_timeout directive.
Following are the configuration file directives that affect the access.log in one way or another.
This directive, enabled by default, causes Squid to log all ICP queries. If you’re running a busy parent cache, this may make your access.log files huge. To save space, disable this directive:
log_icp_queries off
If you disable ICP query logging, I suggest that you monitor the number of queries, either through the cache manager or with SNMP.
The access.log file has two formats: common and native. The common format is the same as most HTTP servers (e.g., Apache) use. It contains less information than Squid’s native format. However, you might want to use the common log-file format if you use Squid as a surrogate (see Chapter 15). The common format may also be useful if you have log-file analysis tools that know how to parse it. Use this directive to enable the common format:
emulate_httpd_log on
See the site http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format, for a description of this format.
Use the log_mime_hdrs directive to make Squid log the HTTP request and response headers:
log_mime_headers on
When enabled, Squid appends the request and response headers to access.log. This adds two fields to each line. Each field is surrounded by square brackets to make parsing easier. Certain characters are encoded to keep the log file readable. Table 13-2 shows the encoding scheme.
By default, Squid puts client IP addresses in the access.log. You can record hostnames, when available, by enabling this directive:
log_fqdn on
This causes Squid to make reverse DNS lookups for the client’s address when it receives a request. If an answer is available by the time the request is complete, Squid places it in the third field.
This access rule list determines whether or not Squid makes an RFC 1413 ident query for the client’s TCP connection. By default, Squid doesn’t issue ident queries. To enable this feature, simply add one or more rules:
acl All src 0/0 ident_lookup_access allow All
If an answer is available by the time the request is complete, Squid places it in the eighth field. If you are also using HTTP authentication, that username is written instead of the ident answer.
When Squid forwards a cache miss to an origin server, it records the origin server’s IP address in the ninth field. You can disable this directive so that Squid writes the hostname instead:
log_ip_on_direct off
In this case, the hostname comes from the URI. If the URI contains an IP address, Squid doesn’t convert it to a hostname.
This directive exists to provide some level of privacy for your users. Rather than logging the entire client IP address, you can mask off some bits. For example:
client_netmask 255.255.255.0
With this setting, all client IP addresses in access.log have 0 as the last octet:
1066036246.918 35 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036246.932 16 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036247.616 313 140.132.252.0 TCP_MISS/200 1079 GET http://... 1066036248.598 44459 140.132.252.0 TCP_MISS/500 1531 GET http://... 1066036249.230 17 170.210.173.0 TCP_IMS_HIT/304 265 GET http://... 1066036249.752 2135 140.132.252.0 TCP_MISS/200 50230 GET http://... 1066036250.467 4 170.210.173.0 TCP_IMS_HIT/304 265 GET http://... 1066036250.762 102 163.11.255.0 TCP_IMS_HIT/304 265 GET http://... 1066036250.832 20 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036251.026 74 203.91.150.0 TCP_CLIENT_REFRESH_MISS/304 267 GET http://...
This directive is another privacy feature. It removes query terms from URIs before
logging them. If your log files somehow fall into the wrong hands,
they won’t be able to find any usernames and passwords. When this
directive is enabled, all characters after a question mark (?) are removed. For example, a URI like
this:
http://auto.search.msn.com/response.asp?MT=www.kimo.com.yw&srch=3 &prov=&utf8
is logged like this:
http://auto.search.msn.com/response.asp?
Earlier, I mentioned the problem with whitespace appearing in some URIs. The RFCs state
that URIs must not contain whitespace, but in reality it happens all
too often. The uri_whitespace directive
dictates how Squid should handle such cases. The allowed settings
are: strip (default), deny, allow, encode, and chop. Of these, strip, encode, and chop ensure that the URI field doesn’t
contain any whitespace (thus adding more fields to access.log).
The allow setting allows
the request to pass through Squid unmodified. It is likely to cause
trouble for redirectors and log file parsers. The deny setting, on the other hand, causes
Squid to deny the request. The user receives an error message, but
the request is still written to access.log with the whitespace
characters.
If you set it to encode,
Squid changes the whitespace characters to their RFC 1738
equivalents. This is probably what the user-agent should have done
in the first place. The chop
setting causes Squid to cut off the URI at the first whitespace
character.
The default setting is strip, which makes Squid remove the
whitespace characters from the URI. It ensures that your log-file
parsers and redirectors will be happy, but it might break certain
things, such as improperly encoded search engine queries.
By default, Squid disables buffering for the cache.log file, which allows you to run tail -f and watch log file entries appear in real time. If you think this will cause an unnecessary overhead, you can disable buffering:
buffered_logs off
However, it probably doesn’t matter unless you are running Squid with full debugging. Note that this option affects only cache.log. The others always use unbuffered writes.
The access.log file contains a wealth of information, much more than you can see by just browsing through it. In order to get the big picture view, you’ll need to use a third-party log-file analysis package. You can find a long list of them linked from the Squid web page, or by going directly to http://www.squid-cache.org/Scripts/.
One of the most popular tools is Calamaris—a Perl script that parses the log file and generates either text or HTML-based reports. It provides a breakdown of traffic by request method, client IP address, origin server domain name, content types, filename extensions, reply size, and more. Calamaris also reports on ICP query traffic and even understands log files from other caching products. Check it out by visiting http://calamaris.cord.de/.
Squeezer, and its derivative, Squeezer2, are Squid-specific analysis tools. They provide many statistics that can help you understand Squid’s performance, especially when you have neighbors. Both generate HTML pages as output. Visit the Logfile Analysis page on the squid-cache.org site for links to these programs.
Webalyzer is another good utility. It is designed to be fast and produces HTML pages with tables and bar charts. It was originally designed for origin server access logs. Although it can parse Squid’s logs, it doesn’t report on such things as hit ratios and response times. It also uses some terms differently than I do. For example, Webalyzer calls any request a “hit,” which isn’t the same as a cache hit. It also makes a distinction between “pages” and “files.” For more information, visit the Webalyzer home page at http://www.mrunix.net/webalyzer/.
The store.log is a record of Squid’s decisions to store and remove objects from the cache. Squid creates an entry for each object it stores in the cache, each uncachable object, and each object that is removed by the replacement policy. The log file covers both in-memory and on-disk caches.
The store.log provides the following you can’t get from access.log:
Whether or not a particular response was cached.
The file number for cached objects. For UFS-based storage schemes, you can convert this to a pathname and examine the contents of the cache file.
The response’s content length: the Content-Length value, and the actual body
length.
Values for the Date,
Last-Modified, and Expires headers.
The response’s cache key (i.e., MD5 hash value).
As you can see, this is mostly low-level information you won’t need on a daily basis. Unless you do sophisticated analyses, or wish to debug a problem, you can probably get by without the store.log. You can disable it with a special setting:
cache_store_log none
As with other log files, Squid appends new store.log entries to the end of the file. A given URI may appear in the file more than once. For example, it gets cached, then released, then cached again. Only the most recent entry reflects the object’s current status.
The store.log is text-based and looks something like this:
1067299212.411 RELEASE -1 FFFFFFFF A5964B32245AC98592D83F9B6EA10B8D 206
1067299212 1064287906 -1 application/octet-stream 6840/6840
GET http://download.windowsupdate.com/msdownload/update/v3-19990518/cab...
1067299212.422 SWAPOUT 02 0005FD5F 6F34570785CACABC8DD01ABA5D73B392 200
1067299210 1057899600 -1 image/gif 1125/1125
GET http://forum.topsportsnet.com/shfimages/nav_members1.gif
1067299212.641 RELEASE -1 FFFFFFFF B0616CB4B7280F67672A40647DD08474 200
1067299212 -1 -1 text/html -1/67191
GET http://www.tlava.com/
1067299212.671 RELEASE -1 FFFFFFFF 5ECD93934257594825659B596D9444BC 200
1067299023 1034873897 1067299023 image/jpeg 3386/3386
GET http://ebiz0.ipixmedia.com/abc/ebiz/_EBIZ_3922eabf57d44e2a4c3e7cd234a...
1067299212.786 RELEASE -1 FFFFFFFF B388F7B766B307ADEC044A4099946A21 200
1067297755 -1 -1 text/html -1/566
GET http://www.evenflowrocks.com/pages/100303pic15.cfm
1067299212.837 RELEASE -1 FFFFFFFF ABC862C7107F3B7E9FC2D7CA01C8E6A1 304
1067299212 -1 1067299212 unknown -1/0
GET http://ebiz0.ipixmedia.com/abc/ebiz/_EBIZ_3922eabf57d44e2a4c3e7cd234a...
1067299212.859 RELEASE -1 FFFFFFFF 5ED2726D4A3AD83CACC8A01CFDD6082B 304
1066940882 1065063803 -1 application/x-javascript -1/0
GET http://www.bellsouth.com/scripts/header_footer.jsEach entry contains the following 13 fields:
The timestamp when the event took place, expressed as seconds since the Unix epoch with millisecond resolution.
The action taken on the object. This field has three possible values: SWAPOUT, RELEASE, and SO_FAIL.
A SWAPOUT occurs when
Squid successfully completes saving the object data to disk.
Some objects, such as those that are negatively cached, are
kept in memory, but not on disk. Squid doesn’t make a
store.log entry for
them.
A SO_FAIL entry
indicates that Squid could not completely save the object to
disk. Most likely it means that the storage scheme
implementation refused to open a new disk file for
writing.
A RELEASE occurs when
Squid removes an object from the cache, or decides that the
response isn’t cachable in the first place.
The directory number is a 7-bit index to the list of cache directories that’s written as a
decimal number. For objects that aren’t saved to disk, this field
contains the value -1.
The file number is a 25-bit identifier used internally by Squid. It is written as an 8-character hexadecimal number. The UFS-based storage schemes have an algorithm for mapping file numbers to pathnames (see Section 13.3.1).
Objects that aren’t saved to disk don’t have a valid file
number. For these, the file number field contains FFFFFFFF. This value appears only for
RELEASE and SO_FAIL entries.
Squid uses MD5 hash values for the primary index to locate cached objects. The key is based on the request method, URI, and possibly other information.
You might be able to use the cache key to match up store.log entries. Note, however, that
an object’s cache key can change. This happens, for example,
whenever Squid logs a TCP_REFRESH_MISS request in access.log. It looks like this:
1065837334.045 SWAPOUT ... 554BACBD2CB2A0C38FF9BF4B2239A9E5 ... http://blah 1066031047.925 RELEASE ... 92AE17121926106EB12FA8054064CABA ... http://blah 1066031048.074 SWAPOUT ... 554BACBD2CB2A0C38FF9BF4B2239A9E5 ... http://blah
So what’s going on? The object is originally cached under
one key (554B...). Some time
later, Squid receives another request for the object and forwards
a validation request to the origin server. When the response comes
back with new content, Squid changes the cache key of the old
object (to 92AE...) so that it
can give the new object the correct key (554B...). The old object is then
removed, and the new object is saved to disk.
This field shows the HTTP status code of the response, just like access.log. See Table 13-1 for a list of status codes.
The value of the Date
header in the HTTP response, expressed as seconds since the
Unix epoch. The value -1
indicates an unparseable Date
header, and -2 means the header
was entirely absent.
The value of the Last-Modified header in the HTTP response, expressed as seconds
since the Unix epoch. The value -1 indicates an unparseable Last-Modified header, and -2 means the header was entirely
absent.
The value of the Expires
header in the HTTP response, expressed as seconds since the
Unix epoch. The value -1
indicates an unparseable Expires header, and -2 means the header was entirely
absent.
The value of the Content-Type header in the HTTP
response, excluding any media-type parameters. Squid inserts the value unknown if the Content-Type is missing.
This field contains two numbers, separated by a slash. The first is the value of the
Content-Length header. A
-1 indicates the Content-Length header is absent. The
second is the actual size of the HTTP message body. You can use
these two numbers to identify partially received responses and
origin servers that incorrectly calculate the content length. In
most cases, the two numbers are the same.
The final field is the requested URI, as in access.log. This field also has the whitespace problem mentioned in the previous section. However, it is less worrisome here because you can safely ignore any extra fields.
For many of the RELEASE
entries, you’ll see question marks (?) for the last eight fields. This is because
most of those field values come from what Squid calls the MemObject structure. This structure is present
only for objects that have just been received, or are being stored
entirely in memory. Most of the objects in Squid’s cache don’t have a
MemObject because they exist only on
disk. For these, Squid puts question marks in the fields with missing
information.
If you find you need to examine a particular cache file, you can, with some
effort, turn a file number into a pathname. You’ll also need the
directory number, and L1 and L2 values. In the Squid source code, the
storeUfsDirFullPath( ) function
does this. You can find it in the src/fs/ufs/store_dir_ufs.c file. This short
Perl program mimics the current algorithm:
#!/usr/bin/perl
$L1 = 16;
$L2 = 256;
while (<>) {
$filn = hex($_);
printf("%02X/%02X/%08X\n",
(($filn / $L2) / $L2) % $L1,
($filn / $L2) % $L2,
$filn);
}And here’s how you can use it:
% echo 000DCD06 | ./fileno-to-pathname.pl 0D/CD/000DCD06
To find this file in the Nth
cache_dir, simply go to the corresponding
directory and list or view the file:
% cd /cache2 % ls -l 0D/CD/000DCD06 -rw------- 1 squid squid 391 Jun 3 12:40 0D/CD/000DCD06 % less 0D/CD/000DCD06
The optional referer.log
contains Referer header
values from client requests. To use this feature, you must run ./configure with the
—enable-referer-log option. You must also enter a
pathname for the referer_log directive. For
example:
referer_log /usr/local/squid/var/logs/referer.log
Set the filename to none if you
want to disable referer logging.
The Referer header normally
contains the URI from which the request was obtained (see Section 14.36
of RFC 2616). For example, when a web browser issues a request for an
embedded image, the Referer header is
set to the URI of the (HTML) page containing the images. It is also set
when you click on an HTML link. Some web site operators use Referer values to find so-called dead links.
You may find referer.log
particularly useful if you use Squid as a surrogate.
The referer.log has a simple format, with only four fields. Here are a few examples:
1068047502.377 3.0.168.206
http://www.amazon.com/exec/obidos/search-handle-form/002-7230223-8205634
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
1068047503.109 3.0.168.206
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
http://g-images.amazon.com/images/G/01/gourmet/gourmet-segway.gif
1068047503.196 3.0.168.206
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
http://g-images.amazon.com/images/G/01/marketing/cross-shop/arnold/appar...
1068047503.198 3.0.168.206
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
http://g-images.amazon.com/images/G/01/marketing/cross-shop/arnold/appar...
1068047503.825 3.0.168.206
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
http://images.amazon.com/images/P/B00005R8BC.01.TZZZZZZZ.jpg
1068047503.842 3.0.168.206
http://www.amazon.com/exec/obidos/ASIN/0596001622/qid=1068047396/sr=2-1/...
http://images.amazon.com/images/P/0596001622.01._PE_PI_SCMZZZZZZZ_.jpgNote that requests that lack a Referer header aren’t logged. The four fields
are as follows:
Note that, unlike access.log, a referer.log entry is made as soon as Squid receives the complete request. Thus, the referer.log entry occurs before the access.log, which waits for the end of the response.
The same as the client address in access.log. The log_fqdn and client_netmask directives affect this log file as well.
The value of the Referer
header from the client’s request. Note that the referer
value might have whitespace (or any other) characters. Squid
doesn’t encode the value before writing to referer.log.
The URI that the client is requesting. It matches the URI in access.log.
The optional useragent.log
contains User-Agent header values from client requests. To use this feature, you must
supply the —enable-useragent-log option when running
./configure. You also must enter a
pathname for the useragent_log directive. For
example:
useragent_log /usr/local/squid/var/logs/useragent.log
The User-Agent header normally
contains a description of the agent that made the request. In most
cases, the description is simply a list of product names with optional
version information. You should be aware that applications can easily
provide false user-agent information. Modern user-agents provide a way
to customize the description. Even Squid can alter the User-Agent header in forwarded
requests.
The useragent.log format is relatively simple. It looks like this:
3.0.168.206 [05/Nov/2003:08:51:43 -0700]
"Mozilla/5.0 (compatible; Konqueror/3; FreeBSD)"
3.0.168.207 [05/Nov/2003:08:52:18 -0700]
"Opera/7.21 (X11; FreeBSD i386; U) [en]"
4.241.144.204 [05/Nov/2003:08:55:11 -0700]
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/103u (KHTM..."
3.0.168.206 [05/Nov/2003:08:51:43 -0700]
"Java1.3.1_01"
64.68.82.28 [05/Nov/2003:08:52:50 -0700]
"Googlebot/2.1 (http://www.googlebot.com/bot.html)"
3.0.168.205 [05/Nov/2003:08:52:50 -0700]
"WebZIP/4.1 (http://www.spidersoft.com)"
4.241.144.201 [05/Nov/2003:08:52:50 -0700]
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; Hotbar 3.0)"
3.0.168.206 [05/Nov/2003:08:54:40 -0700]
"Bookmark Renewal Check Agent [http://www.bookmark.ne.jp/] (Version 2.0..."Unlike the other log files, it has just three fields:
The same as the client address in access.log. The log_fqdn and client_netmask directives affect this log file as well.
Unlike the other log files, which represent the time as seconds since the Unix epoch, this one uses a human-readable format. It is the HTTP common log-file format timestamp, which looks like this:
[10/Jun/2003:22:38:36 -0600]
Note that the square brackets delimit the timestamp, which includes a space character. Also note that, like referer.log, these entries are created as soon as Squid receives the complete request.
The value of the User-Agent
header. These strings almost always contain whitespace. Squid
doesn’t encode User-Agent
values before writing them in this log file.
A swap.state file is a journal of objects that have been added to, and removed from, a cache directory. Each cache_dir has its own swap.state file. When Squid starts up, it reads the swap.state files to rebuild its in-memory indexes of cached objects. These files are a relatively critical part of Squid’s operation.
By default, each swap.state file is located in its corresponding cache directory. Thus, each state file automatically stays with each cache_dir. This is important if you ever decide to reorder your cache_dir lines or if you remove one or more from the list.
If you prefer to put them in a different location, you can use the cache_swap_log directive:
cache_swap_log /usr/local/squid/var/logs/swap.state
In this case, Squid creates a swap.state file for each directory by appending a numeric suffix. For example, if you have four cache directories, Squid creates the following:
/usr/local/squid/var/logs/swap.state.00 /usr/local/squid/var/logs/swap.state.01 /usr/local/squid/var/logs/swap.state.02 /usr/local/squid/var/logs/swap.state.03
In this situation, if you add, remove, or rearrange cache_dir lines, you may need to rename the swap.state files manually to keep everything consistent.
Technically, the swap.state format is storage scheme-dependent. However, all storage schemes use the same format in the current versions of Squid. The swap.state file uses a fixed-size (48-byte) binary format. Fields are written in host-byte order and are thus not necessarily portable between different operating systems. Table 13-3 describes the fields of a swap.state entry.
Name | Size, in bytes | Description |
op | 1 | Operation on the entry: added (1) or deleted (2). |
file number | 4 | Same as the fourth field of store.log, except it is stored in binary. |
timestamp | 4 | A timestamp corresponding to the time when the
response was generated or last validated. Taken from the
|
lastref | 4 | A timestamp corresponding to the most recent access to the object. |
expires | 4 | The object’s expiration time, taken from an
|
last-modified | 4 | The object’s |
swap file size | 4 | The amount of space the object occupies on disk. This includes HTTP headers and other Squid-specific meta-information. |
refcount | 2 | The number of times this object has been requested. |
flags | 2 | Various internal flags used by Squid. |
key | 16 | The MD5 hash of the corresponding URI. Same as the key in store.log, except this one is stored in binary. |
Squid always appends new entries to its log files. If your cache is busy, some of these files can become very large after a few days. Some operating systems even place limits on the size of a file (e.g., 2 GB) and return an error for writes beyond that size. To keep your log files manageable, and Squid happy, you must regularly rotate them.
Squid has a built-in feature for rotating log files. You can invoke it with the squid -k rotate command. You then tell Squid how many old copies of each file to keep with the logfile_rotate directive. For example, if you set it to 7, you’ll have eight versions of each log file: the current file and seven old ones.
Old log files are renamed with numeric extensions. For example, when you execute a rotation, Squid renames log.6 to log.7, then log.5 to log.6, and so on. The current log becomes log.0, and Squid creates a new, empty file named log.
Each time you execute squid -k rotate, Squid rotates the following files: cache.log, access.log, store.log, useragent.log (if enabled), and referer.log (if enabled). Squid also creates up-to-date versions of the swap.state files. Note, however, that swap.state isn’t archived with numeric extensions.
Squid doesn’t rotate the log files automatically. The best way to
make it happen is with a daily cron
job. For example:
0 0 * * * /usr/local/squid/sbin/squid -k rotate
If you’d rather write your own scripts to manage the log files,
Squid has a special mode that you’ll find useful. Simply set the
logfile_rotate directive to 0. Then, when you run squid -k rotate, Squid simply closes the
current log files and opens new ones. This is very useful when the
operating system allows you to rename files opened by another process.
The following shell script illustrates the idea:
#!/bin/sh set -e yesterday_secs=`perl -e 'print time -43200'` yesterday_date=`date -r $yesterday_secs +%Y%m%d` cd /usr/local/squid/var/logs # rename the current log file without interrupting the logging process mv access.log access.log.$yesterday_date # tell Squid to close the current logs and open new ones /usr/local/squid/sbin/squid -k rotate # give Squid some time to finish writing swap.state files sleep 60 mv access.log.$yesterday_date /archive/location/ gzip -9 /archive/location/access.log.$yesterday_date
Squid’s log files, especially access.log, contain a record of users’ activities and, hence, are subject to privacy concerns. As the Squid administrator, you should take every precaution to keep the log files safe and secure. One of the best ways to do that is limit the number of people who have access to the system on which Squid runs. If that isn’t possible, carefully examine the file and directory permissions to make sure they can’t be viewed by untrusted or unauthorized users.
You can also help protect your users’ privacy by taking advantage of the client_netmask and strip_query_terms directives. The former makes it harder to identify individual users in the access.log; the latter removes URI query terms that may contain personal information. See Section 13.2.4 for more information.
You may also want to develop a policy for keeping old log files. Obviously access.log helps keep users accountable for their activities, but how far back would you ever need to go searching for something? A week? A year? What would you do if presented with a court order to hand over your log files for the last three months?
If you like to keep historical data for a long time, perhaps you can make the log files anonymous or somehow reduce the dataset. If you are interested only in which URIs were accessed, but not by whom, you can extract only that field from access.log. This not only makes the file smaller, it also reduces the risk of a privacy violation. Another technique is to randomize the client IP addresses. In other words, create a filter that maps real IP addresses to fake ones, such that the same real address is always changed to the same fake address. If you are using RFC 1413 identification or HTTP authentication, consider making those fields anonymous as well.
Configure Squid so that it doesn’t create any log files, except for the swap.state file(s).
Write a simple Perl or awk script to calculate your cache hit ratio from access.log.
How does an “access denied” response appear in the access.log?
Does store.log have the same number of, more, or fewer, entries than access.log?
Take a file number from store.log and find the corresponding file in the disk cache. Examine the file and make sure you’ve found the correct response.
Develop and implement a policy for archiving old cache log files. Consider where and how they will be stored, for how long, and who has permission to access them.