The ability to analyze a log after an event is an important skill. It is equally important to be able to extract information from a log file in real time to detect malicious or suspicious activity as it happens. In this chapter we will explore methods to read in log entries as they are generated, format them for output to the analyst, and generate alerts based on known indicators of compromise.
The most basic method to monitor a log in real time is to use the tail command’s -f option, which continuously reads a file and outputs new lines to stdout as they are added. As in previous chapters, we will use an Apache web server access log for examples, but the techniques presented can be applied to any text-based log. To monitor the Apache access log with tail:
tail -f /var/log/apache2/access.log
Commands can be combined to provide more advanced functionality. The output from tail can be piped into grep so only entries matching specific criteria will be output. The example below monitors the Apache access log and outputs entries matching a particular IP address.
tail -f /var/log/apache2/access.log | grep '10.0.0.152'
Regular expressions can also be used. Below, only entries returning an HTTP status code of 404 (Not Found) will be displayed. The -i option is added to ignore character case.
tail -f /var/log/apache2/access.log | egrep -i 'HTTP/.*" 404'
To clean up the output it can be piped into the cut command to remove extraneous information. The example below monitors the access log for requests resulting in a 404 status code and then uses cut to only display the date/time and the page that was requested.
$ tail -f access.log | egrep --line-buffered 'HTTP/.*" 404' | cut -d' ' -f4-7
[29/Jul/2018:13:10:05 -0400] "GET /test
[29/Jul/2018:13:16:17 -0400] "GET /test.txt
[29/Jul/2018:13:17:37 -0400] "GET /favicon.ico
You can further clean the output by piping it into tr -d '[]"' to remove the square brackets and the orphan double quotation mark.
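Putting it all together, the cleaned-up pipeline might look like the following sketch. Here a sample log line is supplied via echo so the example can run on its own; in live use you would replace the echo with tail -f access.log.

```shell
# A sketch of the full cleanup pipeline. The log line fed in via echo is a
# hypothetical sample, standing in for the live tail -f stream.
echo '10.0.0.152 - - [29/Jul/2018:13:10:05 -0400] "GET /test HTTP/1.1" 404 221' |
  egrep --line-buffered 'HTTP/.*" 404' |
  cut -d' ' -f4-7 |
  tr -d '[]"'
# output: 29/Jul/2018:13:10:05 -0400 GET /test
```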
Note that we used the egrep command’s --line-buffered option. This forces egrep to write to stdout each time a line break occurs. Without this option, egrep buffers its output, and nothing is piped into cut until the buffer fills. We don’t want to wait that long. The option makes egrep write out each line as it finds it.
You can use the power of tail and egrep to monitor a log and output any entries that match known patterns of suspicious or malicious activity, often referred to as Indicators of Compromise (IOCs). By doing this you can create a lightweight Intrusion Detection System (IDS). To begin, let’s create a file named ioc.txt that contains regex patterns for IOCs.
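Assuming the patterns discussed in the remainder of this section, a minimal ioc.txt might look like this (each line is a regular expression):

```
\.\./
etc/passwd
etc/shadow
cmd\.exe
/bin/sh
/bin/bash
```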

This pattern (../) is an indicator of a directory traversal attack where the attacker tries to escape from the current working directory and access files for which they otherwise would not have permission.

The Linux /etc/passwd and /etc/shadow files are used for system authentication and should never be served by the web server.

Serving the cmd.exe, /bin/sh, or /bin/bash files is an indicator of a reverse shell being returned by the web server. A reverse shell is often an indicator of a successful exploitation attempt.
Note that the IOCs must be in a regular expression format as they will be used later with egrep.
IOCs for web servers are too numerous to discuss here in depth. For more examples of indicators of compromise download the latest Snort community ruleset at https://www.snort.org/downloads.
Next, ioc.txt can be used with the egrep -f option. This option tells egrep to read the regex patterns to search for from the specified file. This allows you to use tail to monitor the log file; as each entry is added, it will be compared against all of the patterns in the IOC file, and any entry that matches will be output. Here is an example:
tail -f /var/log/apache2/access.log | egrep -i -f ioc.txt
Additionally, the tee command can be used to simultaneously display the alerts to the screen and save them to their own file for later processing.
tail -f /var/log/apache2/access.log | egrep --line-buffered -i -f ioc.txt | tee -a interesting.txt
Again the --line-buffered option is used to ensure there are no problems caused by command output buffering.
As previously discussed, you need to use the wevtutil command to access Windows events. While the command is very versatile, it does not have functionality similar to tail that can be used to extract new entries as they occur. Thankfully, a simple bash script can provide similar functionality.
#!/bin/bash -
#
# Rapid Cybersecurity Ops
# wintail.sh
#
# Description:
# Perform a tail-like function on a Windows log
#

WINLOG="Application"

LASTLOG=$(wevtutil qe "$WINLOG" //c:1 //rd:true //f:text)

while true
do
    CURRENTLOG=$(wevtutil qe "$WINLOG" //c:1 //rd:true //f:text)
    if [[ "$CURRENTLOG" != "$LASTLOG" ]]
    then
        echo "$CURRENTLOG"
        echo "----------------------------------"
        LASTLOG="$CURRENTLOG"
    fi
done

This variable identifies the Windows log you want to monitor. You can use wevtutil el to obtain a list of logs currently available on the system.

This executes the wevtutil command to query the specified log file. The c:1 parameter causes it to return only one log entry. The rd:true parameter causes the command to read the most recent log entry. Finally, f:text returns the result as plain text rather than XML, which makes it easier to read on the screen.

The next few lines execute the wevtutil command again and compare the latest log entry to the last one printed to the screen. If the two are different, meaning that a new entry was added to the log, it prints the entry to the screen. If they are the same nothing happens and it loops back and checks again.
A tail -f provides an ongoing stream of data.
What if you wanted to count how many lines are added to a file during a time interval? You could observe that stream of data, start a timer, and begin counting until a specified time interval is up; then you can stop counting and report the results.
You might divide this work into two separate processes - two separate scripts - one to count the lines and another to watch the clock. The timekeeper will notify the line counter by means of a standard POSIX inter-process communication mechanism called a “signal”. A signal is a software interrupt, and there are different kinds of such interrupts. Some are fatal; that is, they will cause the process to terminate (e.g., a floating-point exception). Most can be ignored or caught - and an action taken when the signal is caught. Many have a predefined purpose used by the operating system. We’ll use one of the two signals available to users, SIGUSR1. (The other is SIGUSR2.)
Shell scripts can catch the catchable interrupts with the trap command, a shell built-in command. With trap you specify a command to indicate what action you want taken and a list of signals which trigger the invocation of that command. For example:
trap warnmsg SIGINT
will cause the command warnmsg (our own script or function) to be called whenever the shell script receives a SIGINT signal, as when you type a control-C to interrupt a running process.
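Here is a small, runnable sketch of trap in action; warnmsg is a hypothetical function defined just for this demonstration:

```shell
#!/bin/bash
# Hypothetical trap demonstration: catch SIGINT and keep running.
function warnmsg ()
{
    echo "received an interrupt"
}

trap warnmsg SIGINT

kill -SIGINT $$       # deliver SIGINT to ourselves, as if Ctrl-C were typed
echo "still running"  # proof that the script was not terminated
```

Run as a script, it prints the warning message and then continues rather than dying.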
Here is the script that performs the count.
#!/bin/bash -
#
# Rapid Cybersecurity Ops
# looper.sh
#
# Description:
# Count the lines in a file being tailed -f
# Report the count interval on every SIGUSR1
#

function interval ()
{
    echo $(date '+%y%m%d %H%M%S') $cnt
    cnt=0
}

declare -i cnt=0
trap interval SIGUSR1

shopt -s lastpipe

tail -f --pid=$$ ${1:-log.file} | while read aline
do
    let cnt++
done

The function interval will be called on each signal. We define it here. It needs to be defined before we can call it, of course, but also before we can use it in our trap statement, below.

The date command is called to provide a timestamp for the count value that we print out. After we print the count we reset its value to 0 to start the count for the next interval.

Now that interval is defined, we can tell bash to call the function whenever our process receives a SIGUSR1 signal.

This is a crucial step. Normally, when there is a pipeline of commands (such as ls -l | grep rwx | wc), the pieces of the pipeline (each command) are run in subshells, and each ends up with its own process ID.
This would be a problem for this script because the while loop would be in a subshell, with a different process id. Whatever process started the looper.sh script wouldn’t know the process id of the while loop to send the signal to it. Moreover, changing the value of the cnt variable in the subshell doesn’t change the value of cnt in the main process, so a signal to the main process would result in a value of zero every time.
The solution is this shopt command that sets (-s) the shell option lastpipe. That option tells the shell not to create a subshell for the last command in a pipeline but to run that command in the same process as the script itself. In our case that means that the tail will run in a subshell (i.e., a different process) but the while loop will be part of the main script process.
Caution: this shell option is only available in bash 4.x and above, and is only for non-interactive shells (i.e., scripts).
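The effect of lastpipe is easy to see in a small sketch; save it to a file and run it with bash, since the option applies only to non-interactive shells:

```shell
#!/bin/bash
# Sketch: with lastpipe set, the while loop runs in the script's own process,
# so cnt keeps its value after the pipeline completes.
shopt -s lastpipe
declare -i cnt=0
printf 'a\nb\nc\n' | while read aline
do
    let cnt++
done
echo "counted: $cnt"   # prints 3 with lastpipe; without it, 0
```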

Here is the tail -f command with one more option, the --pid option. We specify a process ID to tell tail to exit when that process dies. We are specifying $$, the current shell script’s process ID, as the one to watch. This is useful for cleanup so that we don’t end up with tail commands left running in the background (if, for example, this script is run in the background; the next script does just that).
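The --pid behavior can be sketched in isolation; here a short-lived sleep is a stand-in for the process that tail watches:

```shell
#!/bin/bash
# Sketch of tail's --pid option (GNU tail): tail exits on its own once the
# watched process dies, even though -f would otherwise run forever.
sleep 2 &                          # hypothetical stand-in for the watched process
WATCHED=$!
tail -f --pid=$WATCHED /dev/null   # blocks until the sleep finishes
echo "tail has exited"
```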
The script tailcount.sh starts and stops the counting; it is the script that holds the “stopwatch,” so to speak, and times these intervals.
#!/bin/bash -
#
# Rapid Cybersecurity Ops
# tailcount.sh
#
# Description:
# Count lines every n seconds
#

# cleanup - the other processes on exit
function cleanup ()
{
    [[ -n $LOPID ]] && kill $LOPID
}

trap cleanup EXIT

bash looper.sh $1 &
LOPID=$!

# give it a chance to start up
sleep 3

while true
do
    kill -SIGUSR1 $LOPID
    sleep 5
done >&2

Since this script will be starting other processes (other scripts), it should clean up after itself. If the process ID has been stored in LOPID, the variable will be non-empty, and the function will send a signal via the kill command to that process. Since no particular signal is specified on the kill command, the default, SIGTERM, is sent.

Not a signal, EXIT is a special case for the trap statement to tell the shell to call this function (here, cleanup) when the shell that is running this script is about to exit.

Now the real work begins. The looper.sh script is called but is put in the “background”, that is, detached from the keyboard to run on its own while this script continues (without waiting for looper.sh to finish).

This saves the process id of the script that we just put in the background.

This redirection is just a precaution. By redirecting stdout to stderr, any output coming from the while loop or the kill or sleep statements (though we’re not expecting any) will be sent to stderr and won’t get mixed in with output coming from looper.sh, which, though it is in the background, still writes to stdout.
In summary, looper.sh has been put in the background and its process ID saved in a shell variable. Every 5 seconds this script (tailcount.sh) sends that process (which is running looper.sh) a SIGUSR1 signal, which causes looper.sh to print out its current count and restart its counting. When tailcount.sh exits it will clean up by sending a SIGTERM to the looper.sh process so that it, too, will be terminated.
With both a script to do the counting and a script to drive it with its “stopwatch”, you can use their output as input to a script that prints out a histogram-like bar to represent the count. It is invoked as follows:
bash tailcount.sh | bash livebar.sh
The livebar.sh script reads from stdin and prints its output to stdout, one line for each line of input.
#!/bin/bash -
#
# Rapid Cybersecurity Ops
# livebar.sh
#
# Description:
# Creates a rolling horizontal bar chart of live data
#
# Usage:
# <output> | bash livebar.sh
#

function pr_bar ()
{
    local raw maxraw scaled
    raw=$1
    maxraw=$2
    ((scaled=(maxbar*raw)/maxraw))
    ((scaled == 0)) && scaled=1       # min size guarantee
    for ((i=0; i<scaled; i++)); do printf '#'; done
    printf '\n'
}  # pr_bar

maxbar=60     # largest no. of chars in a bar
MAX=60

while read dayst timst qty
do
    if ((qty > MAX))
    then
        let MAX=$qty+$qty/4           # allow some room
        echo "**** rescaling: MAX=$MAX"
    fi

    printf '%6.6s %6.6s %4d:' $dayst $timst $qty
    pr_bar $qty $MAX
done

The pr_bar function prints the bar of hashtags scaled to the maximum size based on the parameters supplied. This function might look familiar. We’re using the same function we used in histogram.sh in the previous chapter.

This is the longest string of hashtags we will allow on a line (to avoid line wrapping).

How large will the values that need to be displayed be? Since we don’t know beforehand (although an expected maximum could be supplied as an argument to the script), the script instead keeps track of a maximum. If that maximum is exceeded, it “rescales,” and the current and future lines are scaled to the new maximum. The script adds 25 percent to the maximum so that it doesn’t need to rescale if each new value goes up by just one or two each time.
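For example, when the value 70 arrives while MAX is still 60, the arithmetic works out like this:

```shell
# Worked example of the rescaling step; bash arithmetic is integer-only.
qty=70
MAX=60
let MAX=$qty+$qty/4   # 70 + 70/4 = 70 + 17 = 87
echo "MAX=$MAX"
# prints: MAX=87
```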

The printf specifies a minimum and maximum width for the first two fields that are printed. They are date and time stamps and will be truncated if they exceed those widths. You wouldn’t want the count truncated, so we specify that it will be 4 digits wide, but the entire value will be printed regardless; if it is smaller than 4 digits, it will be padded with blanks.
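A quick way to see the field widths in action, using hypothetical sample values:

```shell
# Sketch: the first two fields are width-limited strings (%6.6s); the count
# is a blank-padded decimal (%4d).
printf '%6.6s %6.6s %4d:\n' 201010 1020 20
# prints: 201010   1020   20:
```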
Since this script reads from stdin you can run it by itself to see how it behaves. Here’s a sample:
$ bash livebar.sh
201010 1020 20
201010 1020 20:####################
201010 1020 70
**** rescaling: MAX=87
201010 1020 70:################################################
201010 1020 75
201010 1020 75:###################################################
^C
In this example the input is mixing with the output. You could also put the input into a file and redirect it into the script to see just the output.
$ bash livebar.sh < testdata.txt
201010 1020 20:####################
**** rescaling: MAX=87
201010 1020 70:################################################
201010 1020 75:###################################################
$
Log files can provide tremendous insight into the operation of a system, but they also come in large quantities which makes them challenging to analyze. You can minimize this issue by creating a series of scripts to automate data formatting, aggregation, and alerting.
In the next chapter we will look at how similar techniques can be leveraged to monitor networks for configuration changes.
Add a -i option to livebar.sh to set the interval in seconds.
Add a -M option to livebar.sh to set an expected maximum for input values.
Use the getopts builtin to parse your options.
How might you add a -f option that filters the data (using, e.g., grep)?
What challenges might you encounter? What approach(es) might you take to deal with them?
Modify wintail.sh to allow the user to specify the Windows log to be monitored by passing in a command line argument.
Modify wintail.sh to add the capability for it to be a lightweight intrusion detection system using egrep and an IOC file.
Consider the statement made in the note about buffering: “When the input is coming from a file, that usually happens quickly.” Why “usually”? Under what conditions might you see the need for the line buffering option on grep even when reading from a file?