The field of computer forensics is broad and includes many facets of investigation. One such facet is the analysis of executable code. One of the most insidious places for a hacker to install malicious functionality is within an executable file. In Linux, this is, of course, the ELF file type. In Chapter 4, ELF Virus Technology – Linux/Unix Viruses, we already explored some of the infection techniques in use, but we have spent very little time discussing the analysis phase. How exactly should an investigator go about exploring a binary for anomalies or code infections? That is what this chapter is all about.
An attacker's motives for infecting an executable vary greatly: the payload may be a virus, a botnet, or a backdoor. There are, of course, many cases where an individual wants to patch or modify a binary to achieve totally different ends such as binary protection, code patching, or other experimentation. Whether malicious or not, the binary modification methods are all the same. The inserted code is what determines whether or not the binary carries malicious intent.
In either case, this chapter will arm the reader with the insight necessary for determining whether or not a binary has been modified, and how exactly it has been modified. In the following pages, we will examine several different types of infections and will even discuss some of my findings from a real-world analysis of the Retaliation Virus for Linux, which was engineered by JPanic, one of the world's most skilled virus authors. This chapter is all about training your eye to spot anomalies within an ELF binary file, and with some practice it becomes quite possible to do so with ease.
When a binary is modified in some way, it is generally for the purpose of adding code to the binary and then redirecting execution flow to that code. The redirection of execution flow can happen in many places within the binary. In this particular case, we are going to examine a very common technique used when patching binaries, especially for viruses. This technique is to simply modify the entry point, which is the e_entry member of the ELF file header.
The goal here is to determine whether or not e_entry holds an address that points to a location that signifies an abnormal modification to the binary.
The quickest route to being able to detect anomalies is to first know what is normal. Let's take a look at two normal binaries: one dynamically linked and the other statically linked. Both have been compiled with gcc and neither has been tampered with in any way:
$ readelf -h bin1 | grep Entry
  Entry point address:               0x400520
So we can see that the entry point is 0x400520. If we look at the section headers, we can see what section this address falls into:
$ readelf -S bin1 | grep 4005
  [13] .text             PROGBITS         0000000000400520  00000520
In our example, the entry point starts at the beginning of the .text section. This is not always so, and therefore grepping for the first significant hex digits, as we did previously, isn't a consistent approach. It is recommended that you check both the address and size of each section header until you find the section whose address range contains the entry point.
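The range check recommended above can be sketched in a few lines of Python. This is a minimal illustration, not a hardened parser: the function name section_containing_entry is hypothetical, and the code assumes a 64-bit little-endian ELF image that still has its section header table and does not use extended section numbering.

```python
import struct
from typing import Optional, Tuple

SHF_ALLOC = 0x2  # section occupies memory at run time


def section_containing_entry(elf: bytes) -> Tuple[int, Optional[str]]:
    """Return (e_entry, name of the section whose range holds it, or None)."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")

    # Elf64_Ehdr offsets: e_entry @24, e_shoff @40,
    # e_shentsize @58, e_shnum @60, e_shstrndx @62.
    (e_entry,) = struct.unpack_from("<Q", elf, 24)
    (e_shoff,) = struct.unpack_from("<Q", elf, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", elf, 58)
    if e_shnum == 0:
        return e_entry, None  # stripped of section headers

    def shdr(i: int):
        # First six Elf64_Shdr fields:
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", elf, e_shoff + i * e_shentsize)

    strtab_off = shdr(e_shstrndx)[4]  # file offset of the section name table

    for i in range(e_shnum):
        sh_name, _type, sh_flags, sh_addr, _off, sh_size = shdr(i)
        if not sh_flags & SHF_ALLOC:
            continue  # not mapped into memory, so it cannot hold the entry
        if sh_addr <= e_entry < sh_addr + sh_size:
            start = strtab_off + sh_name
            end = elf.index(b"\0", start)
            return e_entry, elf[start:end].decode()
    return e_entry, None
```

Calling it as section_containing_entry(open("bin1", "rb").read()) should name the section that readelf -S would show for the entry address. A result of None means no allocated section accounts for the entry point.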
Here the entry point falls right at the beginning of the .text section, which is common, but the exact location depends on how the binary was compiled and linked and may differ with each binary you look at. This binary was linked to libc, just like 99 percent of the binaries you will encounter. This means that the entry point contains some special initialization code that looks almost identical in every libc-linked binary, so let's take a look at it to know what to expect when analyzing the entry point code of binaries:
$ objdump -d --section=.text bin1
0000000000400520 <_start>:
  400520:  31 ed                  xor    %ebp,%ebp
  400522:  49 89 d1               mov    %rdx,%r9
  400525:  5e                     pop    %rsi
  400526:  48 89 e2               mov    %rsp,%rdx
  400529:  48 83 e4 f0            and    $0xfffffffffffffff0,%rsp
  40052d:  50                     push   %rax
  40052e:  54                     push   %rsp
  40052f:  49 c7 c0 20 07 40 00   mov    $0x400720,%r8    // __libc_csu_fini
  400536:  48 c7 c1 b0 06 40 00   mov    $0x4006b0,%rcx   // __libc_csu_init
  40053d:  48 c7 c7 0d 06 40 00   mov    $0x40060d,%rdi   // main()
  400544:  e8 87 ff ff ff         callq  4004d0           // call __libc_start_main()
  ...
The preceding assembly code is the standard glibc initialization code pointed to by e_entry of the ELF header. This code always executes before main(), and its purpose is to call the initialization routine __libc_start_main(), shown here with only its function-pointer arguments:
__libc_start_main((void *)&main, &__libc_csu_init, &__libc_csu_fini);
This function sets up the process heap segment, registers constructors and destructors, and initializes threading-related data. Then it calls main().
Now that you know what the entry point code looks like on a libc-linked binary, you should be able to easily determine when the entry point address is suspicious, when it points to code that does not look like this, or when it is not even in the .text section at all!
Now let's take a look at another binary, one that has been infected with the Retaliation Virus, and see what oddities we find with the entry point:
$ readelf -h retal_virus_sample | grep Entry
  Entry point address:               0x80f56f
A quick examination of the section headers with readelf -S will prove that this address is not accounted for by any section header, which is extremely suspicious. If an executable has section headers and there is an executable area that is not accounted for by a section, then it is almost certainly a sign of infection or binary patching. For code to be executed, section headers are not necessary as we've already learned, but program headers are.
Let's take a look and see what segment this address fits into by looking at the program headers with readelf -l:
Elf file type is EXEC (Executable file)
Entry point 0x80f56f
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000001244 0x0000000000001244 R E 200000
LOAD 0x0000000000001e28 0x0000000000601e28 0x0000000000601e28
0x0000000000000208 0x0000000000000218 RW 200000
DYNAMIC 0x0000000000001e50 0x0000000000601e50 0x0000000000601e50
0x0000000000000190 0x0000000000000190 RW 8
LOAD 0x0000000000003129 0x0000000000803129 0x0000000000803129
0x000000000000d9a3 0x000000000000f4b3 RWE 200000
This output is extremely suspicious for several reasons. Typically, we see only two LOAD segments in an ELF executable, one for the text and one for the data, although this is not a strict rule. Nevertheless, it is the norm, and this binary has three LOAD segments.
Moreover, the third LOAD segment is suspiciously marked RWE (read + write + execute), which suggests self-modifying code; this is common in viruses that have polymorphic engines, such as this one. The entry point lies inside this third segment (the range 0x803129 to 0x8125dc contains 0x80f56f), when it should point into the first segment (the text segment), which, as we can see, starts at the virtual address 0x400000, the typical text segment address for executables on Linux x86_64. We don't even have to look at the code to be fairly confident that this binary has been patched.
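The three observations just made (how many LOAD segments exist, which one holds the entry point, and whether that segment is RWE) are easy to automate. The following Python sketch shows one way to do it under stated assumptions: the function name entry_segment_report and the keys of the returned dictionary are made up for illustration, and only 64-bit little-endian ELF files are handled.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4
PF_RWX = PF_R | PF_W | PF_X


def entry_segment_report(elf: bytes) -> dict:
    """Summarize where e_entry lands among the PT_LOAD segments."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")

    # Elf64_Ehdr offsets: e_entry @24, e_phoff @32, e_phentsize @54, e_phnum @56.
    e_entry, e_phoff = struct.unpack_from("<QQ", elf, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)

    loads = []
    for i in range(e_phnum):
        # First seven Elf64_Phdr fields:
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz
        p_type, p_flags, _off, p_vaddr, _paddr, _filesz, p_memsz = \
            struct.unpack_from("<IIQQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_memsz, p_flags))

    # Position (0-based, in program header order) of the LOAD segment
    # whose memory range contains the entry point, if any.
    index = next((n for n, (va, size, _) in enumerate(loads)
                  if va <= e_entry < va + size), None)
    flags = loads[index][2] if index is not None else 0

    return {
        "entry": e_entry,
        "load_count": len(loads),
        "entry_load_index": index,
        "entry_segment_rwx": (flags & PF_RWX) == PF_RWX,
    }
```

For the program headers listed above, this report would give a load_count of 3, an entry_load_index of 2, and entry_segment_rwx set to True. In a typical untampered executable, we would instead expect the entry point in the first LOAD segment, without the write flag.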
But for verification, especially if you are designing code that performs automated analysis of binaries, you can check the code at the entry point and see if it matches what it is expected to look like, which is the libc initialization code we looked at earlier.
The following gdb command displays the disassembled instructions found at the entry point of the retal_virus_sample executable:
(gdb) x/12i 0x80f56f
   0x80f56f:  push   %r11
   0x80f571:  movswl %r15w,%r11d
   0x80f575:  movzwq -0x20d547(%rip),%r11        # 0x602036
   0x80f57d:  bt     $0xd,%r11w
   0x80f583:  movabs $0x5ebe954fa,%r11
   0x80f58d:  sbb    %dx,-0x20d563(%rip)        # 0x602031
   0x80f594:  push   %rsi
   0x80f595:  sete   %sil
   0x80f599:  btr    %rbp,%r11
   0x80f59d:  imul   -0x20d582(%rip),%esi        # 0x602022
   0x80f5a4:  negw   -0x20d57b(%rip)        # 0x602030 <completed.6458>
   0x80f5ab:  bswap  %rsi
I think we can quickly agree that the preceding code does not look like the libc initialization code that we would expect to see in the entry point code of an untampered executable. You can simply compare it with the expected libc initialization code that we looked at from bin1 to find this out.
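For automated analysis, the same comparison can be approximated by matching the first bytes at the entry point against the _start prologue from the bin1 listing. The sketch below makes several assumptions worth stating. The helper names entry_bytes and looks_like_glibc_start are hypothetical. The pattern is taken only from the bin1 sample and stops before the three address loads, since those differ per binary. The optional endbr64 prefix is an allowance for newer toolchains and does not come from the listing. A mismatch is a reason to look closer rather than proof of infection, because other C libraries and toolchain versions produce different start-up code.

```python
import struct

# Bytes of _start in bin1 up to (not including) the first address load:
# xor %ebp,%ebp; mov %rdx,%r9; pop %rsi; mov %rsp,%rdx;
# and $-16,%rsp; push %rax; push %rsp
GLIBC_START_PROLOGUE = bytes.fromhex("31ed 4989d1 5e 4889e2 4883e4f0 50 54")

# Assumption: CET-enabled builds place an endbr64 instruction first.
ENDBR64 = bytes.fromhex("f30f1efa")

PT_LOAD = 1


def entry_bytes(elf: bytes, count: int = 32) -> bytes:
    """Return up to `count` file bytes located at e_entry (64-bit LE ELF)."""
    e_entry, e_phoff = struct.unpack_from("<QQ", elf, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = \
            struct.unpack_from("<IIQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= e_entry < p_vaddr + p_filesz:
            # Translate the virtual address into a file offset.
            start = p_offset + (e_entry - p_vaddr)
            return elf[start:start + count]
    return b""  # entry point is not backed by file data in any LOAD segment


def looks_like_glibc_start(code: bytes) -> bool:
    """True if `code` begins with the prologue seen in the bin1 sample."""
    if code.startswith(ENDBR64):
        code = code[len(ENDBR64):]
    return code.startswith(GLIBC_START_PROLOGUE)
```

A check such as looks_like_glibc_start(entry_bytes(open(path, "rb").read())) could then serve as one heuristic among several, alongside the section and segment checks discussed earlier.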
Other signs of a modified entry point appear when the address points to any section other than .text, especially the last section within the text segment (sometimes this is the .eh_frame section). Another sure sign is an address that points into the data segment; in that case, the data segment will generally have been marked executable (visible with readelf -l) so that the parasite code can run.
Modifying the entry point is not the only way to transfer control to inserted code. It is a common one, though, and being able to detect it is an important heuristic, especially for malware, because it can reveal where the parasite code starts. In the next section, we will discuss other methods used to hijack control flow, which do not always act at the beginning of execution but sometimes in the middle or even at the end.