Chapter 6. ELF Binary Forensics in Linux

The field of computer forensics is broad and includes many facets of investigation. One such facet is the analysis of executable code. One of the most insidious places for an attacker to install malicious functionality is within an executable file. In Linux, this is, of course, the ELF file type. We already explored some of the infection techniques in use in Chapter 4, ELF Virus Technology – Linux/Unix Viruses, but have spent very little time discussing the analysis phase. How exactly should an investigator go about exploring a binary for anomalies or code infections? That is what this chapter is all about.

The motives for an attacker infecting an executable vary greatly: the goal may be a virus, a botnet, or a backdoor. There are, of course, many cases where an individual wants to patch or modify a binary to achieve totally different ends, such as binary protection, code patching, or other experimentation. Whether malicious or not, the binary modification methods are all the same. The inserted code is what determines whether or not the binary carries malicious intent.

In either case, this chapter will arm the reader with the insight necessary for determining whether or not a binary has been modified, and how exactly it has been modified. In the following pages, we will examine several different types of infections and will even discuss some of my findings from a real-world analysis of the Retaliation Virus for Linux, which was engineered by JPanic, one of the world's most skilled virus authors. This chapter is all about training your eye to spot anomalies within an ELF binary file, and with some practice it becomes quite possible to do so with ease.

The science of detecting entry point modification

When a binary is modified in some way, it is generally for the purpose of adding code to the binary and then redirecting execution flow to that code. The redirection of execution flow can happen in many places within the binary. In this particular case, we are going to examine a very common technique used when patching binaries, especially for viruses. This technique is to simply modify the entry point, which is the e_entry member of the ELF file header.

The goal here is to determine whether or not e_entry holds an address that points to a location that signifies an abnormal modification to the binary.
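
Since every check in this section starts from e_entry, it helps to be able to read that field programmatically. The following is a minimal Python sketch, not a definitive implementation: the function name read_entry_point is hypothetical, and it relies only on the standard ELF header layout, in which e_entry sits at offset 24, directly after e_ident (16 bytes), e_type (2), e_machine (2), and e_version (4).

```python
import struct

ELF_MAGIC = b"\x7fELF"


def read_entry_point(header: bytes) -> int:
    """Return e_entry given the first bytes of an ELF file (hypothetical helper)."""
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # e_entry is a 4-byte address in 32-bit files and an 8-byte address in 64-bit files.
    fmt = endian + ("Q" if ei_class == 2 else "I")
    (entry,) = struct.unpack_from(fmt, header, 24)
    return entry
```

To use it on a file, read the first 64 bytes (the size of a 64-bit ELF header) and pass them in, for example read_entry_point(open("bin1", "rb").read(64)). The result should match the "Entry point address" line printed by readelf -h.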

Note

Abnormal means any modification that wasn't created by the linker itself (/usr/bin/ld), whose job is to link ELF objects together. The linker will create a binary that represents normalcy, whereas an unnatural modification often appears suspicious to the trained eye.

The quickest route to being able to detect anomalies is to first know what is normal. Let's start with a normal, dynamically linked binary that was compiled with gcc and has not been tampered with in any way (statically linked binaries are covered in a note further on):

$ readelf -h bin1 | grep Entry
  Entry point address:               0x400520
$

So we can see that the entry point is 0x400520. If we look at the section headers, we can see what section this address falls into:

$ readelf -S bin1 | grep 4005
  [13] .text             PROGBITS         0000000000400520  00000520

Note

In our example, the entry point is at the very beginning of the .text section. This is not always so, and therefore grepping for the first significant hex digits, as we did previously, isn't a consistent approach. It is recommended that you check both the address and size of each section header until you find the section whose address range contains the entry point.
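
The range check recommended above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper name section_containing is hypothetical, and apart from the .text address 0x400520 taken from the readelf output, the section addresses and sizes in the sample list are made-up placeholder values, not bin1's real section table.

```python
from typing import Iterable, Optional, Tuple

Section = Tuple[str, int, int]  # (name, sh_addr, sh_size)


def section_containing(addr: int, sections: Iterable[Section]) -> Optional[str]:
    """Return the name of the section whose address range holds addr, or None."""
    for name, start, size in sections:
        # Sections that are not loaded into memory have sh_addr == 0; skip them.
        if start != 0 and start <= addr < start + size:
            return name
    return None


# Hypothetical section table: only the .text start address comes from the text.
SAMPLE_SECTIONS = [
    (".init", 0x4004A0, 0x1A),
    (".plt", 0x4004C0, 0x50),
    (".text", 0x400520, 0x202),
    (".fini", 0x400724, 0x09),
    (".comment", 0x0, 0x4D),
]

print(section_containing(0x400520, SAMPLE_SECTIONS))
print(section_containing(0x80F56F, SAMPLE_SECTIONS))
```

A result of None for the entry point means that no section header accounts for that address, which is itself worth investigating.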

As we can see, the entry point is right at the beginning of the .text section, which is common, but depending on how the binary was compiled and linked, this may differ from one binary to the next. This binary was linked to libc, as the vast majority of the binaries you will encounter are. This means that the entry point contains some special initialization code that looks almost identical in every single libc-linked binary, so let's take a look at it so that we know what to expect when analyzing the entry point code of binaries:

$ objdump -d --section=.text bin1

 0000000000400520 <_start>:
  400520:       31 ed                 xor    %ebp,%ebp
  400522:       49 89 d1              mov    %rdx,%r9
  400525:       5e                    pop    %rsi
  400526:       48 89 e2              mov    %rsp,%rdx
  400529:       48 83 e4 f0           and    $0xfffffffffffffff0,%rsp
  40052d:       50                    push   %rax
  40052e:       54                    push   %rsp
  40052f:       49 c7 c0 20 07 40 00  mov    $0x400720,%r8  // __libc_csu_fini
  400536:       48 c7 c1 b0 06 40 00  mov    $0x4006b0,%rcx // __libc_csu_init
  40053d:       48 c7 c7 0d 06 40 00  mov    $0x40060d,%rdi // main()
  400544:       e8 87 ff ff ff        callq  4004d0         // call __libc_start_main()
...

The preceding assembly code is the standard glibc initialization code pointed to by e_entry of the ELF header. This code is always executed before main(), and its purpose is to call the initialization routine __libc_start_main(). In simplified form, showing only the three function pointers loaded in the preceding listing, the call looks like this:

__libc_start_main((void *)&main, &__libc_csu_init, &__libc_csu_fini);

This function sets up the process heap segment, registers constructors and destructors, and initializes threading-related data. Then it calls main().

Now that you know what the entry point code looks like on a libc-linked binary, you should be able to easily determine when the entry point address is suspicious, when it points to code that does not look like this, or when it is not even in the .text section at all!
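
For automated triage, the comparison described above can be reduced to a byte-pattern check. The sketch below is a heuristic under stated assumptions, not a general detector: the name looks_like_glibc_start is hypothetical, and the pattern is taken directly from the bin1 listing (a non-PIE x86_64 binary). Other toolchains, position-independent executables, and other glibc versions may emit a different _start, so a mismatch is a reason to look closer rather than proof of infection.

```python
# Fixed instruction bytes at the start of _start in the bin1 listing:
# xor %ebp,%ebp; mov %rdx,%r9; pop %rsi; mov %rsp,%rdx;
# and $-16,%rsp; push %rax; push %rsp
GLIBC_START_PREFIX = bytes.fromhex("31ed4989d15e4889e24883e4f05054")


def looks_like_glibc_start(code: bytes) -> bool:
    """Heuristically match the _start sequence shown for bin1 (hypothetical helper)."""
    if len(code) < 37 or not code.startswith(GLIBC_START_PREFIX):
        return False
    # After the prefix come three 7-byte "mov $imm32,%reg" instructions and a call.
    # The immediates and the call displacement differ per binary, so only the
    # opcode bytes are compared.
    return (
        code[15:18] == b"\x49\xc7\xc0"      # mov $imm32,%r8   (__libc_csu_fini)
        and code[22:25] == b"\x48\xc7\xc1"  # mov $imm32,%rcx  (__libc_csu_init)
        and code[29:32] == b"\x48\xc7\xc7"  # mov $imm32,%rdi  (main)
        and code[36] == 0xE8                # callq rel32      (__libc_start_main)
    )


# The 41 bytes of _start exactly as they appear in the bin1 disassembly.
BIN1_START = bytes.fromhex(
    "31ed4989d15e4889e24883e4f05054"
    "49c7c020074000"
    "48c7c1b0064000"
    "48c7c70d064000"
    "e887ffffff"
)
print(looks_like_glibc_start(BIN1_START))
```

The input would normally be the bytes read from the file at the offset corresponding to e_entry; here the bytes are copied from the listing so that the example is self-contained.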

Note

A binary that is statically linked with libc will have initialization code in _start that is virtually identical to the preceding code, so the same rule applies for statically linked binaries as well.

Now let's take a look at another binary, one that has been infected with the Retaliation Virus, and see what kinds of oddities we find at the entry point:

$ readelf -h retal_virus_sample | grep Entry
  Entry point address:        0x80f56f

A quick examination of the section headers with readelf -S will prove that this address is not accounted for by any section header, which is extremely suspicious. If an executable has section headers and there is an executable area that is not accounted for by a section, then it is almost certainly a sign of infection or binary patching. For code to be executed, section headers are not necessary as we've already learned, but program headers are.

Let's take a look and see what segment this address fits into by looking at the program headers with readelf -l:

Elf file type is EXEC (Executable file)
Entry point 0x80f56f
There are 9 program headers, starting at offset 64

Program Headers:
  Type       Offset             VirtAddr           PhysAddr
             FileSiz            MemSiz              Flags  Align
  PHDR       0x0000000000000040 0x0000000000400040 0x0000000000400040
             0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP     0x0000000000000238 0x0000000000400238 0x0000000000400238
             0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD       0x0000000000000000 0x0000000000400000 0x0000000000400000
             0x0000000000001244 0x0000000000001244  R E    200000
  LOAD       0x0000000000001e28 0x0000000000601e28 0x0000000000601e28
             0x0000000000000208 0x0000000000000218  RW     200000
  DYNAMIC    0x0000000000001e50 0x0000000000601e50 0x0000000000601e50
             0x0000000000000190 0x0000000000000190  RW     8
  LOAD       0x0000000000003129 0x0000000000803129 0x0000000000803129
             0x000000000000d9a3 0x000000000000f4b3  RWE    200000

This output is extremely suspicious for several reasons. Typically, we see only two LOAD segments in an ELF executable, one for the text and one for the data, although this is not a strict rule. Nevertheless, it is the norm, and this binary has three LOAD segments.

Moreover, the third LOAD segment is marked RWE (read + write + execute), which suggests self-modifying code, commonly found in viruses that have polymorphic engines, such as this one. The entry point, 0x80f56f, lies inside this third segment, when it should point into the first segment (the text segment), which, as we can see, starts at the virtual address 0x400000, the typical text segment address for executables on Linux x86_64. We don't even have to look at the code to be fairly confident that this binary has been patched.
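
The three observations above can be combined into a small set of program-header checks. To confirm the arithmetic first: the third LOAD segment starts at 0x803129 and has a MemSiz of 0xf4b3, so it covers addresses up to 0x8125dc, and the entry point 0x80f56f falls inside that range. The Python sketch below encodes this; the function name entry_point_findings is hypothetical, the segment values are copied from the readelf -l output, and the checks assume the classic layout in which the first LOAD segment is the text segment. Binaries built by other toolchains may legitimately have more than two LOAD segments, so treat each finding as a hint rather than a verdict.

```python
PT_LOAD = 1                       # p_type value for loadable segments
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # p_flags permission bits


def entry_point_findings(entry, segments):
    """Return a list of warnings for an entry point (hypothetical helper).

    segments: (p_type, p_vaddr, p_memsz, p_flags) tuples in program-header order.
    """
    findings = []
    loads = [seg for seg in segments if seg[0] == PT_LOAD]

    if len(loads) > 2:
        findings.append(f"{len(loads)} LOAD segments (two is the classic layout)")

    holder = None
    for index, (_, vaddr, memsz, flags) in enumerate(loads):
        if vaddr <= entry < vaddr + memsz:
            holder = index
            if flags & PF_W and flags & PF_X:
                findings.append(
                    f"entry point lies in a writable and executable segment at {vaddr:#x}"
                )
            break

    if holder is None:
        findings.append("entry point is not inside any LOAD segment")
    elif holder != 0:
        findings.append("entry point is outside the first LOAD segment")
    return findings


# LOAD segments of retal_virus_sample, taken from the readelf -l output.
RETAL_SEGMENTS = [
    (PT_LOAD, 0x400000, 0x1244, PF_R | PF_X),
    (PT_LOAD, 0x601E28, 0x0218, PF_R | PF_W),
    (PT_LOAD, 0x803129, 0xF4B3, PF_R | PF_W | PF_X),
]

for warning in entry_point_findings(0x80F56F, RETAL_SEGMENTS):
    print(warning)
```

An empty list means that none of these particular heuristics fired, which is not the same as the binary being clean.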

But for verification, especially if you are designing code that performs automated analysis of binaries, you can check the code at the entry point and see if it matches what it is expected to look like, which is the libc initialization code we looked at earlier.

The following gdb command displays the disassembled instructions found at the entry point of the retal_virus_sample executable:

(gdb) x/12i 0x80f56f
   0x80f56f:  push   %r11
   0x80f571:  movswl %r15w,%r11d
   0x80f575:  movzwq -0x20d547(%rip),%r11        # 0x602036
   0x80f57d:  bt     $0xd,%r11w
   0x80f583:  movabs $0x5ebe954fa,%r11
   0x80f58d:  sbb    %dx,-0x20d563(%rip)        # 0x602031
   0x80f594:  push   %rsi
   0x80f595:  sete   %sil
   0x80f599:  btr    %rbp,%r11
   0x80f59d:  imul   -0x20d582(%rip),%esi        # 0x602022
   0x80f5a4:  negw   -0x20d57b(%rip)        # 0x602030 <completed.6458>
   0x80f5ab:  bswap  %rsi

I think we can quickly agree that the preceding code does not look like the libc initialization code that we would expect to see in the entry point code of an untampered executable. You can simply compare it with the expected libc initialization code that we looked at from bin1 to find this out.

Other signs of a modified entry point are when the address points to any section other than .text, especially if it is the last section within the text segment (sometimes this is the .eh_frame section). Another sure sign is if the address points to a location within the data segment, which will then generally be marked as executable (visible with readelf -l) so that the parasite code can run.

Note

Typically, the data segment is marked as RW, because no code is supposed to be executing in that segment. If you see the data segment marked RWX (shown as RWE by readelf -l), then let that serve as a red flag, because it is extremely suspicious.

Modifying the entry point is not the only way to redirect execution to inserted code. It is a common way to achieve it, though, and being able to detect it is an important heuristic, especially for malware, because it can reveal where the parasite code starts. In the next section, we will discuss other methods used to hijack control flow, which do not always take effect at the beginning of execution, but in the middle or even at the end.
