There are many reasons to modify a binary, and depending on the desired functionality, the binary control flow will be patched in different ways. In the previous example of the Retaliation Virus, the entry point in the ELF file header was modified. There are many other ways to transfer execution to the inserted code, and we will discuss a few of the more common approaches.
In ELF executables and shared libraries, you will often find a section named .ctors (newer toolchains use .init_array for the same purpose). This section contains an array of function pointers that are called by the program's initialization code (for .ctors, from the .init section) before main() runs. The function pointers refer to functions created with the constructor attribute. This means that the .ctors function pointer table can be patched with an address that points to code injected into the binary, which we refer to as the parasite code.
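The following is a minimal sketch of a constructor in C, assuming GCC or Clang. The names `ctor_ran` and `early_init` are illustrative. The compiler places a pointer to `early_init` in .init_array (or .ctors with older toolchains), and the startup code calls it before main(). That stored pointer is exactly the kind of entry a parasite overwrites.

```c
/* Illustrative flag, set by the constructor below so that main()
   can confirm the constructor already ran. */
static int ctor_ran = 0;

/* GCC/Clang emit a pointer to this function into .init_array
   (.ctors on older toolchains); startup code calls it before main(). */
__attribute__((constructor))
static void early_init(void)
{
    ctor_ran = 1;
}
```

If main() reads ctor_ran on entry, it should already be set to 1.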
It is relatively easy to check whether or not one of the addresses in the .ctors section is valid. The constructor routines should always be stored specifically within the .text section of the text segment. Remember from Chapter 2, The ELF Binary Format, that the .text section is not the text segment, but rather a section that resides within the range of the text segment. If the .ctors section contains any function pointers that refer to locations outside of the .text section, then it is probably time to get suspicious.
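This check is easy to sketch in C. The function below is hypothetical (its name, parameters, and the example .text bounds are assumptions). It expects that you have already extracted the 32-bit constructor pointers, for example with readelf -x .ctors, and the .text bounds from readelf -S. It skips 0 and 0xffffffff, which .ctors uses as list markers.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: count constructor pointers that fall outside
   [text_start, text_end). 0 and 0xffffffff are .ctors list markers,
   not function pointers, so they are skipped. */
size_t count_suspicious_ctors(const uint32_t *ctors, size_t n,
                              uint32_t text_start, uint32_t text_end)
{
    size_t suspicious = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t addr = ctors[i];
        if (addr == 0 || addr == 0xffffffffu)
            continue;
        if (addr < text_start || addr >= text_end) {
            printf("ctor[%zu] = 0x%08" PRIx32 " is outside .text\n", i, addr);
            suspicious++;
        }
    }
    return suspicious;
}
```

A pointer such as 0x08048628, which lies past the end of .text but still inside the text segment, would be reported. This is where a Silvio padding infection places its parasite.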
A side note on .ctors for anti-anti-debugging
Some binaries that incorporate anti-debugging techniques will actually create a legitimate constructor function that calls ptrace(PTRACE_TRACEME, 0);.
As discussed in Chapter 4, ELF Virus Technology – Linux/Unix Viruses, this technique prevents a debugger from attaching to the process, since only one tracer can be attached at any given time. Suppose you discover that a binary has a function that performs this anti-debugging trick and that a pointer to it sits in .ctors. In that case, you can simply patch that function pointer with 0x00000000 or 0xffffffff, which will direct the __libc_start_main() function to ignore it and effectively disable the anti-debugging technique. This is easily done in GDB with the set command, for example, set {long}address = 0xffffffff, assuming that address is the location of the .ctors entry you want to modify.
Hijacking shared library calls through the PLT/GOT has been used as far back as 1998, when it was published by Silvio Cesare in http://phrack.org/issues/56/7.html, which discusses the techniques of shared library redirection.
In Chapter 2, The ELF Binary Format, we carefully examined dynamic linking, and I explained the inner workings of the PLT (procedure linkage table) and GOT (global offset table). Specifically, we looked at lazy linking and how the PLT contains code stubs that transfer control to addresses stored in the GOT. If a shared library function such as printf has never been called before, the address stored in the GOT points back to the PLT. The PLT then invokes the dynamic linker, which fills in the GOT with the address of the printf function from the libc shared library that is mapped into the process address space.
Both static patching (at rest) and hot-patching (in memory) commonly modify one or more GOT entries so that a patched-in function is called instead of the original. We will examine a binary that has been injected with an object file containing a function that simply writes a string to stdout. The GOT entry for puts(const char *) has been patched with an address that points to the injected function.
The first three GOT entries are reserved and will typically not be patched, because doing so would likely prevent the executable from running correctly (see Chapter 2, The ELF Binary Format, section on Dynamic linking). Therefore, as analysts, we are interested in the entries starting at GOT[3]. Each GOT value should be an address, and it can have one of two values that would be considered valid:
1. An address that points into the PLT, meaning the function has not been resolved yet.
2. An address that points to the resolved function in a shared library mapped into the process.
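For a live process, this rule can be expressed as a small classifier. This is a sketch under stated assumptions. The type and function names are hypothetical. The caller is assumed to have gathered the PLT range (from readelf -S) and the executable mappings of loaded shared libraries (for example from /proc/<pid>/maps). Note that an entry redirected into a maliciously injected library would still pass this check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

enum got_verdict { GOT_POINTS_TO_PLT, GOT_POINTS_TO_LIB, GOT_SUSPICIOUS };

static int in_range(uintptr_t a, const struct addr_range *r)
{
    return a >= r->start && a < r->end;
}

/* Classify one GOT value read from a running process. */
enum got_verdict classify_got_entry(uintptr_t value,
                                    const struct addr_range *plt,
                                    const struct addr_range *libs,
                                    size_t nlibs)
{
    if (in_range(value, plt))
        return GOT_POINTS_TO_PLT;      /* not yet resolved (lazy binding) */
    for (size_t i = 0; i < nlibs; i++)
        if (in_range(value, &libs[i]))
            return GOT_POINTS_TO_LIB;  /* resolved into a shared library */
    return GOT_SUSPICIOUS;             /* neither: possible hijack */
}
```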
When a binary is infected on disk (versus runtime infection), a GOT entry will be patched with an address that points somewhere within the binary where code has been injected. Recall from Chapter 4, ELF Virus Technology – Linux/Unix Viruses, that there are numerous ways to inject code into an executable file. In the binary sample that we will look at here, a relocatable object file (ET_REL) was inserted at the end of the text segment using the Silvio padding infection discussed in Chapter 4, ELF Virus Technology – Linux/Unix Viruses.
When analyzing the .got.plt section of a binary that has been infected, we must carefully validate each address from GOT[3] through GOT[N]. This is still easier than examining the binary in memory, because before the binary is executed, the GOT entries should point only into the PLT, as no shared library functions have been resolved yet.
Using readelf -S and looking for the .plt section, we can deduce the PLT address range from the section's address and size. In the case of the 32-bit binary I am looking at now, .plt starts at 0x8048300 and is 0x50 bytes long, so the range is 0x8048300 to 0x8048350. Remember this range before we look at the following .got.plt section.
[12] .plt PROGBITS 08048300 000300 000050 04 AX 0 0 16
Now let's take a look at the .got.plt section of a 32-bit binary and see if any of the relevant addresses are pointing outside of 0x8048300–0x8048350:
Contents of section .got.plt:
…
0x804a00c: 28860408 26830408 36830408 …
So let's take these addresses out of their little endian byte ordering and validate that each one points within the .plt section as expected:
08048628: This does not point to PLT!
08048326: This is valid
08048336: This is valid
08048346: This is valid

The GOT location 0x804a00c contains the address 0x8048628, which does not point to a valid location. We can see what shared library function 0x804a00c corresponds to by looking at the relocation entries with the readelf -r command, which shows us that the infected GOT entry corresponds to the libc function puts():
Relocation section '.rel.plt' at offset 0x2b0 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
0804a00c 00000107 R_386_JUMP_SLOT 00000000 puts
0804a010 00000207 R_386_JUMP_SLOT 00000000 __gmon_start__
0804a014 00000307 R_386_JUMP_SLOT 00000000 exit
0804a018 00000407 R_386_JUMP_SLOT 00000000 __libc_start_main

So the GOT location 0x804a00c is the relocation target for the puts() function. Before puts() is resolved, this slot should hold the address of the puts() PLT stub (the instruction just after the stub's initial jmp), so that the first call invokes the dynamic linker to resolve the runtime value for that symbol. In this case, the GOT entry contains the address 0x8048628, which points to a suspicious bit of code at the end of the text segment:
8048628: 55                     push   %ebp
8048629: 89 e5                  mov    %esp,%ebp
804862b: 83 ec 0c               sub    $0xc,%esp
804862e: c7 44 24 08 25 00 00   movl   $0x25,0x8(%esp)
8048635: 00
8048636: c7 44 24 04 4c 86 04   movl   $0x804864c,0x4(%esp)
804863d: 08
804863e: c7 04 24 01 00 00 00   movl   $0x1,(%esp)
8048645: e8 a6 ff ff ff         call   80485f0 <_write>
804864a: c9                     leave
804864b: c3                     ret
Technically, we don't even need to know what this code does to conclude that the GOT was hijacked. The GOT should contain only addresses that point into the PLT, and this is clearly not a PLT address. Running the binary confirms it:
$ ./host
HAHA puts() has been hijacked!
$
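The manual on-disk check above can be automated. The following is a minimal sketch. It assumes the raw .got.plt bytes have already been read from the file (at the section offset reported by readelf -S) and that the PLT range is known. The function names are hypothetical. The base address 0x804a000 used in the example is inferred from GOT[3] sitting at 0x804a00c.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a 32-bit little-endian word, independent of host byte order. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk raw .got.plt bytes from GOT[3] onward (GOT[0..2] are reserved)
   and report entries outside [plt_start, plt_end). got_vaddr is the
   virtual address of the first byte of `got`. Returns the number of
   entries that do not point into the PLT. */
size_t check_gotplt(const unsigned char *got, size_t len, uint32_t got_vaddr,
                    uint32_t plt_start, uint32_t plt_end)
{
    size_t bad = 0;
    for (size_t off = 3 * 4; off + 4 <= len; off += 4) {
        uint32_t val = le32(got + off);
        int ok = val >= plt_start && val < plt_end;
        printf("0x%08" PRIx32 ": %08" PRIx32 " %s\n",
               (uint32_t)(got_vaddr + off), val,
               ok ? "valid" : "does not point to PLT!");
        if (!ok)
            bad++;
    }
    return bad;
}
```

When fed the four entries from the dump above with the PLT range 0x8048300 to 0x8048350, it flags only 0x08048628.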
A further exercise would be to disinfect this binary manually, which is something we do in the ELF workshop trainings I provide periodically. Disinfecting this binary would primarily entail patching the .got.plt entry that contains the pointer to the parasite and replacing it with a pointer to the appropriate PLT stub.
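Here is one possible sketch of that repair. It assumes the classic i386 lazy-binding layout: PLT0 and each PLT[n] are 16 bytes, PLT[n] begins with a 6-byte jmp *GOT[n+3], and PLT entries follow .rel.plt order. Under those assumptions, an unresolved slot holds plt_start + 16*(n+1) + 6. For puts(), the first .rel.plt entry, that gives 0x8048316, which is consistent with the valid values 0x8048326, 0x8048336, and 0x8048346 above. The helper names are hypothetical. The file offset of the slot must be computed from the .got.plt section header as sh_offset + (0x804a00c - sh_addr). Work on a copy of the binary.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed classic i386 PLT layout: an unresolved GOT slot for
   relocation index n points just past PLT[n]'s 6-byte jmp. */
uint32_t plt_stub_for_reloc(uint32_t plt_start, unsigned reloc_index)
{
    return plt_start + 16u * (reloc_index + 1u) + 6u;
}

/* Overwrite the 32-bit GOT slot at file offset `off` with `value`,
   stored little-endian. Returns 0 on success, -1 on error. */
int patch_got_slot(const char *path, off_t off, uint32_t value)
{
    unsigned char buf[4] = {
        (unsigned char)(value & 0xff),
        (unsigned char)((value >> 8) & 0xff),
        (unsigned char)((value >> 16) & 0xff),
        (unsigned char)((value >> 24) & 0xff)
    };
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, buf, sizeof buf, off);
    int rc = (n == (ssize_t)sizeof buf) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```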
The term trampoline is used loosely, but it originally referred to inline code patching, where a branch instruction such as a jmp is written over the first 5 to 7 bytes of a function's prologue. Often, this trampoline is temporarily replaced with the original code bytes when the patched function needs to be called with its original behavior, and then the trampoline instruction is quickly put back. Detecting inline code hooks such as these is quite easy and can even be automated, provided you have a program or script that can disassemble a binary.
Following are two examples of trampoline code (32-bit x86 ASM):
movl $target, %eax
jmp *%eax

push $target
ret
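The install/restore cycle described earlier can be simulated on a plain byte buffer using the push/ret form. This is an illustration, not a working hook. The function names are hypothetical, and the buffer stands in for a function's first bytes. A real hook would also need the code page made writable first (for example with mprotect() in userland). The encoding is 0x68 followed by a 32-bit little-endian immediate for push, then 0xc3 for ret, which is 6 bytes in total.

```c
#include <stdint.h>
#include <string.h>

#define TRAMP_LEN 6  /* push imm32 (5 bytes) + ret (1 byte) */

/* Write "push $target; ret" over the start of `code`, saving the
   original bytes in `saved` so they can be restored later. */
void install_push_ret(uint8_t *code, uint8_t saved[TRAMP_LEN], uint32_t target)
{
    memcpy(saved, code, TRAMP_LEN);
    code[0] = 0x68;                       /* push imm32 */
    code[1] = (uint8_t)(target & 0xff);   /* imm32, little-endian */
    code[2] = (uint8_t)(target >> 8);
    code[3] = (uint8_t)(target >> 16);
    code[4] = (uint8_t)(target >> 24);
    code[5] = 0xc3;                       /* ret: jumps to target */
}

/* Restore the original prologue, e.g. before calling the original
   function with its unmodified behavior. */
void remove_push_ret(uint8_t *code, const uint8_t saved[TRAMP_LEN])
{
    memcpy(code, saved, TRAMP_LEN);
}
```

The 6-byte prologue 55 89 e5 83 ec 0c (push %ebp; mov %esp,%ebp; sub $0xc,%esp) seen in the earlier disassembly is exactly the size this trampoline replaces.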
A good classic paper on using function trampolines for function hijacking in kernel space was written by Silvio in 1999. The same concepts can be applied today in both userland and the kernel. In the kernel, you would have to disable the write protect bit in the cr0 register to make the text segment writable, or directly modify a PTE to mark a given page as writable. I personally have had more success with the former method. The original paper on kernel function trampolines can be found at http://vxheaven.org/lib/vsc08.html.
The quickest way to detect function trampolines is to locate the entry point of every function and verify that its first 5 to 7 bytes of code do not decode to some type of branch instruction. It would be very easy to write a Python script for GDB that does this, and I have written C code to do it in the past without much trouble.
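Below is a minimal sketch of such a check as a byte-pattern match. It covers the two trampoline forms shown earlier plus the common 5-byte jmp rel32. It assumes you already have each function's entry address and leading code bytes (for example from the symbol table and the file, or from a live process via GDB). detect_trampoline is a hypothetical name. Because it matches fixed opcodes instead of disassembling, it will miss other branch forms, such as a short jmp or an indirect jmp through memory. Treat it as a first pass only.

```c
#include <stddef.h>
#include <stdint.h>

/* Match known 32-bit x86 trampoline encodings at a function's entry.
   Returns a short description of the match, or NULL if none. */
const char *detect_trampoline(const uint8_t *p, size_t len)
{
    /* e9 rel32: jmp rel32 (5 bytes) */
    if (len >= 5 && p[0] == 0xe9)
        return "jmp rel32";
    /* 68 imm32 c3: push $target; ret (6 bytes) */
    if (len >= 6 && p[0] == 0x68 && p[5] == 0xc3)
        return "push imm32; ret";
    /* b8 imm32 ff e0: movl $target, %eax; jmp *%eax (7 bytes) */
    if (len >= 7 && p[0] == 0xb8 && p[5] == 0xff && p[6] == 0xe0)
        return "movl imm32, %eax; jmp *%eax";
    return NULL;
}
```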