ELF relocations

From the ELF(5) man pages:

Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Relocation entries are these data.

The process of relocation relies on symbols and sections, which is why we covered symbols and sections first. Relocation records contain the information about how to patch the code related to a given symbol. Relocations are literally a mechanism for binary patching, and even for hot-patching in memory when the dynamic linker is involved. The linker program, /bin/ld, which is used to create executable files and shared libraries, needs metadata that describes how to patch certain instructions. This metadata is stored as what we call relocation records. I will further explain relocations by using an example.

Imagine having two object files linked together to create an executable. We have obj1.o, which contains code that calls a function named foo(), and obj2.o, where foo() is defined. The linker analyzes both obj1.o and obj2.o and uses the relocation records they contain to link them into a fully working executable program. Symbolic references will be resolved into symbolic definitions, but what does that even mean? Object files are relocatable code, which means code that is meant to be relocated to a given address within an executable segment. Before relocation happens, the code contains references to symbols and instructions that cannot function or be referenced properly until their location in memory is known. These must be patched once the linker knows the position of the instruction or symbol within the executable segment.

Let's take a quick look at a 64-bit relocation entry:

typedef struct {
        Elf64_Addr r_offset;
        uint64_t   r_info;
} Elf64_Rel;

And some relocation entries require an addend:

typedef struct {
        Elf64_Addr r_offset;
        uint64_t   r_info;
        int64_t    r_addend;
} Elf64_Rela;

The r_offset points to the location that requires the relocation action. A relocation action describes the details of how to patch the code or data contained at r_offset.

The r_info gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply.

The r_addend specifies a constant addend used to compute the value stored in the relocatable field.

The relocation records for 32-bit ELF files are the same as for 64-bit, but use 32-bit integers. The following example object file code is compiled as 32-bit so that we can demonstrate implicit addends, which are not as commonly used in 64-bit. An implicit addend occurs when the relocation records are stored in ElfN_Rel type structures, which don't contain an r_addend field, so the addend is stored in the relocation target itself. The 64-bit executables tend to use the ElfN_Rela structs, which contain an explicit addend. I think it is worth understanding both scenarios, but implicit addends are a little more confusing, so it makes sense to shed some light on this area.

Let's take a look at the source code:

_start()
{
   foo();
}

We see that it calls the foo() function. However, the foo() function is not located directly within that source code file; so, upon compiling, there will be a relocation entry created that is necessary for later satisfying the symbolic reference:

$ objdump -d obj1.o
obj1.o:     file format elf32-i386
Disassembly of section .text:
00000000 <func>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   e8 fc ff ff ff          call 7 <func+0x7>
   b:   c9                      leave  
   c:   c3                      ret

As we can see, the call instruction at offset 6 (the call to foo()) has an operand containing the value 0xfffffffc, which is the implicit addend. Also notice the call 7. The number 7 is the offset of the relocation target to be patched. So when obj1.o (which calls foo() located in obj2.o) is linked with obj2.o to make an executable, the linker processes a relocation entry that points at offset 7, telling it which location needs to be modified. The linker then patches the 4 bytes at offset 7 so that they contain the real offset to the foo() function, after foo() has been positioned somewhere within the executable.

Note

The call instruction e8 fc ff ff ff contains the implicit addend and is important to remember for this lesson; the value 0xfffffffc is -(4) or -(sizeof(uint32_t)). A dword is 4 bytes on a 32-bit system, which is the size of this relocation target.

$ readelf -r obj1.o

Relocation section '.rel.text' at offset 0x394 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000007  00000902 R_386_PC32        00000000   foo

As we can see, a relocation field at offset 7 is specified by the relocation entry's r_offset field.
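The Info column is the raw r_info value, which packs the symbol table index and the relocation type into a single integer. The following is a minimal sketch of how that field can be taken apart. The helper names are hypothetical; the bit layout is the one used by the ELF32_R_SYM, ELF32_R_TYPE, ELF64_R_SYM, and ELF64_R_TYPE macros in <elf.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; the bit layout mirrors the ELF32_R_SYM and
 * ELF32_R_TYPE macros: upper 24 bits are the symbol index, low 8 the type. */
static uint32_t rel32_sym(uint32_t r_info)  { return r_info >> 8; }
static uint32_t rel32_type(uint32_t r_info) { return r_info & 0xff; }

/* Inverse operation: pack a symbol index and a type back into r_info. */
static uint32_t rel32_info(uint32_t sym, uint32_t type)
{
    assert(sym <= 0xffffff && type <= 0xff);
    return (sym << 8) | type;
}

/* 64-bit files split the field in half instead: upper 32 bits are the
 * symbol index, lower 32 bits the type (ELF64_R_SYM, ELF64_R_TYPE). */
static uint32_t rel64_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
static uint32_t rel64_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffff); }
```

For the entry shown above, an Info value of 0x00000902 decodes to symbol index 9 and relocation type 2, and 2 is the numeric value of R_386_PC32.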

R_386_PC32 is the relocation type. To understand all of these types, read the ELF specs. Each relocation type requires a different computation on the relocation target being modified. R_386_PC32 modifies the target with S + A – P.
S is the value of the symbol whose index resides in the relocation entry.
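A is the addend. For ElfN_Rela records it is the r_addend field of the relocation entry; for ElfN_Rel records, as in this example, it is the implicit addend stored in the relocation target itself.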
P is the place (section offset or address) of the storage unit being relocated (computed using r_offset).
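To make the role of S, A, and P concrete, here is a small sketch of the computation for two i386 relocation types: R_386_PC32 (S + A - P) and R_386_32 (S + A). The function and enum names are hypothetical and chosen so they do not clash with <elf.h>; only the numeric type values and the formulas come from the ELF specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values match R_386_32 (1) and R_386_PC32 (2). */
enum { RT_386_32 = 1, RT_386_PC32 = 2 };

/* Compute the value to store in the relocation target.
 * S: address of the symbol, A: addend, P: address of the target itself.
 * The arithmetic is unsigned so it wraps modulo 2^32, as the linker's does. */
static uint32_t reloc_value(unsigned type, uint32_t S, int32_t A, uint32_t P)
{
    switch (type) {
    case RT_386_32:                 /* absolute address: S + A */
        return S + (uint32_t)A;
    case RT_386_PC32:               /* PC-relative: S + A - P */
        return S + (uint32_t)A - P;
    default:
        assert(!"relocation type not handled in this sketch");
        return 0;
    }
}
```

A call such as reloc_value(RT_386_PC32, S, -4, P) yields the displacement that a call instruction needs in order to reach S from a target located at P.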

Let's look at the final output of our executable after linking obj1.o and obj2.o on a 32-bit system:

$ gcc -nostdlib obj1.o obj2.o -o relocated
$ objdump -d relocated

relocated:     file format elf32-i386


Disassembly of section .text:

080480d8 <func>:
 80480d8:   55                      push   %ebp
 80480d9:   89 e5                   mov    %esp,%ebp
 80480db:   83 ec 08                sub    $0x8,%esp
 80480de:   e8 05 00 00 00          call   80480e8 <foo>
 80480e3:   c9                      leave  
 80480e4:   c3                      ret    
 80480e5:   90                      nop
 80480e6:   90                      nop
 80480e7:   90                      nop

080480e8 <foo>:
 80480e8:   55                      push   %ebp
 80480e9:   89 e5                   mov    %esp,%ebp
 80480eb:   5d                      pop    %ebp
 80480ec:   c3                      ret

We can see that the operand of the call instruction at 0x80480de (the relocation target, at 0x80480df) has been modified with the 32-bit offset value of 5, which points to foo(). The value 5 is the result of the R_386_PC32 relocation action:

S + A – P: 0x80480e8 + 0xfffffffc – 0x80480df = 5

The 0xfffffffc is the same as –4 when interpreted as a signed integer, so the calculation can also be seen as:

0x80480e8 – (0x80480df + sizeof(uint32_t)) = 5
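The calculation above can be reproduced in a few lines of C. This is only a sketch of what a linker does for this one relocation type, under the assumption that the section bytes are held in a plain buffer; the function names are hypothetical. It reads the implicit addend from the relocation target, computes S + A - P, and writes the result back in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 stores values little-endian; read and write them byte by byte so
 * the sketch does not depend on the byte order of the host. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Apply an R_386_PC32 relocation with an implicit addend (hypothetical helper).
 * sec/sec_size: bytes of the section being patched
 * sec_addr:     virtual address assigned to the start of the section
 * r_offset:     offset of the 4-byte relocation target within the section
 * S:            final address of the referenced symbol
 * Returns the value that was written to the target. */
static uint32_t apply_pc32(uint8_t *sec, size_t sec_size, uint32_t sec_addr,
                           uint32_t r_offset, uint32_t S)
{
    uint32_t A, P, value;

    assert(r_offset <= sec_size && sec_size - r_offset >= 4);
    A = get_le32(sec + r_offset);   /* implicit addend, e.g. 0xfffffffc */
    P = sec_addr + r_offset;        /* address of the relocation target */
    value = S + A - P;              /* wraps modulo 2^32 */
    put_le32(sec + r_offset, value);
    return value;
}
```

Given the 13 bytes of func from obj1.o, a section address of 0x80480d8, an r_offset of 7, and S = 0x80480e8, the function turns e8 fc ff ff ff into e8 05 00 00 00, matching the linked executable.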

To calculate an offset into a virtual address, use the following computation:

address_of_call + offset + 5 (Where 5 is the length of the call instruction)

Which in this case is 0x80480de + 5 + 5 = 0x80480e8.

Note

Pay attention to this computation; it is worth remembering because it comes up frequently when converting offsets to addresses.

An address may also be computed into an offset with the following computation:

address – address_of_call – 5 (where 5 is the length of the call instruction: one opcode byte plus the 4-byte immediate operand). In this case, 0x80480e8 – 0x80480de – 5 = 5.
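Both conversions can be written as a pair of small C helpers. The names are hypothetical, and the sketch assumes the 5-byte near call encoding (opcode e8 followed by a 32-bit displacement) used in the example. The displacement in such a call is relative to the address of the next instruction, so the encoder steps past the one-byte opcode to the operand and then subtracts the 4-byte operand length.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CALL_LEN     = 5,   /* e8 opcode byte plus 32-bit displacement */
    CALL_IMM_LEN = 4    /* size of the displacement operand */
};

/* Offset to address: where does a call at call_addr with this
 * displacement transfer control? */
static uint32_t call_target(uint32_t call_addr, int32_t rel32)
{
    return call_addr + CALL_LEN + (uint32_t)rel32;
}

/* Address to offset: which displacement makes a call at call_addr
 * reach target? */
static int32_t call_rel32(uint32_t call_addr, uint32_t target)
{
    uint32_t operand_addr = call_addr + 1;      /* skip the e8 opcode */
    int32_t rel32 = (int32_t)(target - operand_addr - CALL_IMM_LEN);

    /* Sanity check: decoding the displacement must give back the target. */
    assert(call_target(call_addr, rel32) == target);
    return rel32;
}
```

With the addresses from the example, call_target(0x80480de, 5) gives 0x80480e8, and call_rel32(0x80480de, 0x80480e8) gives 5.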

As mentioned previously, the ELF specs cover ELF relocations in depth, and we will be visiting some of the types used in dynamic linking in the next section, such as R_386_JMP_SLOT relocation entries.

Relocatable code injection-based binary patching

Relocatable code injection is a technique that hackers, virus writers, or anyone who wants to modify the code in a binary may utilize as a way to relink a binary after it's already been compiled and linked into an executable. That is, you can inject an object file into an executable, update the executable's symbol table to reflect newly inserted functionality, and perform the necessary relocations on the injected object code so that it becomes a part of the executable.

A complicated virus might use this technique rather than just appending position-independent code. This technique requires making room in the target executable to inject the code, followed by applying the relocations. We will cover binary infection and code injection more thoroughly in Chapter 4, ELF Virus Technology – Linux/Unix Viruses.

As mentioned in Chapter 1, The Linux Environment and Its Tools, there is an amazing tool called Eresi (http://www.eresi-project.org), which is capable of relocatable code injection (aka ET_REL injection). I also designed a custom reverse engineering tool for ELF, namely, Quenya. It is very old but can be found at http://www.bitlackeys.org/projects/quenya_32bit.tgz. Quenya has many features and capabilities, and one of them is to inject object code into an executable. This can be very useful for patching a binary by hijacking a given function. Quenya is only a prototype and was never developed to the extent that the Eresi project was. I am only using it as an example because I am more familiar with it; however, I will say that for more reliable results, it may be desirable to either use Eresi or write your own tooling.

Let us pretend we are an attacker and we want to infect a 32-bit program that calls puts() to print Hello World. Our goal is to hijack puts() so that it calls evil_puts():

#include <sys/syscall.h>

/* Minimal write(2) wrapper using the i386 int $0x80 system call interface */
int _write (int fd, void *buf, int count)
{
  long ret;

  __asm__ __volatile__ ("pushl %%ebx\n\t"
                        "movl %%esi,%%ebx\n\t"
                        "int $0x80\n\t"
                        "popl %%ebx"
                        : "=a" (ret)
                        : "0" (SYS_write), "S" ((long) fd),
                          "c" ((long) buf), "d" ((long) count));
  if (ret >= 0) {
    return (int) ret;
  }
  return -1;
}

int evil_puts(void)
{
  /* 31 is the length of the message, including the trailing newline */
  return _write(1, "HAHA puts() has been hijacked!\n", 31);
}

Now we compile evil_puts.c into evil_puts.o and inject it into our program called ./hello_world:

$ ./hello_world
Hello World

This program calls the following:

puts("Hello World\n");

We now use Quenya to inject and relocate our evil_puts.o file into hello_world:

[Quenya v0.1@alchemy] reloc evil_puts.o hello_world
0x08048624  addr: 0x8048612
0x080485c4 _write addr: 0x804861e
0x080485c4  addr: 0x804868f
0x080485c4  addr: 0x80486b7
Injection/Relocation succeeded

As we can see, the _write() function from our evil_puts.o object file has been relocated and assigned an address at 0x804861e in the executable file hello_world. The next command, hijack, overwrites the global offset table entry for puts() with the address of evil_puts():

[Quenya v0.1@alchemy] hijack binary hello_world evil_puts puts
Attempting to hijack function: puts
Modifying GOT entry for puts
Successfully hijacked function: puts
Committing changes into executable file
[Quenya v0.1@alchemy] quit

And Whammi!

ryan@alchemy:~/quenya$ ./hello_world
HAHA puts() has been hijacked!

We have successfully relocated an object file into an executable and modified the executable's control flow so that it executes the code that we injected. If we use readelf -s on hello_world, we can actually now see a symbol for evil_puts().
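The control-flow change that the hijack command produces can be modeled in ordinary C. The following is a toy model, not Quenya's implementation: a one-slot function pointer table stands in for the global offset table, and real_puts(), call_puts(), and hijack() are hypothetical names. The evil_puts() here is a stand-in with a puts-like signature so that it fits the table; it is not the injected function shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*puts_fn)(const char *);

/* Stand-ins for the library function and the injected replacement. The
 * return values (0 and 1) only exist to tell the two apart. */
static int real_puts(const char *s) { printf("%s\n", s); return 0; }
static int evil_puts(const char *s)
{
    (void)s;
    printf("HAHA puts() has been hijacked!\n");
    return 1;
}

/* Toy GOT: every call to puts() is made indirectly through this slot. */
enum { GOT_PUTS = 0, GOT_SLOTS = 1 };
static puts_fn got[GOT_SLOTS] = { real_puts };

static int call_puts(const char *s) { return got[GOT_PUTS](s); }

/* Overwrite one table entry; callers are redirected without being modified. */
static void hijack(puts_fn *table, size_t nslots, size_t slot, puts_fn replacement)
{
    assert(table != NULL && slot < nslots && replacement != NULL);
    table[slot] = replacement;
}
```

Calling call_puts("Hello World") before hijack(got, GOT_SLOTS, GOT_PUTS, evil_puts) runs real_puts(); the same call afterwards runs evil_puts(), even though the calling code has not changed.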

For your interest, I have included a small snippet of code that contains the ELF relocation mechanics in Quenya; it may be a little bit obscure without seeing the rest of the code base, but it is also somewhat straightforward if you have retained what we learned about relocations:

switch(obj.shdr[i].sh_type)
{
case SHT_REL: /* Section contains ElfN_Rel records */
        rel = (Elf32_Rel *)(obj.mem + obj.shdr[i].sh_offset);
        for (j = 0; j < obj.shdr[i].sh_size / sizeof(Elf32_Rel); j++, rel++)
        {
                /* symbol table */
                symtab = (Elf32_Sym *)obj.section[obj.shdr[i].sh_link];

                /* symbol we are applying relocation to */
                symbol = &symtab[ELF32_R_SYM(rel->r_info)];

                /* section to modify */
                TargetSection = &obj.shdr[obj.shdr[i].sh_info];
                TargetIndex = obj.shdr[i].sh_info;

                /* target location (P) */
                TargetAddr = TargetSection->sh_addr + rel->r_offset;

                /* pointer to relocation target; it already holds the implicit addend (A) */
                RelocPtr = (Elf32_Addr *)(obj.section[TargetIndex] + rel->r_offset);

                /* relocation value (S) */
                RelVal = symbol->st_value;
                RelVal += obj.shdr[symbol->st_shndx].sh_addr;

                printf("0x%08x %s addr: 0x%x\n", RelVal, &SymStringTable[symbol->st_name], TargetAddr);

                switch (ELF32_R_TYPE(rel->r_info))
                {
                /* R_386_PC32      2    word32  S + A - P */
                case R_386_PC32:
                        *RelocPtr += RelVal;
                        *RelocPtr -= TargetAddr;
                        break;

                /* R_386_32        1    word32  S + A */
                case R_386_32:
                        *RelocPtr += RelVal;
                        break;
                }
        }
        break;
}

As shown in the preceding code, the relocation target that RelocPtr points to is modified according to the relocation action requested by the relocation type (such as R_386_32).

Although relocatable code binary injection is a good example of the idea behind relocations, it is not a perfect example of how a linker actually performs it with multiple object files. Nevertheless, it still retains the general idea and application of a relocation action. Later on we will talk about shared library (ET_DYN) injection, which brings us now to the topic of dynamic linking.
