Table of Contents for
Learning Linux Binary Analysis

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Learning Linux Binary Analysis by Ryan elfmaster O'Neill Published by Packt Publishing, 2016
  1. Cover
  2. Table of Contents
  3. Learning Linux Binary Analysis
  4. Learning Linux Binary Analysis
  5. Credits
  6. About the Author
  7. Acknowledgments
  8. About the Reviewers
  9. www.PacktPub.com
  10. Preface
  11. What you need for this book
  12. Who this book is for
  13. Conventions
  14. Reader feedback
  15. Customer support
  16. 1. The Linux Environment and Its Tools
  17. Useful devices and files
  18. Linker-related environment points
  19. Summary
  20. 2. The ELF Binary Format
  21. ELF program headers
  22. ELF section headers
  23. ELF symbols
  24. ELF relocations
  25. ELF dynamic linking
  26. Coding an ELF Parser
  27. Summary
  28. 3. Linux Process Tracing
  29. ptrace requests
  30. The process register state and flags
  31. A simple ptrace-based debugger
  32. A simple ptrace debugger with process attach capabilities
  33. Advanced function-tracing software
  34. ptrace and forensic analysis
  35. Process image reconstruction – from the memory to the executable
  36. Code injection with ptrace
  37. Simple examples aren't always so trivial
  38. Demonstrating the code_inject tool
  39. A ptrace anti-debugging trick
  40. Summary
  41. 4. ELF Virus Technology �� Linux/Unix Viruses
  42. ELF virus engineering challenges
  43. ELF virus parasite infection methods
  44. The PT_NOTE to PT_LOAD conversion infection method
  45. Infecting control flow
  46. Process memory viruses and rootkits – remote code injection techniques
  47. ELF anti-debugging and packing techniques
  48. ELF virus detection and disinfection
  49. Summary
  50. 5. Linux Binary Protection
  51. Stub mechanics and the userland exec
  52. Other jobs performed by protector stubs
  53. Existing ELF binary protectors
  54. Downloading Maya-protected binaries
  55. Anti-debugging for binary protection
  56. Resistance to emulation
  57. Obfuscation methods
  58. Protecting control flow integrity
  59. Other resources
  60. Summary
  61. 6. ELF Binary Forensics in Linux
  62. Detecting other forms of control flow hijacking
  63. Identifying parasite code characteristics
  64. Checking the dynamic segment for DLL injection traces
  65. Identifying reverse text padding infections
  66. Identifying text segment padding infections
  67. Identifying protected binaries
  68. IDA Pro
  69. Summary
  70. 7. Process Memory Forensics
  71. Process memory infection
  72. Detecting the ET_DYN injection
  73. Linux ELF core files
  74. Summary
  75. 8. ECFS – Extended Core File Snapshot Technology
  76. The ECFS philosophy
  77. Getting started with ECFS
  78. libecfs – a library for parsing ECFS files
  79. readecfs
  80. Examining an infected process using ECFS
  81. The ECFS reference guide
  82. Process necromancy with ECFS
  83. Learning more about ECFS
  84. Summary
  85. 9. Linux /proc/kcore Analysis
  86. stock vmlinux has no symbols
  87. /proc/kcore and GDB exploration
  88. Direct sys_call_table modifications
  89. Kprobe rootkits
  90. Debug register rootkits – DRR
  91. VFS layer rootkits
  92. Other kernel infection techniques
  93. vmlinux and .altinstructions patching
  94. Using taskverse to see hidden processes
  95. Infected LKMs – kernel drivers
  96. Notes on /dev/kmem and /dev/mem
  97. /dev/mem
  98. K-ecfs – kernel ECFS
  99. Kernel hacking goodies
  100. Summary
  101. Index

ELF symbols

Symbols are a symbolic reference to some type of data or code such as a global variable or function. For instance, the printf() function is going to have a symbol entry that points to it in the dynamic symbol table .dynsym. In most shared libraries and dynamically linked executables, there exist two symbol tables. In the readelf -S output shown previously, you can see two sections: .dynsym and .symtab.

The .dynsym contains global symbols that reference symbols from an external source, such as libc functions like printf, whereas the symbols contained in .symtab will contain all of the symbols in .dynsym, as well as the local symbols for the executable, such as global variables, or local functions that you have defined in your code. So .symtab contains all of the symbols, whereas .dynsym contains just the dynamic/global symbols.

So the question is: Why have two symbol tables if .symtab already contains everything that's in .dynsym? If you check out the readelf -S output of an executable, you will see that some sections are marked A (ALLOC) or WA (WRITE/ALLOC) or AX (ALLOC/EXEC). If you look at .dynsym, you will see that it is marked ALLOC, whereas .symtab has no flags.

ALLOC means that the section will be allocated at runtime and loaded into memory, and .symtab is not loaded into memory because it is not necessary for runtime. The .dynsym contains symbols that can only be resolved at runtime, and therefore they are the only symbols needed at runtime by the dynamic linker. So, while the .dynsym symbol table is necessary for the execution of dynamically linked executables, the .symtab symbol table exists only for debugging and linking purposes and is often stripped (removed) from production binaries to save space.

Let's take a look at what an ELF symbol entry looks like for 64-bit ELF files:

typedef struct {
uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    Elf64_Addr    st_value;
    Uint64_t      st_size;
} Elf64_Sym;

Symbol entries are contained within the .symtab and .dynsym sections, which is why the sh_entsize (section header entry size) for those sections are equivalent to sizeof(ElfN_Sym).

st_name

The st_name contains an offset into the symbol table's string table (located in either .dynstr or .strtab), where the name of the symbol is located, such as printf.

st_value

The st_value holds the value of the symbol (either an address or offset of its location).

st_size

The st_size contains the size of the symbol, such as the size of a global function ptr, which would be 4 bytes on a 32-bit system.

st_other

This member defines the symbol visibility.

st_shndx

Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index.

st_info

The st_info specifies the symbol type and binding attributes. For a complete list of these types and attributes, consult the ELF(5) man page. The symbol types start with STT whereas the symbol bindings start with STB. As an example, a few common ones are as explained in the next sections.

Symbol types

We've got the following symbol types:

  • STT_NOTYPE: The symbols type is undefined
  • STT_FUNC: The symbol is associated with a function or other executable code
  • STT_OBJECT: The symbol is associated with a data object

Symbol bindings

We've got the following symbol bindings:

  • STB_LOCAL: Local symbols are not visible outside the object file containing their definition, such as a function declared static.
  • STB_GLOBAL: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same symbol.
  • STB_WEAK: Similar to global binding, but with less precedence, meaning that the binding is weak and may be overridden by another symbol (with the same name) that is not marked as STB_WEAK.

There are macros for packing and unpacking the binding and type fields:

  • ELF32_ST_BIND(info) or ELF64_ST_BIND(info) extract a binding from an st_info value
  • ELF32_ST_TYPE(info) or ELF64_ST_TYPE(info) extract a type from an st_info value
  • ELF32_ST_INFO(bind, type) or ELF64_ST_INFO(bind, type) convert a binding and a type into an st_info value

Let's look at the symbol table for the following source code:

static inline void foochu()
{ /* Do nothing */ }

void func1()
{ /* Do nothing */ }

_start()
{
        func1();
        foochu();
}

The following is the command to see the symbol table entries for functions foochu and func1:

ryan@alchemy:~$ readelf -s test | egrep 'foochu|func1'
     7: 080480d8     5 FUNC    LOCAL  DEFAULT    2 foochu
     8: 080480dd     5 FUNC    GLOBAL DEFAULT    2 func1

We can see that the foochu function is a value of 0x80480da, and is a function (STT_FUNC) that has a local symbol binding (STB_LOCAL). If you recall, we talked a little bit about LOCAL bindings, which mean that the symbol cannot be seen outside the object file it is defined it, which is why foochu is local, since we declared it with the static keyword in our source code.

Symbols make life easier for everyone; they are a part of ELF objects for the purpose of linking, relocation, readable disassembly, and debugging. This brings me to the topic of a useful tool that I coded in 2013, named ftrace. Similar to, and in the same spirit of ltrace and strace, ftrace will trace all of the function calls made within the binary and can also show other branch instructions such as jumps. I originally designed ftrace to help in reversing binaries for which I didn't have the source code while at work. The ftrace is considered to be a dynamic analysis tool. Let's take a look at some of its capabilities. We compile a binary with the following source code:

#include <stdio.h>

int func1(int a, int b, int c)
{
  printf("%d %d %d\n", a, b ,c);
}

int main(void)
{
  func1(1, 2, 3);
}

Now, assuming that we don't have the preceding source code and we want to know the inner workings of the binary that it compiles into, we can run ftrace on it. First let's look at the synopsis:

ftrace [-p <pid>] [-Sstve] <prog>

The usage is as follows:

  • [-p]: This traces by PID
  • [-t]: This is for the type detection of function args
  • [-s]: This prints string values
  • [-v]: This gives a verbose output
  • [-e]: This gives miscellaneous ELF information (symbols, dependencies)
  • [-S]: This shows function calls with stripped symbols
  • [-C]: This completes the control flow analysis

Let's give it a try:

ryan@alchemy:~$ ftrace -s test
[+] Function tracing begins here:
PLT_call@0x400420:__libc_start_main()
LOCAL_call@0x4003e0:_init()
(RETURN VALUE) LOCAL_call@0x4003e0: _init() = 0
LOCAL_call@0x40052c:func1(0x1,0x2,0x3)  // notice values passed
PLT_call@0x400410:printf("%d %d %d\n")  // notice we see string value
1 2 3
(RETURN VALUE) PLT_call@0x400410: printf("%d %d %d\n") = 6
(RETURN VALUE) LOCAL_call@0x40052c: func1(0x1,0x2,0x3) = 6
LOCAL_call@0x400470:deregister_tm_clones()
(RETURN VALUE) LOCAL_call@0x400470: deregister_tm_clones() = 7

A clever individual might now be asking: What happens if a binary's symbol table has been stripped? That's right; you can strip a binary of its symbol table; however, a dynamically linked executable will always retain .dynsym but will discard .symtab if it is stripped, so only the imported library symbols will show up.

If a binary is compiled statically (gcc-static) or without libc linking (gcc-nostdlib), and it is then stripped with the strip command, a binary will have no symbol table at all since the dynamic symbol table is no longer imperative. The ftrace behaves differently with the –S flag that tells ftrace to show every function call even if there is no symbol attached to it. When using the –S flag, ftrace will display function names as SUB_<address_of_function>, similar to how IDA pro will show functions that have no symbol table reference.

Let's look at the following very simple source code:

int foo(void) {
}

_start()
{
  foo();
  __asm__("leave");
}

The preceding source code simply calls the foo() function and exits. The reason we are using _start() instead of main() is because we compile it with the following:

gcc -nostdlib test2.c -o test2

The gcc flag -nostdlib directs the linker to omit standard libc linking conventions and to simply compile the code that we have and nothing more. The default entry point is a symbol called _start():

ryan@alchemy:~$ ftrace ./test2
[+] Function tracing begins here:
LOCAL_call@0x400144:foo()
(RETURN VALUE) LOCAL_call@0x400144: foo() = 0
Now let's strip the symbol table and run ftrace on it again:
ryan@alchemy:~$ strip test2
ryan@alchemy:~$ ftrace -S test2
[+] Function tracing begins here:
LOCAL_call@0x400144:sub_400144()
(RETURN VALUE) LOCAL_call@0x400144: sub_400144() = 0

We now notice that foo() function has been replaced by sub_400144(), which shows that the function call is happening at address 0x400144. Now if we look at the binary test2 before we stripped the symbols, we can see that 0x400144 is indeed where foo() is located:

ryan@alchemy:~$ objdump -d test2
test2:     file format elf64-x86-64
Disassembly of section .text:
0000000000400144<foo>:
  400144:   55                      push   %rbp
  400145:   48 89 e5                mov    %rsp,%rbp
  400148:   5d                      pop    %rbp
  400149:   c3                      retq   

000000000040014a <_start>:
  40014a:   55                      push   %rbp
  40014b:   48 89 e5                mov    %rsp,%rbp
  40014e:   e8 f1 ff ff ff          callq  400144 <foo>
  400153:   c9                      leaveq
  400154:   5d                      pop    %rbp
  400155:   c3                 retq

In fact, to give you a really good idea of how helpful symbols can be to reverse engineers (when we have them), let's take a look at the test2 binary, this time without symbols to demonstrate how it becomes slightly less obvious to read. This is primarily because branch instructions no longer have a symbol name attached to them, so analyzing the control flow becomes more tedious and requires more annotation, which some disassemblers like IDA-pro allow us to do as we go:

$ objdump -d test2
test2:     file format elf64-x86-64
Disassembly of section .text:
0000000000400144 <.text>:
  400144:   55                      push   %rbp  
  400145:   48 89 e5                mov    %rsp,%rbp
  400148:   5d                      pop    %rbp
  400149:   c3                      retq   
  40014a:   55                      push   %rbp 
  40014b:   48 89 e5                mov    %rsp,%rbp
  40014e:   e8 f1 ff ff ff          callq  0x400144
  400153:   c9                      leaveq
  400154:   5d                      pop    %rbp
  400155:   c3                      retq   

The only thing to give us an idea where a new function starts is by examining the procedure prologue, which is at the beginning of every function, unless (gcc -fomit-frame-pointer) has been used, in which case it becomes less obvious to identify.

This book assumes that the reader already has some knowledge of assembly language, since teaching x86 asm is not the goal of this book, but notice the preceding emboldened procedure prologue, which helps denote the start of each function. The procedure prologue just sets up the stack frame for each new function that has been called by backing up the base pointer on the stack and setting its value to the stack pointers before the stack pointer is adjusted to make room for local variables. This way variables can be referenced as positive offsets from a fixed address stored in the base pointer register ebp/rbp.

Now that we've gotten a grasp on symbols, the next step is to understand relocations. We will see in the next section how symbols, relocations, and sections are all closely tied together and live at the same level of abstraction within the ELF format.