Learning Linux Binary Analysis by Ryan "elfmaster" O'Neill. Published by Packt Publishing, 2016

Detecting other forms of control flow hijacking

There are many reasons to modify a binary, and depending on the desired functionality, the binary control flow will be patched in different ways. In the previous example of the Retaliation Virus, the entry point in the ELF file header was modified. There are many other ways to transfer execution to the inserted code, and we will discuss a few of the more common approaches.

Patching the .ctors/.init_array section

In ELF executables and shared libraries, you will notice a commonly present section named .ctors (binaries built with newer toolchains typically use .init_array instead). This section contains an array of function pointers that are called by the initialization code from the .init section. The function pointers refer to functions created with the constructor attribute, which are executed before main(). This means that the .ctors function pointer table can be patched with an address that points to code that has been injected into the binary, which we refer to as the parasite code.

It is relatively easy to check whether or not one of the addresses in the .ctors section is valid. The constructor routines should always be stored specifically within the .text section of the text segment. Remember from Chapter 2, The ELF Binary Format, that the .text section is not the text segment, but rather a section that resides within the range of the text segment. If the .ctors section contains any function pointers that refer to locations outside of the .text section, then it is probably time to get suspicious.
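The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete tool: the .text range and the table contents below are hypothetical values chosen for the example, and skipping entries equal to 0 or all-ones (on the assumption that they are list markers rather than function pointers) is a choice of this sketch. In practice, the section bytes and the .text address and size would come from readelf -S or an ELF parser.

```python
import struct

def suspicious_ctors(table_bytes, text_start, text_size, ptr_size=4):
    """Return constructor pointers that fall outside the .text section."""
    fmt = "<I" if ptr_size == 4 else "<Q"
    # Assumption: 0 and all-ones entries are list markers, not function pointers.
    sentinels = {0, (1 << (8 * ptr_size)) - 1}
    flagged = []
    for off in range(0, len(table_bytes) - ptr_size + 1, ptr_size):
        (ptr,) = struct.unpack_from(fmt, table_bytes, off)
        if ptr in sentinels:
            continue
        if not (text_start <= ptr < text_start + text_size):
            flagged.append(ptr)
    return flagged

# Hypothetical values for illustration only (not taken from a real binary).
TEXT_START, TEXT_SIZE = 0x8048350, 0x1C2
table = struct.pack("<4I", 0xFFFFFFFF, 0x8048400, 0x8048628, 0)

print([hex(p) for p in suspicious_ctors(table, TEXT_START, TEXT_SIZE)])
```

Any address the function reports deserves a closer look in a disassembler before drawing conclusions.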

Note

A side note on .ctors for anti-anti-debugging

Some binaries that incorporate anti-debugging techniques will actually create a legal constructor function that calls ptrace(PTRACE_TRACEME, 0);.

As discussed in Chapter 4, ELF Virus Technology – Linux/Unix Viruses, this technique prevents a debugger from attaching to the process, since only one tracer can be attached at any given time. If you discover that a binary has a function that performs this anti-debugging trick and has a function pointer in .ctors, then simply patch that function pointer with 0x00000000 or 0xffffffff, which will direct the __libc_start_main() function to ignore it, effectively disabling the anti-debugging technique. This can easily be accomplished in GDB with the set command, for example, set {long}address = 0xffffffff, assuming that address is the location of the .ctors entry you want to modify.

Detecting PLT/GOT hooks

This technique has been used as far back as 1998 when it was published by Silvio Cesare in http://phrack.org/issues/56/7.html, which discusses the techniques of shared library redirection.

In Chapter 2, The ELF Binary Format, we carefully examined dynamic linking and I explained the inner workings of the PLT (procedure linkage table) and GOT (global offset table). Specifically, we looked at lazy linking and how the PLT contains code stubs that transfer control to addresses that are stored in the GOT. If a shared library function such as printf has never been called before, then the address stored in the GOT will point back to the PLT, which then invokes the dynamic linker, subsequently filling in the GOT with the address that points to the printf function from the libc shared library that is mapped into the process address space.

It is common for both static (at rest) and hot-patching (in memory) infections to modify one or more GOT entries so that a patched-in function is called instead of the original. We will examine a binary that has been injected with an object file containing a function that simply writes a string to stdout. The GOT entry for puts() has been patched with an address that points to the injected function.

The first three GOT entries are reserved and will typically not be patched, because doing so would likely prevent the executable from running correctly (see Chapter 2, The ELF Binary Format, section on Dynamic linking). Therefore, as analysts, we are interested in observing the entries starting at GOT[3]. Each GOT value should be an address, and two kinds of address are considered valid:

  • Address pointer that points back into the PLT
  • Address pointer that points to a valid shared library function

When a binary is infected on disk (versus runtime infection), then a GOT entry will be patched with an address that points somewhere within the binary where code has been injected. Recall from Chapter 4, ELF Virus Technology – Linux/Unix Viruses, that there are numerous ways to inject code into an executable file. In the binary sample that we will look at here, a relocatable object file (ET_REL) was inserted at the end of the text segment using the Silvio padding infection discussed in Chapter 4, ELF Virus Technology – Linux/Unix Viruses.

When analyzing the .got.plt section of a binary that has been infected, we must carefully validate each address from GOT[3] through GOT[N]. This is still easier than looking at the binary in memory because, before the binary is executed, the GOT entries should always point only to the PLT, as no shared library functions have been resolved yet.

Using the readelf -S utility and looking for the .plt section, we can deduce the PLT address range. In the case of the 32-bit binary I am looking at now, it is 0x8048300 - 0x8048350. Remember this range before we look at the following .got.plt section.

Truncated output from readelf -S command

[12] .plt     PROGBITS        08048300 000300 000050 04  AX  0   0 16

Now let's take a look at the .got.plt section of a 32-bit binary and see if any of the relevant addresses are pointing outside of 0x8048300 - 0x8048350:

Contents of section .got.plt:
…
0x804a00c: 28860408 26830408 36830408 …

So let's take these addresses out of their little-endian byte ordering and validate that each one points within the .plt section as expected:

  • 08048628: This does not point to PLT!
  • 08048326: This is valid
  • 08048336: This is valid
  • 08048346: This is valid
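The byte-order conversion and range check above can be sketched as follows. This is only an illustration: it uses the three words actually visible in the truncated dump and the .plt address and size from the readelf -S line, and the function names are made up for this example.

```python
PLT_ADDR, PLT_SIZE = 0x08048300, 0x50   # from the readelf -S output
GOT_BASE = 0x0804A00C                   # address of the first dumped word
DUMP = "28860408 26830408 36830408"     # the words visible in the dump

def decode_words(hexdump):
    """Convert space-separated little-endian hex words into integers."""
    return [int.from_bytes(bytes.fromhex(w), "little") for w in hexdump.split()]

def check_got(hexdump, got_base, plt_addr, plt_size):
    """Return (slot address, stored value, points_into_plt) for each word."""
    results = []
    for i, value in enumerate(decode_words(hexdump)):
        in_plt = plt_addr <= value < plt_addr + plt_size
        results.append((got_base + 4 * i, value, in_plt))
    return results

for slot, value, ok in check_got(DUMP, GOT_BASE, PLT_ADDR, PLT_SIZE):
    print(f"{slot:#x}: {value:#010x} {'ok' if ok else 'NOT in PLT'}")
```

The same check applies unchanged to the remaining entries once their bytes are available.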

The GOT location 0x804a00c contains the address 0x8048628, which does not point to a valid location. We can see what shared library function 0x804a00c corresponds to by looking at the relocation entries with the readelf -r command, which shows us that the infected GOT entry corresponds to the libc function puts():

Relocation section '.rel.plt' at offset 0x2b0 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a00c  00000107 R_386_JUMP_SLOT   00000000   puts
0804a010  00000207 R_386_JUMP_SLOT   00000000   __gmon_start__
0804a014  00000307 R_386_JUMP_SLOT   00000000   exit
0804a018  00000407 R_386_JUMP_SLOT   00000000   __libc_start_main
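As a brief aside, the Info column can be decoded by hand. For 32-bit ELF, the symbol index is the upper 24 bits of r_info and the relocation type is the low byte (the ELF32_R_SYM and ELF32_R_TYPE macros in elf.h); type 7 is the i386 jump slot relocation. The sketch below hard-codes the four entries from the output above; the symbol names are copied from readelf rather than looked up in .dynsym, and the helper names are hypothetical.

```python
R_386_JMP_SLOT = 7  # relocation type value for i386 jump slots

def elf32_r_sym(info):
    """Symbol table index: upper 24 bits of r_info."""
    return info >> 8

def elf32_r_type(info):
    """Relocation type: low 8 bits of r_info."""
    return info & 0xFF

# (r_offset, r_info, name) copied from the readelf -r output.
RELOCS = [
    (0x0804A00C, 0x00000107, "puts"),
    (0x0804A010, 0x00000207, "__gmon_start__"),
    (0x0804A014, 0x00000307, "exit"),
    (0x0804A018, 0x00000407, "__libc_start_main"),
]

def symbol_for_slot(got_addr, relocs=RELOCS):
    """Return the symbol name whose jump slot lives at got_addr, or None."""
    for offset, info, name in relocs:
        if offset == got_addr and elf32_r_type(info) == R_386_JMP_SLOT:
            return name
    return None

print(symbol_for_slot(0x0804A00C))
```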

So the GOT location 0x804a00c is the relocation unit for the puts() function. Typically, it should contain an address that points to the PLT stub for the GOT offset so that the dynamic linker will be invoked and resolve the runtime value for that symbol. In this case, the GOT entry contains the address 0x8048628, which points to a suspicious bit of code at the end of the text segment:

 8048628:       55                      push   %ebp
 8048629:       89 e5                   mov    %esp,%ebp
 804862b:       83 ec 0c                sub    $0xc,%esp
 804862e:       c7 44 24 08 25 00 00    movl   $0x25,0x8(%esp)
 8048635:       00
 8048636:       c7 44 24 04 4c 86 04    movl   $0x804864c,0x4(%esp)
 804863d:       08
 804863e:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 8048645:       e8 a6 ff ff ff          call   80485f0 <_write>
 804864a:       c9                      leave  
 804864b:       c3                      ret  

Technically, we don't even have to know what this code does in order to know that the GOT was hijacked, because before execution the GOT should only contain addresses that point to the PLT, and this is clearly not a PLT address. Running the infected binary confirms the hijack:

$ ./host
HAHA puts() has been hijacked!
$

A further exercise would be to disinfect this binary manually, which is something we do in the ELF workshop trainings I provide periodically. Disinfecting this binary would primarily entail patching the .got.plt entry that contains the pointer to the parasite and replacing it with a pointer to the appropriate PLT stub.
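To work out which value belongs in the patched slot, one can lean on the usual i386 PLT layout. The sketch below assumes that layout: a 16-byte PLT0 followed by 16-byte entries in .rel.plt order, with each unresolved GOT slot pointing 6 bytes into its entry (at the push instruction). That assumption is consistent with the three valid values seen earlier (0x8048326, 0x8048336, 0x8048346) and with the .plt size of 0x50, but it should be confirmed against the disassembly of the actual binary before patching anything. The function names are hypothetical.

```python
import struct

PLT_ADDR = 0x08048300   # from the readelf -S output
PLT_ENTRY_SIZE = 16     # assumption: standard i386 PLT entry size
PUSH_OFFSET = 6         # assumption: lazy GOT value points past the first jmp

def lazy_got_value(plt_addr, reloc_index):
    """Expected pre-resolution GOT value for the Nth .rel.plt entry (0-based)."""
    return plt_addr + PLT_ENTRY_SIZE * (reloc_index + 1) + PUSH_OFFSET

def patch_bytes(value):
    """Little-endian bytes to write back into the 32-bit GOT slot."""
    return struct.pack("<I", value)

expected = [lazy_got_value(PLT_ADDR, i) for i in range(4)]
print([hex(v) for v in expected])
print(patch_bytes(expected[0]).hex())
```

Under these assumptions, the puts() slot (the first .rel.plt entry) would be restored to 0x8048316.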

Detecting function trampolines

The term trampoline is used loosely, but it originally referred to inline code patching, where a branch instruction such as a jmp is written over the first 5 to 7 bytes of a function's procedure prologue. Often, this trampoline is temporarily replaced with the original code bytes if the patched function needs to be called in such a way that it behaves as it originally did, and then the trampoline instruction is quickly put back in place. Detecting inline code hooks such as these is quite easy and can even be automated, provided you have a program or script that can disassemble a binary.

Following are two examples of trampoline code (32-bit x86 ASM):

  • Type 1:
    movl $target, %eax
    jmp *%eax
  • Type 2:
    push $target
    ret

A good classic paper on using function trampolines for function hijacking in kernel space was written by Silvio in 1999. The same concepts can be applied today in userland and in the kernel; for the kernel you would have to disable the write protect bit in the cr0 register to make the text segment writeable, or directly modify a PTE to mark a given page as writeable. I personally have had more success with the former method. The original paper on kernel function trampolines can be found at http://vxheaven.org/lib/vsc08.html.

The quickest way to detect function trampolines is to locate the entry point of every single function and verify that the first 5 to 7 bytes of code do not translate to some type of branch instruction. It would be very easy to write a Python script for GDB that can do this. I have written C code to do this in the past fairly easily.
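As a rough illustration of such a check, the following Python sketch matches the first bytes of a function against the two trampoline types shown earlier, plus a direct jmp rel32. It is a byte-pattern heuristic rather than a real disassembler, it covers 32-bit x86 only, and the function name is made up for this example. Supplying the prologue bytes (for instance, read from the file at each function symbol's offset) is left to the caller.

```python
import struct

def find_trampoline(code, addr=0):
    """Return (kind, target) if code starts with a known trampoline, else None."""
    b = bytes(code)
    # Type 1: movl $target, %eax (b8 imm32) followed by jmp *%eax (ff e0)
    if len(b) >= 7 and b[0] == 0xB8 and b[5:7] == b"\xff\xe0":
        return ("mov/jmp", struct.unpack_from("<I", b, 1)[0])
    # Type 2: push $target (68 imm32) followed by ret (c3)
    if len(b) >= 6 and b[0] == 0x68 and b[5] == 0xC3:
        return ("push/ret", struct.unpack_from("<I", b, 1)[0])
    # Direct jmp rel32 (e9): target is relative to the next instruction
    if len(b) >= 5 and b[0] == 0xE9:
        rel = struct.unpack_from("<i", b, 1)[0]
        return ("jmp rel32", (addr + 5 + rel) & 0xFFFFFFFF)
    return None

normal = bytes.fromhex("5589e583ec0cc7")   # push %ebp; mov %esp,%ebp; sub ...
type1 = bytes.fromhex("b828860408ffe0")    # movl $0x8048628,%eax; jmp *%eax
type2 = bytes.fromhex("6828860408c3")      # push $0x8048628; ret

for name, code in (("normal", normal), ("type1", type1), ("type2", type2)):
    print(name, find_trampoline(code))
```

A function that legitimately begins with a jmp (a compiler-generated thunk, for example) would also be flagged, so hits should be treated as leads to verify rather than as proof of infection.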