Learning Linux Binary Analysis by Ryan "elfmaster" O'Neill, published by Packt Publishing, 2016

ELF virus engineering challenges

The design phase of an ELF virus may be considered an artistic endeavor, requiring creative thinking and clever constructs; many passionate coders will agree with this. At the same time, it is a great engineering challenge that goes beyond the regular conventions of programming, requiring the developer to think outside conventional paradigms and to manipulate the code, data, and environment into behaving a certain way. I once did a security assessment of a product at a large antivirus (AV) company. While talking with the developers of the AV software, I was amazed that next to none of them had any real idea of how to engineer a virus, let alone how to design real heuristics for identifying one (other than signatures). The truth is that virus writing is difficult and requires serious skill. A number of challenges come into play when engineering a virus, and before we discuss the engineering components, let's look at what some of these challenges are.

Parasite code must be self-contained

A parasite must be able to physically exist inside another program. This means that it does not have the luxury of linking to outside libraries through the dynamic linker. The parasite must be self-contained: it relies on no external linking, is position independent, and is able to dynamically calculate memory addresses within itself. This is necessary because the parasite is injected into an existing binary, so its position changes with each infection. If the parasite code references a function or a string by a hardcoded address, that address will no longer be valid after injection and the code will fail. Instead, use IP-relative code that calculates the address of code or data from its offset to the instruction pointer.
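The idea of addressing data by its offset from the instruction pointer can be illustrated in ordinary C. This is only a minimal sketch, not code from any of the tools mentioned in this chapter: the names current_ip(), offset_from_ip(), and resolve_offset() are hypothetical, and the inline assembly assumes x86_64 with GCC or Clang (other targets fall back to using the function's own address as the reference point).

```c
#include <stdint.h>

/* Reference address taken from the instruction pointer at run time.
 * noinline keeps a single copy of the lea, so every call returns the
 * same address for a given load of the program. */
__attribute__((noinline)) uintptr_t current_ip(void)
{
#if defined(__x86_64__)
    uintptr_t ip;
    /* lea 0(%rip) yields the address of the instruction after the lea. */
    __asm__ volatile("lea 0(%%rip), %0" : "=r"(ip));
    return ip;
#else
    /* Assumption for other targets: use this function's own address
     * as the reference point instead. */
    return (uintptr_t)&current_ip;
#endif
}

/* Distance from the reference point to a target. This value stays the
 * same when the whole image is loaded at a different base address. */
intptr_t offset_from_ip(const void *target)
{
    return (intptr_t)((uintptr_t)target - current_ip());
}

/* Rebuild a usable pointer from a stored offset. */
const void *resolve_offset(intptr_t offset)
{
    return (const void *)(current_ip() + (uintptr_t)offset);
}
```

The point of the sketch is that only the offset is stored; the absolute address is recomputed each time from wherever the code happens to be running. When code is compiled with -fpic on x86_64, the compiler generally emits this kind of IP-relative reference on its own.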

Note

In some more complex memory viruses such as my Saruman virus, I allow the parasite to be compiled as an executable program with dynamic linking, but the code to launch it into a process address space is very complicated, because it must handle relocations and dynamic linking manually. There are also relocatable code injectors such as Quenya, which allow a parasite to be compiled as a relocatable object, but the infector must then support handling relocations during the infection phase.

Solution

Compile your initial virus executable with the gcc option -nostdlib. You may also compile it with -fpic -pie to make the executable position-independent code (PIC). The IP-relative addressing available on x86_64 machines is actually a nice feature for virus writers. Create your own common functions, such as strcpy() and memcmp(). When you need advanced functionality such as heap allocation with malloc(), you may instead use sys_brk() or sys_mmap() to create your own allocation routines. Create your own syscall wrappers as well. For example, a wrapper for the mmap syscall is shown here, using C and inline assembly:

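#define __NR_MMAP 9     /* mmap syscall number on x86_64 Linux */

void *_mmap(unsigned long addr, unsigned long len, unsigned long prot,
            unsigned long flags, long fd, unsigned long off)
{
        /* The x86_64 syscall ABI passes arguments 4-6 in r10, r8, and r9.
         * There are no constraint letters for these registers, so bind
         * them with explicit register variables. */
        register unsigned long mmap_flags __asm__("r10") = flags;
        register long mmap_fd __asm__("r8") = fd;
        register unsigned long mmap_off __asm__("r9") = off;
        unsigned long ret;

        __asm__ volatile(
                         "syscall"
                         : "=a"(ret)                          /* result comes back in rax */
                         : "a"((unsigned long)__NR_MMAP),     /* syscall number in rax */
                           "D"(addr), "S"(len), "d"(prot),    /* rdi, rsi, rdx */
                           "r"(mmap_flags), "r"(mmap_fd), "r"(mmap_off)
                         : "rcx", "r11", "memory");           /* syscall clobbers rcx and r11 */
        return (void *)ret;
}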

Once you have a wrapper calling the mmap() syscall, you can create a simple malloc routine.

The malloc function is used to allocate memory on the heap. Our little malloc function uses a memory-mapped segment for each allocation, which is inefficient but suffices for simple use cases:

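#include <stddef.h>     /* size_t, NULL */
#include <sys/mman.h>   /* PROT_* and MAP_* constants (header only) */

void * _malloc(size_t len)
{
        void *mem = _mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

        /* A raw syscall reports failure as a negative errno value in the
         * range [-4095, -1], not as MAP_FAILED. */
        if ((unsigned long)mem >= (unsigned long)-4095)
                return NULL;
        return mem;
}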
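The earlier advice to create your own common functions can be sketched in the same style. The following is a minimal, unoptimized sketch of three such routines; the names _strlen(), _strcpy(), and _memcmp() are hypothetical and simply mirror the underscore convention of _mmap() and _malloc().

```c
#include <stddef.h>

/* Length of a NUL-terminated string, not counting the terminator. */
size_t _strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Copy src, including its terminator, into dst and return dst. */
char *_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Compare n bytes; the sign of the result follows the first differing byte. */
int _memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return (int)p[i] - (int)q[i];
    }
    return 0;
}
```

One caveat worth checking in the disassembly: an optimizing compiler may recognize loops like these and replace them with calls to the real library routines, which do not exist under -nostdlib. Options such as -ffreestanding or -fno-builtin are the usual way to discourage this, though whether they are sufficient depends on the compiler version and optimization level.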

Complications with string storage

This challenge overlaps with the previous one on self-contained code. When handling strings in your virus code, you may have something like this:

const char *name = "elfmaster";

You will want to stay away from code such as the preceding line. The compiler will likely store the "elfmaster" string in the .rodata section and then reference it by its address, and that address will not be valid once the virus executable is injected inside another program. This is really the same problem as the hardcoded addresses that we discussed earlier.

Solution

Use the stack to store strings so that they are dynamically allocated at runtime:

char name[10] = {'e', 'l', 'f', 'm', 'a', 's', 't', 'e', 'r', '\0'};

Another neat trick that I just recently discovered during the construction of the Skeksi virus for 64-bit Linux is to merge the text and data segments into a single segment that is readable, writable, and executable (RWX), by using the -N option with gcc (this is the linker's --omagic option). This is very nice because the global data and read-only data, such as the .data and .rodata sections, are all merged into a single segment together with the code. This allows the virus to simply inject the entire segment during the infection phase, which will include string literals such as those from .rodata. This technique combined with IP-relative addressing allows a virus author to use traditional string literals:

char *name = "elfmaster";

This type of string can now be used in the virus code, and the method of storing strings on the stack can be avoided entirely. It is important to note, however, that keeping all of the strings stored off the stack in global data will cause the overall size of the virus parasite to increase, which is sometimes undesirable. The Skeksi virus was recently released and is available at http://www.bitlackeys.org/#skeksi.

Finding legitimate space to store parasite code

This is one of the big questions to answer when writing a virus: where will the payload (the body of the virus) be injected? In other words, where in the host binary will the parasite live? The possibilities vary from binary format to binary format. In the ELF format, there are quite a number of places to inject code, but they all require correct adjustment of various ELF header values.

The challenge isn't necessarily finding space but rather adjusting the ELF binary to allow you to use that space while keeping the executable file looking reasonably normal and staying within the ELF specifications closely enough so that it still executes properly. There are many things that must be considered when patching a binary and modifying its layout, such as page alignment, offset adjustments, and address adjustments.
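The page alignment and offset arithmetic mentioned above can be made concrete with a few helpers. This is a minimal sketch with hypothetical function names; it assumes power-of-two alignments and relies on the ELF specification's rule that, for a loadable segment, p_vaddr must equal p_offset modulo p_align.

```c
#include <stdint.h>

/* Round a value down to a power-of-two alignment such as a 4096-byte page. */
uint64_t align_down(uint64_t value, uint64_t align)
{
    return value & ~(align - 1);
}

/* Round a value up to the next multiple of a power-of-two alignment. */
uint64_t align_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Returns 1 if a segment's file offset and virtual address are congruent
 * modulo its alignment, as the ELF specification requires for loadable
 * segments; returns 0 otherwise. */
int segment_is_congruent(uint64_t p_offset, uint64_t p_vaddr, uint64_t p_align)
{
    if (p_align <= 1)
        return 1;       /* 0 or 1 means no alignment is required */
    return (p_vaddr % p_align) == (p_offset % p_align);
}
```

For example, align_up(4097, 4096) gives 8192, and a segment with p_offset 0x1234, p_vaddr 0x401000, and p_align 0x1000 fails the congruence check. A check like this is equally useful on the analysis side, since a patched binary with inconsistent offsets and addresses is a sign that its layout was modified by hand.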

Solution

Read the ELF specs carefully when creating new methods of binary patching, and make sure that you stay within the boundaries necessary for program execution. In the next section, we will discuss some techniques of virus infection.

Passing the execution control flow to the parasite

Another common challenge is how to pass the control flow of the host executable to the parasite. In many cases, it will suffice to adjust the entry point in the ELF file header to point to the parasite code. This is reliable, but also very obvious: if the entry point has been modified to point at the parasite, then anyone can run readelf -h to see the entry point and immediately know the location of the parasite code.
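The entry point that readelf -h reports is simply the e_entry field of the ELF file header, so the same inspection is easy to do programmatically. The following is a minimal sketch rather than a complete tool: read_entry_point() is a hypothetical helper, and it assumes a Linux system with <elf.h>, a 64-bit ELF file, and a file byte order that matches the host.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads e_entry from a 64-bit ELF file.
 * Returns 0 on success and stores the entry point in *entry;
 * returns -1 if the file cannot be read or is not a 64-bit ELF file. */
int read_entry_point(const char *path, uint64_t *entry)
{
    Elf64_Ehdr ehdr;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t got = fread(&ehdr, 1, sizeof(ehdr), fp);
    fclose(fp);
    if (got != sizeof(ehdr))
        return -1;

    /* Check the "\177ELF" magic and the 64-bit class before trusting the header. */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = ehdr.e_entry;
    return 0;
}
```

Calling read_entry_point("/proc/self/exe", &entry) on Linux returns the entry point of the running program. An analyst would typically compare this value against the expected start of the program's code; an entry point that lands somewhere unusual is exactly the obvious clue described above.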

Solution

If you don't want to modify the entry point address, then consider finding a place where you can insert/modify a branch to your parasite code, such as inserting a jmp or overwriting a function pointer. One great place for this is in the .ctors or .init_array sections, which contain function pointers. The .dtors or .fini_array sections can work as well if you don't mind the parasite executing after the regular program code (instead of before).
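The ordering that makes these sections relevant can be observed with an ordinary, legitimate constructor. This is only an illustrative sketch, assuming GCC or Clang: the constructor attribute asks the toolchain to register a function pointer in .init_array (or .ctors on older toolchains), and the names early_hook() and constructor_has_run() are hypothetical.

```c
/* Set by the constructor so that later code can observe the ordering. */
static int ran_before_main = 0;

/* The toolchain records a pointer to this function in .init_array
 * (or .ctors on older toolchains); the startup code calls every
 * entry there before main() begins. */
__attribute__((constructor))
static void early_hook(void)
{
    ran_before_main = 1;
}

/* Returns 1 if the constructor has already executed. */
int constructor_has_run(void)
{
    return ran_before_main;
}
```

If main() calls constructor_has_run(), it sees 1, because every function pointer in .init_array has been invoked by the time main() starts. This is also why these sections deserve attention during analysis: an entry that points outside the program's normal code is worth investigating.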