I think that the most prevalent type of process infection is DLL injection, also known as .so injection. It is a clean and effective solution that suits the needs of most attackers and runtime malware. Let's take a look at an infected process, and I will highlight the ways in which we can identify parasite code.
Our infected process is a simple test program named ./host that is infected with the Azazel userland rootkit. Azazel is the newer version of the popular Jynx rootkit. Both of these rootkits rely on LD_PRELOAD to load a malicious shared library that hijacks various glibc shared library functions. We will inspect the infected process using various GNU tools and the Linux environment, such as the /proc filesystem.
The first step while analyzing a process is to map out the address space. The most straightforward way to do this is by looking at the /proc/<pid>/maps file. We want to take note of any strange file mappings and segments with odd permissions. Also in our case, we may need to check the stack for environment variables, so we will want to take note of its location in memory.
Here's an example of memory mappings of an infected process ./host:
$ cat /proc/`pidof host`/maps
00400000-00401000 r-xp 00000000 00:24 5553671 /home/user/git/azazel/host
00600000-00601000 r--p 00000000 00:24 5553671 /home/user/git/azazel/host
00601000-00602000 rw-p 00001000 00:24 5553671 /home/user/git/azazel/host
0066c000-0068d000 rw-p 00000000 00:00 0 [heap]
3001000000-3001019000 r-xp 00000000 08:01 11406078 /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001019000-3001218000 ---p 00019000 08:01 11406078 /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001218000-3001219000 r--p 00018000 08:01 11406078 /lib/x86_64-linux-gnu/libaudit.so.1.0.0
3001219000-300121a000 rw-p 00019000 08:01 11406078 /lib/x86_64-linux-gnu/libaudit.so.1.0.0
300121a000-3001224000 rw-p 00000000 00:00 0
3003400000-300340d000 r-xp 00000000 08:01 11406085 /lib/x86_64-linux-gnu/libpam.so.0.83.1
300340d000-300360c000 ---p 0000d000 08:01 11406085 /lib/x86_64-linux-gnu/libpam.so.0.83.1
300360c000-300360d000 r--p 0000c000 08:01 11406085 /lib/x86_64-linux-gnu/libpam.so.0.83.1
300360d000-300360e000 rw-p 0000d000 08:01 11406085 /lib/x86_64-linux-gnu/libpam.so.0.83.1
7fc30ac7f000-7fc30ac81000 r-xp 00000000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ac81000-7fc30ae80000 ---p 00002000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae80000-7fc30ae81000 r--p 00001000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae81000-7fc30ae82000 rw-p 00002000 08:01 11406070 /lib/x86_64-linux-gnu/libutil-2.19.so
7fc30ae82000-7fc30ae85000 r-xp 00000000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30ae85000-7fc30b084000 ---p 00003000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b084000-7fc30b085000 r--p 00002000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b085000-7fc30b086000 rw-p 00003000 08:01 11406068 /lib/x86_64-linux-gnu/libdl-2.19.so
7fc30b086000-7fc30b241000 r-xp 00000000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b241000-7fc30b440000 ---p 001bb000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b440000-7fc30b444000 r--p 001ba000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b444000-7fc30b446000 rw-p 001be000 08:01 11406096 /lib/x86_64-linux-gnu/libc-2.19.so
7fc30b446000-7fc30b44b000 rw-p 00000000 00:00 0
7fc30b44b000-7fc30b453000 r-xp 00000000 00:24 5553672 /home/user/git/azazel/libselinux.so
7fc30b453000-7fc30b652000 ---p 00008000 00:24 5553672 /home/user/git/azazel/libselinux.so
7fc30b652000-7fc30b653000 r--p 00007000 00:24 5553672 /home/user/git/azazel/libselinux.so
7fc30b653000-7fc30b654000 rw-p 00008000 00:24 5553672 /home/user/git/azazel/libselinux.so
7fc30b654000-7fc30b677000 r-xp 00000000 08:01 11406093 /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b847000-7fc30b84c000 rw-p 00000000 00:00 0
7fc30b873000-7fc30b876000 rw-p 00000000 00:00 0
7fc30b876000-7fc30b877000 r--p 00022000 08:01 11406093 /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b877000-7fc30b878000 rw-p 00023000 08:01 11406093 /lib/x86_64-linux-gnu/ld-2.19.so
7fc30b878000-7fc30b879000 rw-p 00000000 00:00 0
7fff82fae000-7fff82fcf000 rw-p 00000000 00:00 0 [stack]
7fff82ffb000-7fff82ffd000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
The areas of interest and concern are highlighted in the preceding output of the maps file for the process of ./host. In particular, notice the shared library with the /home/user/git/azazel/libselinux.so path. This should immediately grab your attention because the path is not the standard shared library path and it has the name libselinux.so, which is traditionally stored with all other shared libraries (that is, /usr/lib).
This could indicate possible shared library injection (also known as the ET_DYN injection), which would mean that this is not the authentic libselinux.so library. The first thing that we might check for in this case is the LD_PRELOAD environment variable to see whether it was used to preload the libselinux.so library.
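The path check described above is easy to automate. The following is a minimal sketch, not a complete scanner: it parses one line of /proc/<pid>/maps and flags file-backed shared objects that live outside a handful of library directories. The names map_entry, parse_maps_line, and is_nonstandard_so are hypothetical, and the list of "standard" directories is my own assumption that you would adjust for the distribution you are examining.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps (hypothetical helper type). */
struct map_entry {
    unsigned long start, end;
    char perms[5];
    char path[256];
};

/* Parse a single maps line. Returns 1 on success, 0 if the line is
 * malformed. Anonymous mappings leave path as an empty string. */
static int parse_maps_line(const char *line, struct map_entry *m)
{
    int n;

    m->path[0] = '\0';
    /* Fields: start-end perms offset dev inode [path] */
    n = sscanf(line, "%lx-%lx %4s %*lx %*s %*s %255[^\n]",
               &m->start, &m->end, m->perms, m->path);
    return n >= 3;
}

/* Directories treated as normal library locations. This list is an
 * assumption made for illustration only. */
static const char *const std_lib_dirs[] = {
    "/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/", "/usr/local/lib/"
};

/* Returns 1 if the mapping looks like a shared object loaded from a
 * directory that is not in std_lib_dirs. The ".so" substring test is
 * deliberately naive. */
static int is_nonstandard_so(const struct map_entry *m)
{
    size_t i;

    if (m->path[0] != '/' || strstr(m->path, ".so") == NULL)
        return 0;
    for (i = 0; i < sizeof(std_lib_dirs) / sizeof(std_lib_dirs[0]); i++) {
        if (strncmp(m->path, std_lib_dirs[i], strlen(std_lib_dirs[i])) == 0)
            return 0;
    }
    return 1;
}
```

Feeding each line of the maps file through parse_maps_line and then is_nonstandard_so would single out the libselinux.so mappings in the output shown earlier. A hit is only a lead to follow up on, since legitimate software is sometimes installed outside these directories.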
The environment variables for a program are stored near the bottom of the stack at the beginning of a program's runtime. The bottom of the stack is actually the highest address (the beginning of the stack), since the stack grows into smaller addresses on the x86 architecture. Based on the output from /proc/<pid>/maps, we can get the location of the stack:
STACK_TOP      STACK_BOTTOM
7fff82fae000 - 7fff82fcf000
So, we want to examine the memory just below 0x7fff82fcf000. Using GDB, we can attach to the process and quickly locate the environment variables on the stack by using the x/s <address> command, which tells GDB to display the memory at that address as a null-terminated string. The x/4096s <address> command does the same thing but prints up to 4,096 consecutive strings, starting at that address.
We can safely presume that the environment variables will be in the first 4,096 bytes of the stack, but since the stack grows into lower addresses, we must start reading at <stack_bottom> - 4096.
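The environment strings sit in memory back to back, each terminated by a NUL byte, so once that region has been read into a buffer the search for LD_PRELOAD can be scripted. The sketch below assumes that you have already copied the bytes out of the process; find_env_value is a hypothetical helper name. As an alternative that the walkthrough here does not use, the /proc/<pid>/environ file exposes the initial environment in the same NUL-separated layout.

```c
#include <stddef.h>
#include <string.h>

/* Walk a buffer of back-to-back NUL-terminated strings and return a
 * pointer to the value of the variable called name, or NULL if it is
 * not present. The buffer is assumed to have been read from the
 * process beforehand. */
static const char *find_env_value(const char *buf, size_t len,
                                  const char *name)
{
    size_t nlen = strlen(name);
    size_t off = 0;

    while (off < len) {
        const char *entry = buf + off;
        const char *nul = memchr(entry, '\0', len - off);
        size_t elen;

        if (nul == NULL)
            break;              /* trailing bytes with no terminator */
        elen = (size_t)(nul - entry);
        if (elen > nlen && entry[nlen] == '=' &&
            memcmp(entry, name, nlen) == 0)
            return entry + nlen + 1;
        off += elen + 1;        /* step over the string and its NUL */
    }
    return NULL;
}
```

A non-NULL result for "LD_PRELOAD" tells you which object was preloaded, which is the same conclusion the GDB session reaches by eye.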
Here's an example of using GDB to read environment variables on a stack:
$ gdb -q -p `pidof host`
(gdb) x/4096s (0x7fff82fcf000 - 4096)
… scroll down a few pages …
0x7fff82fce359: "./host"
0x7fff82fce360: "LD_PRELOAD=./libselinux.so"
0x7fff82fce37b: "XDG_VTNR=7"
---Type <return> to continue, or q <return> to quit---
0x7fff82fce386: "XDG_SESSION_ID=c2"
0x7fff82fce398: "CLUTTER_IM_MODULE=xim"
0x7fff82fce3ae: "SELINUX_INIT=YES"
0x7fff82fce3bf: "SESSION=ubuntu"
0x7fff82fce3ce: "GPG_AGENT_INFO=/run/user/1000/keyring-jIVrX2/gpg:0:1"
0x7fff82fce403: "TERM=xterm"
0x7fff82fce40e: "SHELL=/bin/bash"
… truncated …
As we can see from the preceding output, we have verified that LD_PRELOAD was used to preload libselinux.so into the process. This means that any glibc functions within the program that have the same name as any functions in the preloaded shared library will be overridden and effectively hijacked by the ones in libselinux.so.
In other words, if the ./host program calls the fopen function from glibc and libselinux.so contains its own version of fopen, then that is the fopen function that will be stored in the PLT/GOT (the .got.plt section) and used instead of the glibc version. This leads us to the next indicated item—detecting function hijacking in the PLT/GOT (the PLT's global offset table).
Before checking the PLT/GOT that is in the ELF section called .got.plt (which is in the data segment of the executable), let's see which functions in the ./host program have relocations for the PLT/GOT. Remember from the chapter on ELF internals that the relocation entries for the global offset table are of the <ARCH>_JUMP_SLOT type. Refer to the ELF(5) manual for details.
The relocation type for the PLT/GOT is called <ARCH>_JUMP_SLOT because the entries are just that: jump slots. They contain function pointers that the PLT uses with jmp instructions to transfer control to the destination function. The actual relocation types are named R_X86_64_JUMP_SLOT, R_386_JMP_SLOT, and so on, depending on the architecture.
Here's an example of identifying shared library functions:
$ readelf -r host
Relocation section '.rela.plt' at offset 0x418 contains 7 entries:
000000601018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 unlink + 0
000000601020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000601028 000300000007 R_X86_64_JUMP_SLO 0000000000000000 opendir + 0
000000601030 000400000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main+0
000000601038 000500000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__+0
000000601040 000600000007 R_X86_64_JUMP_SLO 0000000000000000 pause + 0
000000601048 000700000007 R_X86_64_JUMP_SLO 0000000000000000 fopen + 0
We can see that there are several well-known glibc functions being called. It is possible that some or all of these are being hijacked by the impostor shared library libselinux.so.
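The second column of the readelf output is the raw r_info field of each relocation entry, which packs the symbol index into the upper 32 bits and the relocation type into the lower 32 bits. As a small sketch of how a tool might decode it, the following uses the ELF64_R_SYM and ELF64_R_TYPE macros and the R_X86_64_JUMP_SLOT constant from <elf.h>; the two function names are hypothetical.

```c
#include <elf.h>
#include <stdint.h>

/* Returns 1 if r_info describes an x86_64 jump slot relocation. */
static int is_jump_slot(uint64_t r_info)
{
    return ELF64_R_TYPE(r_info) == R_X86_64_JUMP_SLOT;
}

/* Index into .dynsym of the symbol that the relocation refers to. */
static uint32_t reloc_symbol_index(uint64_t r_info)
{
    return (uint32_t)ELF64_R_SYM(r_info);
}
```

For the fopen entry, r_info is 0x000700000007, so the type is 7 (the jump slot type on x86_64) and the symbol is entry 7 of the dynamic symbol table.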
From the readelf output that displays the PLT/GOT entries in the ./host executable, we can see the address of each symbol. Let's take a look at the global offset table in the memory for the following symbols: fopen, opendir, and unlink. It is possible that these have been hijacked and no longer point to the libc.so library.
Here's an example of the GDB output displaying the GOT values:
(gdb) x/gx 0x601048
0x601048 <fopen@got.plt>: 0x00007fc30b44e609
(gdb) x/gx 0x601018
0x601018 <unlink@got.plt>: 0x00007fc30b44ec81
(gdb) x/gx 0x601028
0x601028 <opendir@got.plt>: 0x00007fc30b44ed77
A quick look at the executable memory region of the libselinux.so shared library shows us that the addresses displayed in the GOT by GDB point to functions within libselinux.so and not libc.so:
7fc30b44b000-7fc30b453000 r-xp /home/user/git/azazel/libselinux.so
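The comparison we just did by eye is a range lookup: take the pointer stored in a GOT slot and find the executable mapping that contains it. Here is a minimal sketch of that lookup, assuming the text ranges have already been collected from the maps file; the text_range type and the owner_of function are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* An executable mapping taken from /proc/<pid>/maps. */
struct text_range {
    uint64_t start;     /* inclusive */
    uint64_t end;       /* exclusive */
    const char *path;
};

/* Return the path of the mapping that contains addr, or NULL if no
 * known mapping does. */
static const char *owner_of(uint64_t addr, const struct text_range *r,
                            size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (addr >= r[i].start && addr < r[i].end)
            return r[i].path;
    }
    return NULL;
}
```

With the libc and libselinux.so text ranges from this process, all three GOT values shown above resolve to /home/user/git/azazel/libselinux.so. Keep in mind that with lazy binding, a slot that has not been resolved yet points back into the executable's own PLT, so a value outside libc is not suspicious on its own.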
With this particular malware (Azazel), the malicious shared library was preloaded using LD_PRELOAD, which made verifying the library as suspicious an easy task. This is not always the case, as many forms of malware will inject the shared library via ptrace() or shellcode that uses either mmap() or __libc_dlopen_mode(). The heuristics for determining whether or not a shared library has been injected will be detailed in the next section.
As we just demonstrated, detecting shared libraries that have been preloaded with LD_PRELOAD is rather simple. What about shared libraries that were injected into a remote process? Or in other words, shared objects that were inserted into a pre-existing process? It is important to know whether or not a shared library was maliciously injected if we want to be able to take the next step and detect PLT/GOT hooks. First, we must identify all the ways in which a shared library can be injected into a remote process, as we briefly discussed in section 7.2.2.
Let's look at a concrete example of how this might be accomplished. Here is some example code from Saruman that injects PIE executables into a process.
Using the readelf utility, we can see that in the standard C library (libc.so.6), there exists a function named __libc_dlopen_mode. This function actually accomplishes the same thing as the dlopen function, which is not resident in libc. This means that with any process that uses libc, we can get the dynamic linker to load whatever ET_DYN object we want to, while also automatically handling all the relocation patches.
It is rather common for attackers to use this function to load ET_DYN objects into a process:
$ readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep dlopen
2128: 0000000000136160 146 FUNC GLOBAL DEFAULT 12 __libc_dlopen_mode@@GLIBC_PRIVATE
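The value that readelf prints for the symbol (0x136160) is relative to wherever libc happens to be loaded, so an injector has to add the load base of libc in the target process to it. The sketch below shows that arithmetic under two assumptions of mine: the base is taken to be the start of the first mapping at file offset 0 whose line mentions the library, and the library's first loadable segment is assumed to have a virtual address of 0, which is typical for shared objects. The names find_load_base and runtime_symbol_addr are hypothetical and are not taken from any existing tool.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan the text of a maps file and return the start address of the
 * first mapping at file offset 0 whose line contains needle.
 * Returns 0 if no such mapping is found. */
static uint64_t find_load_base(const char *maps_text, const char *needle)
{
    const char *line = maps_text;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[512];
        unsigned long start, off;

        if (len < sizeof(buf)) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            /* Fields used: start address and file offset. */
            if (strstr(buf, needle) != NULL &&
                sscanf(buf, "%lx-%*lx %*s %lx", &start, &off) == 2 &&
                off == 0)
                return start;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return 0;
}

/* Runtime address = load base + symbol value from the symbol table. */
static uint64_t runtime_symbol_addr(uint64_t load_base, uint64_t st_value)
{
    return load_base + st_value;
}
```

If this libc.so.6 is the same file as the libc-2.19.so mapped into the ./host process from earlier in the chapter, the base is 0x7fc30b086000, and adding 0x136160 gives 0x7fc30b1bc160, which falls inside the executable libc mapping as expected.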
The following code is in C, but when compiled into machine code, it can be used as shellcode that we inject into the process using ptrace:
#define __RTLD_DLOPEN 0x80000000 //glibc internal dlopen flag emulates dlopen behaviour
__PAYLOAD_KEYWORDS__ void * dlopen_load_exec(const char *path, void *dlopen_addr)
{
        void * (*libc_dlopen_mode)(const char *, int) = dlopen_addr;
        void *handle = (void *)0xfff; //initialized for debugging
        handle = libc_dlopen_mode(path, __RTLD_DLOPEN|RTLD_NOW|RTLD_GLOBAL);
        __RETURN_VALUE__(handle);
        __BREAKPOINT__;
}
Notice that one of the arguments is void *dlopen_addr. Saruman locates the address to the __libc_dlopen_mode() function, which resides in libc.so. This is accomplished using a function for resolving symbols within the libc library.
There are many more details to the following code, and I would highly encourage you to check out Saruman. It is specifically for injecting executable programs that are compiled as ET_DYN objects, but as mentioned previously, the injection method will also work for shared libraries since they are also compiled as ET_DYN objects:
Elf64_Addr get_sym_from_libc(handle_t *h, const char *name)
{
        int fd, i;
        struct stat st;
        Elf64_Addr libc_base_addr = get_libc_addr(h->tasks.pid);
        Elf64_Addr symaddr;

        if ((fd = open(globals.libc_path, O_RDONLY)) < 0) {
                perror("open libc");
                exit(-1);
        }
        if (fstat(fd, &st) < 0) {
                perror("fstat libc");
                exit(-1);
        }
        uint8_t *libcp = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (libcp == MAP_FAILED) {
                perror("mmap libc");
                exit(-1);
        }
        symaddr = resolve_symbol((char *)name, libcp);
        if (symaddr == 0) {
                printf("[!] resolve_symbol failed for symbol '%s'\n", name);
                printf("Try using --manual-elf-loading option\n");
                exit(-1);
        }
        symaddr = symaddr + globals.libc_addr;
        DBG_MSG("[DEBUG]-> get_sym_from_libc() addr of __libc_dl_*: %lx\n", symaddr);
        return symaddr;
}
To further demystify shared library injection, let me show you a much simpler technique that uses ptrace injected shellcode to open()/mmap() the shared library into the process address space. This technique is fine to use, but it requires that the malware manually handle all of the hot patching of relocations. The __libc_dlopen_mode() function handles all of this transparently with the help of the dynamic linker itself, so it is actually easier in the long run.
The following shellcode can be injected into an executable segment within a given process and then be executed using ptrace.
Note that this is the second time I've used this hand-written shellcode as an example in the book. I wrote it in 2008 for a 32-bit Linux system, and it was convenient to use as an example. Otherwise, I'm sure I would have written something new to demonstrate a more modern approach in x86_64 Linux:
_start:
        jmp B
A:
        # fd = open("/lib/libtest.so.1.0", O_RDONLY);
        xorl %eax, %eax         # clear %eax so that only the syscall number remains
        movb $5, %al            # __NR_open
        popl %ebx               # address of the path string pushed by 'call A'
        xorl %ecx, %ecx         # O_RDONLY
        int $0x80
        subl $24, %esp
        # mmap(0, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
        xorl %edx, %edx
        movl %edx, (%esp)
        movl $8192,4(%esp)
        movl $7, 8(%esp)
        movl $2, 12(%esp)       # 2 is MAP_PRIVATE on Linux
        movl %eax,16(%esp)
        movl %edx, 20(%esp)
        movl $90, %eax          # old-style mmap: arguments passed as a struct
        movl %esp, %ebx
        int $0x80
        # the int3 will pass control back to the tracer
        int3
B:
        call A
        .string "/lib/libtest.so.1.0"
After PTRACE_POKETEXT is used to inject the shellcode and PTRACE_SETREGS is used to set %eip to its entry point, execution continues until the shellcode hits the int3 instruction, which passes control back to the program that is performing the infection. That program can then simply detach from the host process, which is now infected with the shared library (/lib/libtest.so.1.0).
In some cases, such as on binaries that have PaX mprotect restrictions enabled (https://pax.grsecurity.net/docs/mprotect.txt), the ptrace system call cannot be used to inject shellcode into the text segment. This is because it is read-only, and the restrictions will also prevent marking the text segment writeable, so you cannot simply get around this. However, this can be circumvented in several ways, such as by setting the instruction pointer to __libc_dlopen_mode and storing the arguments to the function in registers (such as %rdi, %rsi, and so on). Alternatively, in the case of a 32-bit architecture, the arguments can be stored on the stack.
Another way is by manipulating the VDSO code that is present in most processes.
This technique is one that is demonstrated at http://vxheaven.org/lib/vrn00.html, but the general idea is simple. The VDSO code that is mapped to the process address space, as seen in the /proc/<pid>/maps output earlier in this chapter, contains code that invokes system calls via the syscall (for 64-bit) and sysenter (for 32-bit) instructions. The calling convention for system calls in Linux always places the system call number in the %eax/%rax register.
If an attacker uses ptrace(PTRACE_SYSCALL, …), they can quickly locate the syscall instruction in the VDSO code and replace the register values to invoke whichever system call is desired. If this is done carefully and done while restoring the original system call that was executing, then it will not cause the application to crash. The open and mmap system calls can be used to load an executable object such as ET_DYN or ET_REL into the process address space. Alternatively, they can be used to simply create an anonymous memory mapping that can store shellcode.
The following is the code that an attacker would take advantage of on a 32-bit system:
ffffe420 <__kernel_vsyscall>:
ffffe420: 51          push %ecx
ffffe421: 52          push %edx
ffffe422: 55          push %ebp
ffffe423: 89 e5       mov %esp,%ebp
ffffe425: 0f 34       sysenter
The following is the code that an attacker would take advantage of on a 64-bit system:
ffffffffff700db8: b0 60       mov $0x60,%al
ffffffffff700dba: 0f 05       syscall
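To make the 64-bit case concrete, the syscall instruction is the two-byte sequence 0f 05, so it can also be found by scanning a copy of the code bytes. The sketch below is a naive byte scan and not the PTRACE_SYSCALL approach described above. It can report a false match when those two bytes occur inside the operand of another instruction, so treat the result as a candidate only. The function name find_syscall_insn is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 0f 05 byte pair (the x86_64 syscall
 * instruction) in code, or -1 if the pair does not occur. */
static long find_syscall_insn(const uint8_t *code, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i++) {
        if (code[i] == 0x0f && code[i + 1] == 0x05)
            return (long)i;
    }
    return -1;
}
```

For the four bytes shown in the 64-bit listing (b0 60 0f 05), the function returns 2, which corresponds to address ffffffffff700dba.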
The dynamic linker is the only legitimate way to bring a shared library into a process. Remember, however, that an attacker can use the __libc_dlopen_mode function, which invokes the dynamic linker to load an object. So how do we tell when the dynamic linker is doing legitimate work? There are three legitimate ways in which a shared object is mapped to a process by the dynamic linker.
Let's look at what we consider legitimate shared object loading:
- There is a valid DT_NEEDED entry in the executable program that corresponds to the shared library file.
- Shared libraries that were validly loaded by the dynamic linker may in turn have their own DT_NEEDED entries in order to load other shared libraries. This can be called transitive shared library loading.
- If a program is linked with libdl.so, then it may use the dynamic loading functions to load libraries on the fly. The function for loading shared objects is named dlopen, and the function for resolving symbols is named dlsym.
Now, let's take a look at the illegitimate ways in which a shared object can be loaded into a process, that is to say, by an attacker or a malware instance:
- The __libc_dlopen_mode function exists within libc.so (not libdl.so) and is not intended to be called by a program. It is actually marked as a GLIBC_PRIVATE function. Most processes have libc.so, and this is therefore a function commonly used by attackers or malware to load arbitrary shared objects.
- VDSO manipulation. As we have already demonstrated, this technique can be used to execute arbitrary syscalls, and therefore it can be simple to memory-map a shared object with this method.
- Shellcode that directly invokes the open and mmap system calls.
- DT_NEEDED entries can be added by an attacker by overwriting the DT_NULL tag in the dynamic segment of an executable or shared library, thus being able to tell the dynamic linker to load whatever shared object they wish. This particular method was discussed in Chapter 6, ELF Binary Forensics in Linux, and it falls more into the topic of that chapter, but it may also be necessary when inspecting a suspicious process.
Be sure to inspect the binary of a suspicious process, and verify that the dynamic segment doesn't appear suspicious. Refer to the Checking the dynamic segment for DLL injection traces section of Chapter 6, ELF Binary Forensics in Linux.
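Since DT_NEEDED entries appear on both lists, it helps to see how they are enumerated. The following is a minimal sketch that walks a dynamic array and collects the library names. It assumes that you have already located the dynamic segment and the dynamic string table and have both in memory. The function name collect_needed and its parameters are hypothetical, while Elf64_Dyn, DT_NEEDED, and DT_NULL come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>

/* Walk a DT_NULL-terminated dynamic array and store pointers to the
 * names of up to max DT_NEEDED libraries in out. strtab is the
 * dynamic string table and strtab_size is its length in bytes.
 * Returns the number of names stored. */
static size_t collect_needed(const Elf64_Dyn *dyn, const char *strtab,
                             size_t strtab_size, const char **out,
                             size_t max)
{
    size_t n = 0;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag != DT_NEEDED)
            continue;
        if (dyn->d_un.d_val >= strtab_size)
            continue;   /* offset lies outside the string table */
        if (n == max)
            break;
        out[n++] = strtab + dyn->d_un.d_val;
    }
    return n;
}
```

Running this over the executable and then over each library it names gives the set of objects that the dynamic linker had a legitimate reason to load. Note that the walk stops at the first DT_NULL, which is exactly the tag the DT_NEEDED attack described above overwrites.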
Now that we have a clear definition of legitimate versus illegitimate loading of shared objects, we can get into the discussion of heuristics for detecting when a shared library is legitimate or not.
Beforehand, it is worth noting again that LD_PRELOAD is commonly used for good as well as bad purposes, and the only sure-fire way of knowing this is by inspecting what the actual code that resides in the preloaded shared object does. Therefore, we will leave LD_PRELOAD out of the discussion on heuristics here.
In this section, I will describe the general principles behind detecting whether a shared library is legitimate or not. In Chapter 8, ECFS – Extended Core File Snapshot Technology, we will be discussing the ECFS technology, which actually incorporates these heuristics into its feature set.
For now, let's look at the principles only. We want to get a list of the shared libraries that are mapped to the process and then see which ones qualify for being legitimately loaded by the dynamic linker:
1. Get a list of the shared object paths from the /proc/<pid>/maps file.
   Some maliciously injected shared libraries won't appear as file mappings because the attacker created anonymous memory mappings and then memcpy'd the shared object code into those memory regions. In the next chapter, we will see that ECFS can weed these more stealthy entities out as well. A scan can be done of each executable memory region that is anonymously mapped to see whether ELF headers exist, particularly those with the ET_DYN file type.
2. Check whether a valid DT_NEEDED entry exists in the executable that corresponds to the shared library you are seeing. If one exists, then it is a legitimate shared library. After you have verified that a given shared library is legitimate, check that shared library's dynamic segment and enumerate the DT_NEEDED entries within it. Those corresponding shared libraries can also be marked as legitimate. This goes back to the concept of transitive shared object loading.
3. Look at the PLT/GOT of the process's actual executable program. If dlopen is being used, then analyze the code to find each call to it. The dlopen calls may be passed arguments that can be inspected statically, like this for instance:
        void *handle = dlopen("somelib.so", RTLD_NOW);
   In such cases, the string will be stored as a static constant and will therefore be in the .rodata section of the binary. So, check whether the .rodata section (or wherever the string is stored) contains any strings that contain the shared library path you are trying to validate.
4. If a shared library path found in the maps file cannot be accounted for by a DT_NEEDED entry and cannot be accounted for by any dlopen calls either, then that means it was either preloaded by LD_PRELOAD or injected by some other means. At this point, you should qualify the shared object as suspicious.
Currently, there are not many great tools that are specifically for process memory analysis in Linux. This is the reason that I designed ECFS (discussed in Chapter 8, ECFS – Extended Core File Snapshot Technology). There are only a few tools I know of that can detect PLT/GOT overwrites, and each one of them essentially uses the same heuristics that we just discussed: