We just reviewed some common methods for hijacking execution flow. If you can identify where the execution flow points, you can typically identify some or all of the parasite code. In the section Detecting PLT/GOT hooks, we found the parasite code for the hijacked puts() function by locating the PLT/GOT entry that had been modified and following the address it held. In that case, the address pointed to an appended page containing parasite code.
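The check described above can be sketched in C as a simple range test: a GOT entry is suspicious when the address it holds falls outside every region where the resolved function could legitimately live. This is a minimal sketch under stated assumptions. The GOT values and the list of legitimate ranges are assumed to have been collected already, for example from /proc/<pid>/maps. That list must include the executable's own PLT, because under lazy binding an unresolved entry points back into it. The names `addr_range` and `find_suspicious_got_entry` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end), e.g. the text segment of a
 * shared library or the executable's PLT. */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

/* Returns 1 if addr lies inside any of the n ranges, 0 otherwise. */
static int in_any_range(uintptr_t addr, const struct addr_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return 1;
    return 0;
}

/* Scans n GOT entry values and returns the index of the first one that
 * points outside every legitimate range, or -1 if none does. */
static long find_suspicious_got_entry(const uintptr_t *got, size_t n,
                                      const struct addr_range *legit,
                                      size_t nlegit)
{
    for (size_t i = 0; i < n; i++)
        if (!in_any_range(got[i], legit, nlegit))
            return (long)i;
    return -1;
}
```

A hit only means that the entry points somewhere unexpected. Deciding whether the target is parasite code still requires looking at what lives at that address.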
Parasite code can be described as code that is unnaturally inserted into the binary; in other words, it wasn't linked in by the actual ELF object linker. That said, several characteristics can sometimes be attributed to injected code, depending on the techniques used.
Position independent code (PIC) is often used for parasites so that it can be injected at any location in a binary or in memory and still execute properly, regardless of where it ends up. PIC parasites are easier to inject into an executable because the code can be inserted into the binary without handling relocations. In some cases, such as with my Linux padding virus (http://www.bitlackeys.org/projects/lpv.c), the parasite is compiled as an executable with the gcc -nostdlib flag. It is not compiled as position independent, but it has no libc linking, and special care is taken within the parasite code itself to resolve memory addresses dynamically with instruction-pointer-relative computations.
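On x86-64, the hardware performs the instruction-pointer-relative computation mentioned above. A RIP-relative operand stores a signed 32-bit displacement that is added to the address of the *next* instruction. The sketch below shows that arithmetic and why it makes code position independent: if the instruction and the data it refers to move by the same amount, the displacement stays valid. The function name `rip_relative_target` and the example addresses are hypothetical. On 32-bit x86 there is no equivalent EIP-relative data addressing, so 32-bit parasites often have to recover the instruction pointer indirectly, for example with a call/pop sequence.

```c
#include <stdint.h>

/* Computes the effective address referenced by a RIP-relative operand:
 * the address of the next instruction (insn_addr + insn_len) plus the
 * sign-extended 32-bit displacement encoded in the instruction. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For example, a 7-byte `lea 0x2000(%rip), %rax` placed at 0x401000 refers to 0x403007. The same bytes placed at 0x411000 refer to 0x413007, so no relocation is needed when the code moves.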
In many cases, the parasite code is written purely in assembly language. It is therefore, in a sense, easier to identify as a potential parasite, since it looks different from what a compiler produces. One of the giveaways with parasite code written in assembly is the way in which syscalls are handled. In C code, syscalls are typically made through libc functions that invoke the actual syscall, so they look just like calls to regular dynamically linked functions. In handwritten assembly code, syscalls are usually invoked directly with the sysenter or syscall instruction, and sometimes even with int 0x80 (which is now considered legacy). Finding such instructions in an executable's own code, rather than inside libc, may therefore be considered a red flag.
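Such a red flag can be searched for with a crude byte scan for the three encodings: `0f 05` (syscall), `0f 34` (sysenter), and `cd 80` (int $0x80). The following is a minimal sketch. The names `sc_kind`, `classify_at`, and `count_raw_syscalls` are hypothetical, and the code works on a buffer already read from the text segment. Because it does not disassemble, it can match bytes that belong to another instruction's operands or to embedded data. It is also only meaningful for dynamically linked executables, since a statically linked binary legitimately contains these instructions in its embedded libc.

```c
#include <stddef.h>
#include <stdint.h>

enum sc_kind { SC_NONE, SC_SYSCALL, SC_SYSENTER, SC_INT80 };

/* Classifies the two bytes at p as one of the direct syscall encodings. */
static enum sc_kind classify_at(const uint8_t *p)
{
    if (p[0] == 0x0f && p[1] == 0x05) return SC_SYSCALL;   /* syscall   */
    if (p[0] == 0x0f && p[1] == 0x34) return SC_SYSENTER;  /* sysenter  */
    if (p[0] == 0xcd && p[1] == 0x80) return SC_INT80;     /* int $0x80 */
    return SC_NONE;
}

/* Counts candidate direct-syscall instructions at every byte offset of buf.
 * Raw scan only: every hit needs to be confirmed with a disassembler. */
static size_t count_raw_syscalls(const uint8_t *buf, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (classify_at(buf + i) != SC_NONE)
            hits++;
    return hits;
}
```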
Another red flag, especially when analyzing a remote process that may be infected, is the presence of int3 instructions. These can serve many purposes, such as passing control back to a tracing process that is performing the infection or, even more disturbing, triggering some type of anti-debugging mechanism within malware or a binary protector.
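The int3 instruction is normally encoded as the single byte 0xcc, which makes candidates easy to locate once the memory has been read out, for example through /proc/<pid>/mem. The sketch below assumes the bytes are already in a buffer, and `find_int3` is a hypothetical name. It only looks for the one-byte 0xcc form, not the two-byte `cd 03` encoding. Toolchains sometimes use 0xcc as padding between functions, and the byte can also occur inside other instructions' operands, so each hit needs disassembly context before it is treated as an inserted breakpoint.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the offsets of up to max 0xcc (int3) bytes in out and returns
 * the total number found, which may exceed max. */
static size_t find_int3(const uint8_t *buf, size_t len, size_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xcc)
            continue;
        if (found < max)
            out[found] = i;
        found++;
    }
    return found;
}
```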
The following 32-bit code memory maps a shared library into a process and then passes control back to the tracer with an int3. Notice that int 0x80 is used to invoke the syscalls. This shellcode is actually quite old; I wrote it in 2008. Nowadays we would typically use the sysenter or syscall instruction to invoke a system call in Linux, but int 0x80 still works; it is just slower and is therefore considered deprecated:
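_start:
jmp B
A:
# fd = open("/lib/libtest.so.1.0", O_RDONLY);
xorl %eax, %eax         # clear eax so only the syscall number is left in it
movb $5, %al            # __NR_open on i386
popl %ebx               # ebx = address of the path string pushed by "call A"
xorl %ecx, %ecx         # flags = O_RDONLY (0)
int $0x80               # eax = fd
subl $24, %esp          # room for six 32-bit mmap arguments
# mmap(0, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
# old mmap (syscall 90 on i386) takes a pointer to this argument block in ebx
xorl %edx, %edx
movl %edx, (%esp)       # addr   = 0 (let the kernel choose)
movl $8192, 4(%esp)     # length = 8192
movl $7, 8(%esp)        # prot   = PROT_READ|PROT_WRITE|PROT_EXEC
movl $2, 12(%esp)       # flags  = MAP_PRIVATE
movl %eax, 16(%esp)     # fd     = value returned by open
movl %edx, 20(%esp)     # offset = 0
movl $90, %eax
movl %esp, %ebx
int $0x80               # eax = address of the new mapping
int3                    # trap back to the tracer
B:
call A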
.string "/lib/libtest.so.1.0"
If you were to see this code inside an executable on disk or in memory, you should quickly conclude that it does not look like compiled code. One dead giveaway is the call/pop technique used to dynamically retrieve the address of /lib/libtest.so.1.0. The string is stored immediately after the call A instruction, so its address is the return address that call pushes onto the stack. The code at A then pops it into ebx, which is not something a compiler would normally generate.
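The call/pop giveaway can be turned into a rough heuristic. Look for a `call rel32` (opcode 0xe8) whose return address holds an inline NUL-terminated string and whose target pops a register (one-byte opcodes 0x58 to 0x5f) within its first few bytes. This is a sketch under stated assumptions: the function names, the 8-byte window, and the 4-character minimum string length are hypothetical tuning choices, and the host is assumed to be little-endian, as x86 is. Requiring the inline string matters because a bare call to the next instruction followed by a pop is also a known 32-bit PIC idiom. The test buffer is a hand-assembled miniature of the listing's jmp/call/pop layout, not the listing's actual bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 if buf[off..] holds at least minlen printable bytes followed by a NUL. */
static int looks_like_cstring(const uint8_t *buf, size_t len, size_t off,
                              size_t minlen)
{
    size_t n = 0;
    while (off + n < len && buf[off + n] != 0 && isprint(buf[off + n]))
        n++;
    return n >= minlen && off + n < len && buf[off + n] == 0;
}

/* 1 if a one-byte "pop r32" opcode occurs in the first window bytes at target. */
static int pop_near(const uint8_t *buf, size_t len, size_t target, size_t window)
{
    for (size_t i = target; i < len && i < target + window; i++)
        if (buf[i] >= 0x58 && buf[i] <= 0x5f)
            return 1;
    return 0;
}

/* Offset of the first call/pop-with-inline-string candidate, or -1. */
static long find_call_pop(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 5 <= len; i++) {
        if (buf[i] != 0xe8)
            continue;
        int32_t rel;
        memcpy(&rel, buf + i + 1, sizeof rel);      /* little-endian rel32 */
        int64_t target = (int64_t)(i + 5) + rel;    /* relative to next insn */
        if (target < 0 || target >= (int64_t)len)
            continue;
        if (looks_like_cstring(buf, len, i + 5, 4) &&
            pop_near(buf, len, (size_t)target, 8))
            return (long)i;
    }
    return -1;
}
```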
For runtime analysis, the infection vectors are many, and we will cover parasite identification in memory in more depth in Chapter 7, Process Memory Forensics.