Virtual Machine-Based Obfuscation

As mentioned earlier in this chapter (see Opcode Obfuscation), some of the most sophisticated obfuscators reimplement the program they receive as input using a custom byte code and an associated virtual machine. When you confront a binary obfuscated in this manner, the only native code that you might see is the virtual machine itself. Even if you recognize that you are looking at a software virtual machine, developing a complete understanding of all of this native code generally fails to reveal the true purpose of the obfuscated program, because the behavior of the program remains buried in the embedded byte code that the virtual machine interprets. To fully understand the program, you must first locate all of the embedded byte code and then reverse engineer the instruction set of the virtual machine so that you can properly interpret the meaning of that byte code.
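To make the split between interpreter and byte code concrete, here is a minimal sketch of a stack-based virtual machine. The three-instruction set and its opcode values are invented purely for illustration and do not correspond to any real obfuscator. Note that reading run() tells you how instructions execute, but nothing about what the protected program computes; that lives entirely in the data blob.

```python
# Hypothetical three-instruction byte code; opcode values are invented for illustration.
PUSH, ADD, HALT = 0x01, 0x02, 0xFF

def run(bytecode):
    """Interpret the byte code and return the final operand stack."""
    stack = []
    pc = 0  # virtual program counter: an offset into the byte code, not a native address
    while True:
        opcode = bytecode[pc]
        if opcode == PUSH:      # PUSH imm8: push the following byte
            stack.append(bytecode[pc + 1])
            pc += 2
        elif opcode == ADD:     # ADD: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif opcode == HALT:    # HALT: stop and hand back the stack
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {pc}")

# The "real" program is this data blob, not the native code in run().
program = bytes([PUSH, 2, PUSH, 40, ADD, HALT])
result = run(program)
```

A disassembler pointed at the native code sees only the dispatch loop; the bytes in program look like ordinary data until the opcode values have been recovered.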

By way of comparison, imagine that you knew nothing whatsoever about Java, and someone handed you a Java virtual machine and a .class file containing compiled byte code and asked you what they did. Lacking any documentation, you could make little sense of the byte code file, and you would need to fully reverse the virtual machine to learn both the structure of a .class file and how to interpret its contents. Only with an understanding of the byte code machine language could you then proceed to understand the .class file.

VMProtect is an example of a commercial product that utilizes very sophisticated virtual machine-based obfuscation techniques. As more of an academic exercise, TheHyper’s HyperUnpackMe2 challenge binary is a fairly straightforward example of the use of virtual machines in obfuscation, the primary challenge being to locate the virtual machine’s embedded byte code program and determine the meaning of each byte code. In his article on OpenRCE describing HyperUnpackMe2,^[185] Rolf Rolles took the approach of fully comprehending the virtual machine in order to build a processor module capable of disassembling its byte code. The processor module then allowed him to disassemble the byte code embedded within the challenge binary. A minor limitation of this approach is that it allows you to view either the x86 code within HyperUnpackMe2 (using IDA’s x86 module) or the virtual machine code (using Rolles’s processor module) but not both at the same time. This obligates you to create two different databases, each using a different processor module. An alternative approach takes advantage of the ability to customize existing processor modules (see Customizing Existing Processors) through the use of plug-ins, effectively allowing you to extend an instruction set to include all of the instructions of an embedded virtual machine. Applying this approach to HyperUnpackMe2 allows us to view x86 code and virtual machine code together in a single database, as shown in the following listing:

TheHyper:01013B2F              h_pop.l       R9
TheHyper:01013B32              h_pop.l       R7
TheHyper:01013B35              h_pop.l       R5
TheHyper:01013B38              h_mov.l       SP, R2
TheHyper:01013B3C              h_sub.l       SP, 0Ch
TheHyper:01013B44              h_pop.l       R2
TheHyper:01013B47              h_pop.l       R1
TheHyper:01013B4A              h_retn        0Ch
TheHyper:01013B4A sub_1013919  endp
TheHyper:01013B4A
TheHyper:01013B4A ; ----------------------------------------------------------
TheHyper:01013B4D              dd 24242424h
TheHyper:01013B51              dd 0A9A4285Dh           ; TAG VALUE
TheHyper:01013B55
TheHyper:01013B55 ; ============ S U B R O U T I N E =========================
TheHyper:01013B55
TheHyper:01013B55 ; Attributes: bp-based frame
TheHyper:01013B55
TheHyper:01013B55 sub_1013B55  proc near      ; DATA XREF: TheHyper:0103AF7A↓o
TheHyper:01013B55
TheHyper:01013B55 var_8        = dword ptr -8
TheHyper:01013B55 var_4        = dword ptr -4
TheHyper:01013B55 arg_0        = dword ptr  8
TheHyper:01013B55 arg_4        = dword ptr  0Ch
TheHyper:01013B55
TheHyper:01013B55              push    ebp
TheHyper:01013B56              mov     ebp, esp
TheHyper:01013B58              sub     esp, 8
TheHyper:01013B5B              mov     eax, [ebp+arg_0]
TheHyper:01013B5E              mov     [esp+8+var_8], eax
TheHyper:01013B61              mov     [esp+8+var_4], 0
TheHyper:01013B69              push    4
TheHyper:01013B6B              push    1000h

Here, the code beginning at 01013B2F is disassembled as HyperUnpackMe2 byte code, while the code that follows at 01013B55 is displayed as x86 code.
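Whichever route you take (a standalone processor module or an extension to an existing one), the core task is the same: turn raw byte code into mnemonics and operands. The following sketch shows that decoding step in isolation. The opcode values and operand layout are assumptions invented for illustration and are not HyperUnpackMe2's actual encoding; only the mnemonic names are borrowed from the listing above.

```python
import struct

# Hypothetical encoding, invented for illustration (not HyperUnpackMe2's real format):
# one opcode byte followed by operands described by a format string, where
# "r" is a 1-byte register number and "i" is a 4-byte little-endian immediate.
OPCODES = {
    0x10: ("h_pop.l", "r"),
    0x11: ("h_mov.l", "rr"),
    0x12: ("h_sub.l", "ri"),
    0x13: ("h_retn", "i"),
}

def disassemble(code, base=0):
    """Linearly decode code and return a list of (address, text) pairs."""
    offset = 0
    lines = []
    while offset < len(code):
        start = offset
        opcode = code[offset]
        if opcode not in OPCODES:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {offset}")
        mnemonic, operand_kinds = OPCODES[opcode]
        offset += 1
        operands = []
        for kind in operand_kinds:
            if kind == "r":
                operands.append(f"R{code[offset]}")
                offset += 1
            else:
                (value,) = struct.unpack_from("<I", code, offset)
                operands.append(f"{value:#x}")
                offset += 4
        lines.append((base + start, f"{mnemonic} {', '.join(operands)}"))
    return lines

blob = (bytes([0x10, 9, 0x11, 1, 2, 0x12, 3]) + struct.pack("<I", 0xC)
        + bytes([0x13]) + struct.pack("<I", 0xC))
listing = disassemble(blob, base=0x1000)
for address, text in listing:
    print(f"{address:08X}  {text}")
```

In a real processor module or extension, the table lookup would correspond roughly to the analysis stage and the string formatting to the output stage; the sketch collapses both into one function for brevity.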

The ability to simultaneously display native code and byte code has been anticipated by Hex-Rays, which introduced custom datatypes and formats in IDA 5.7. Custom data formats are useful when IDA’s built-in formatting options fail to meet your needs. New formatting capabilities are registered by specifying (using a script or plug-in) a menu name for your format and a function to perform the formatting. Once you select a custom format for a data item, IDA will invoke your formatting function each time it needs to display that data item. Custom datatypes are useful when IDA’s built-in datatypes are not expressive enough to represent the data that you encounter in a particular binary. Custom datatypes, like custom formats, are registered using a script or a plug-in. The Hex-Rays example registers a custom datatype to designate virtual machine byte code and displays each byte code as an instruction by using a custom data format. A drawback to this approach is that it requires you to locate every virtual machine instruction and explicitly change its datatype. With a custom processor extension, by contrast, designating a single value as a virtual machine instruction automatically leads to the discovery of every reachable instruction, because IDA drives the disassembly process and the processor extension discovers new reachable instructions via its custom_emu implementation.
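The difference between marking instructions one at a time and letting the disassembler follow control flow can be modeled with a small worklist algorithm. This is only a rough sketch of the idea, not IDA's implementation: the four-instruction byte code is invented for illustration, and successors() stands in for the flow information that an emulation step such as custom_emu would report to the disassembly engine.

```python
# Hypothetical byte code, invented for illustration. JMP and JZ take a
# one-byte absolute target offset; NOP and RET take no operands.
NOP, JMP, JZ, RET = 0x00, 0x01, 0x02, 0x03
LENGTHS = {NOP: 1, JMP: 2, JZ: 2, RET: 1}

def successors(code, offset):
    """Return the offsets that control can reach after the instruction at offset."""
    opcode = code[offset]
    fallthrough = offset + LENGTHS[opcode]
    if opcode == RET:
        return []                               # no flow out of a return
    if opcode == JMP:
        return [code[offset + 1]]               # unconditional: target only
    if opcode == JZ:
        return [fallthrough, code[offset + 1]]  # conditional: both paths
    return [fallthrough]

def discover(code, entry):
    """Find every instruction offset reachable from a single entry point."""
    seen = set()
    worklist = [entry]
    while worklist:
        offset = worklist.pop()
        if offset in seen or not 0 <= offset < len(code):
            continue
        if code[offset] not in LENGTHS:
            continue                            # not a valid instruction: stop here
        seen.add(offset)
        worklist.extend(successors(code, offset))
    return sorted(seen)

# Offsets: 0 JZ 5 | 2 NOP | 3 JMP 7 | 5 RET | 6 data byte | 7 NOP | 8 RET
code = bytes([JZ, 5, NOP, JMP, 7, RET, 0xAA, NOP, RET])
reachable = discover(code, 0)
```

Starting from offset 0 alone, the walk finds six instructions and skips the data byte at offset 6 that no path reaches. With the custom datatype approach, each of those six locations would have to be identified and retyped by hand.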

^[185]See “Defeating HyperUnpackMe2 With an IDA Processor Module” at http://www.openrce.org/articles/full_view/28.
