Table of Contents for
The IDA Pro Book, 2nd Edition

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition The IDA Pro Book, 2nd Edition by Chris Eagle Published by No Starch Press, 2011
  1. Cover
  2. The IDA Pro Book
  3. PRAISE FOR THE FIRST EDITION OF THE IDA PRO BOOK
  4. Acknowledgments
  5. Introduction
  6. I. Introduction to IDA
  7. 1. Introduction to Disassembly
  8. The What of Disassembly
  9. The Why of Disassembly
  10. The How of Disassembly
  11. Summary
  12. 2. Reversing and Disassembly Tools
  13. Summary Tools
  14. Deep Inspection Tools
  15. Summary
  16. 3. IDA Pro Background
  17. Obtaining IDA Pro
  18. IDA Support Resources
  19. Your IDA Installation
  20. Thoughts on IDA’s User Interface
  21. Summary
  22. II. Basic IDA Usage
  23. 4. Getting Started with IDA
  24. IDA Database Files
  25. Introduction to the IDA Desktop
  26. Desktop Behavior During Initial Analysis
  27. IDA Desktop Tips and Tricks
  28. Reporting Bugs
  29. Summary
  30. 5. IDA Data Displays
  31. Secondary IDA Displays
  32. Tertiary IDA Displays
  33. Summary
  34. 6. Disassembly Navigation
  35. Stack Frames
  36. Searching the Database
  37. Summary
  38. 7. Disassembly Manipulation
  39. Commenting in IDA
  40. Basic Code Transformations
  41. Basic Data Transformations
  42. Summary
  43. 8. Datatypes and Data Structures
  44. Creating IDA Structures
  45. Using Structure Templates
  46. Importing New Structures
  47. Using Standard Structures
  48. IDA TIL Files
  49. C++ Reversing Primer
  50. Summary
  51. 9. Cross-References and Graphing
  52. IDA Graphing
  53. Summary
  54. 10. The Many Faces of IDA
  55. Using IDA’s Batch Mode
  56. Summary
  57. III. Advanced IDA Usage
  58. 11. Customizing IDA
  59. Additional IDA Configuration Options
  60. Summary
  61. 12. Library Recognition Using FLIRT Signatures
  62. Applying FLIRT Signatures
  63. Creating FLIRT Signature Files
  64. Summary
  65. 13. Extending IDA’s Knowledge
  66. Augmenting Predefined Comments with loadint
  67. Summary
  68. 14. Patching Binaries and Other IDA Limitations
  69. IDA Output Files and Patch Generation
  70. Summary
  71. IV. Extending IDA’s Capabilities
  72. 15. IDA Scripting
  73. The IDC Language
  74. Associating IDC Scripts with Hotkeys
  75. Useful IDC Functions
  76. IDC Scripting Examples
  77. IDAPython
  78. IDAPython Scripting Examples
  79. Summary
  80. 16. The IDA Software Development Kit
  81. The IDA Application Programming Interface
  82. Summary
  83. 17. The IDA Plug-in Architecture
  84. Building Your Plug-ins
  85. Installing Plug-ins
  86. Configuring Plug-ins
  87. Extending IDC
  88. Plug-in User Interface Options
  89. Scripted Plug-ins
  90. Summary
  91. 18. Binary Files and IDA Loader Modules
  92. Manually Loading a Windows PE File
  93. IDA Loader Modules
  94. Writing an IDA Loader Using the SDK
  95. Alternative Loader Strategies
  96. Writing a Scripted Loader
  97. Summary
  98. 19. IDA Processor Modules
  99. The Python Interpreter
  100. Writing a Processor Module Using the SDK
  101. Building Processor Modules
  102. Customizing Existing Processors
  103. Processor Module Architecture
  104. Scripting a Processor Module
  105. Summary
  106. V. Real-World Applications
  107. 20. Compiler Personalities
  108. RTTI Implementations
  109. Locating main
  110. Debug vs. Release Binaries
  111. Alternative Calling Conventions
  112. Summary
  113. 21. Obfuscated Code Analysis
  114. Anti–Dynamic Analysis Techniques
  115. Static De-obfuscation of Binaries Using IDA
  116. Virtual Machine-Based Obfuscation
  117. Summary
  118. 22. Vulnerability Analysis
  119. After-the-Fact Vulnerability Discovery with IDA
  120. IDA and the Exploit-Development Process
  121. Analyzing Shellcode
  122. Summary
  123. 23. Real-World IDA Plug-ins
  124. IDAPython
  125. collabREate
  126. ida-x86emu
  127. Class Informer
  128. MyNav
  129. IdaPdf
  130. Summary
  131. VI. The IDA Debugger
  132. 24. The IDA Debugger
  133. Basic Debugger Displays
  134. Process Control
  135. Automating Debugger Tasks
  136. Summary
  137. 25. Disassembler/Debugger Integration
  138. IDA Databases and the IDA Debugger
  139. Debugging Obfuscated Code
  140. IdaStealth
  141. Dealing with Exceptions
  142. Summary
  143. 26. Additional Debugger Features
  144. Debugging with Bochs
  145. Appcall
  146. Summary
  147. A. Using IDA Freeware 5.0
  148. Using IDA Freeware
  149. B. IDC/SDK Cross-Reference
  150. Index
  151. About the Author

Virtual Machine-Based Obfuscation

Mentioned earlier in this chapter (in Opcode Obfuscation in Opcode Obfuscation), some of the most sophisticated obfuscators reimplement the program they receive as input, using a custom byte code and associated virtual machine. When confronting a binary obfuscated in this manner, the only native code that you might see would be the virtual machine. Assuming you recognize that you are looking at a software virtual machine, developing a complete understanding of all of this code generally fails to reveal the true purpose of the obfuscated program. This is because the behavior of the program remains buried in the embedded byte code that the virtual machine must interpret. To fully understand the program, you must, first, locate all of the embedded byte code and, second, reverse engineer the instruction set of the virtual machine so you can properly interpret the meaning of that byte code.

By way of comparison, imagine that you knew nothing whatsoever about Java, and someone handed you a Java virtual machine and a .class file containing compiled byte code and asked you what they did. Lacking any documentation, you could make little sense of the byte code file, and you would need to fully reverse the virtual machine to learn both the structure of a .class file and how to interpret its contents. With an understanding of the byte code machine language, you could then proceed to understanding the .class file.

VMProtect is an example of a commercial product that utilizes very sophisticated virtual machine-based obfuscation techniques. As more of an academic exercise, TheHyper’s HyperUnpackMe2 challenge binary is a fairly straightforward example of the use of virtual machines in obfuscation, the primary challenge being to locate the virtual machine’s embedded byte code program and determine the meaning of each byte code. In his article on OpenRCE describing HyperUnpackMe2,[185] Rolf Rolles’s approach was to fully comprehend the virtual machine in order to build a processor module capable of disassembling its byte code. The processor module then allowed him to disassemble the byte code embedded within the challenge binary. A minor limitation to this approach is that it allows you to view either the x86 code within HyperUnpackme2 (using IDA’s x86 module) or the virtual machine code (using Rolle’s processor module) but not both at the same time. This obligates you to create two different databases, each using a different processor module. An alternative approach takes advantage of the ability to customize existing processor modules (see Customizing Existing Processors in Customizing Existing Processors) through the use of plug-ins, effectively allowing you to extend an instruction set to include all of the instructions of an embedded virtual machine. Applying this approach to HyperUnpackMe2 allows us to view x86 code and virtual machine code together in a single database, as shown in the following listing:

TheHyper:01013B2F            h_pop.l       R9
TheHyper:01013B32              h_pop.l       R7
TheHyper:01013B35              h_pop.l       R5
TheHyper:01013B38              h_mov.l       SP, R2
TheHyper:01013B3C              h_sub.l       SP, 0Ch
TheHyper:01013B44              h_pop.l       R2
TheHyper:01013B47              h_pop.l       R1
TheHyper:01013B4A              h_retn        0Ch
TheHyper:01013B4A sub_1013919  endp
TheHyper:01013B4A
TheHyper:01013B4A ; ----------------------------------------------------------
TheHyper:01013B4D              dd 24242424h
TheHyper:01013B51              dd 0A9A4285Dh           ; TAG VALUE
TheHyper:01013B55
TheHyper:01013B55 ; ============ S U B R O U T I N E =========================
TheHyper:01013B55
TheHyper:01013B55 ; Attributes: bp-based frame
TheHyper:01013B55
TheHyper:01013B55 sub_1013B55  proc near      ; DATA XREF: TheHyper:0103AF7A?o
TheHyper:01013B55
TheHyper:01013B55 var_8        = dword ptr −8
TheHyper:01013B55 var_4        = dword ptr −4
TheHyper:01013B55 arg_0        = dword ptr  8
TheHyper:01013B55 arg_4        = dword ptr  0Ch
TheHyper:01013B55
TheHyper:01013B55            push    ebp
TheHyper:01013B56              mov     ebp, esp
TheHyper:01013B58              sub     esp, 8
TheHyper:01013B5B              mov     eax, [ebp+arg_0]
TheHyper:01013B5E              mov     [esp+8+var_8], eax
TheHyper:01013B61              mov     [esp+8+var_4], 0
TheHyper:01013B69              push    4
TheHyper:01013B6B              push    1000h

Here, the code beginning at is disassembled as HyperUnpackMe2 byte code, while the code that follows at is displayed as x86 code.

The ability to simultaneously display native code and byte code has been anticipated by Hex-Rays, which introduced custom datatypes and formats in IDA 5.7. Custom data formats are useful when IDA’s built-in formatting options fail to meet your needs. New formatting capabilities are registered by specifying (using a script or plug-in) a menu name for your format and a function to perform the formatting. Once you select a custom format for a data item, IDA will invoke your formatting function each time it needs to display that data item. Custom datatypes are useful when IDA’s built-in datatypes are not expressive enough represent the data that you encounter in a particular binary. Custom datatypes, like custom formats, are registered using a script or a plug-in. The Hex-Rays example registers a custom data type to designate virtual machine byte code and displays each byte code as an instruction by using a custom data format. A drawback to this approach is that it requires you to locate every virtual machine instruction and explicitly change its data type. Using a custom processor extension, designating a single value as a virtual machine instruction automatically leads to the discovery of every reachable instruction, because IDA drives the disassembly process and the processor extension discovers new reachable instructions via its custom_emu implementation.



[185] See “Defeating HyperUnpackMe2 With an IDA Processor Module” at http://www.openrce.org/articles/full_view/28.