Table of Contents for
The IDA Pro Book, 2nd Edition

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition The IDA Pro Book, 2nd Edition by Chris Eagle Published by No Starch Press, 2011
  1. Cover
  2. The IDA Pro Book
  3. PRAISE FOR THE FIRST EDITION OF THE IDA PRO BOOK
  4. Acknowledgments
  5. Introduction
  6. I. Introduction to IDA
  7. 1. Introduction to Disassembly
  8. The What of Disassembly
  9. The Why of Disassembly
  10. The How of Disassembly
  11. Summary
  12. 2. Reversing and Disassembly Tools
  13. Summary Tools
  14. Deep Inspection Tools
  15. Summary
  16. 3. IDA Pro Background
  17. Obtaining IDA Pro
  18. IDA Support Resources
  19. Your IDA Installation
  20. Thoughts on IDA’s User Interface
  21. Summary
  22. II. Basic IDA Usage
  23. 4. Getting Started with IDA
  24. IDA Database Files
  25. Introduction to the IDA Desktop
  26. Desktop Behavior During Initial Analysis
  27. IDA Desktop Tips and Tricks
  28. Reporting Bugs
  29. Summary
  30. 5. IDA Data Displays
  31. Secondary IDA Displays
  32. Tertiary IDA Displays
  33. Summary
  34. 6. Disassembly Navigation
  35. Stack Frames
  36. Searching the Database
  37. Summary
  38. 7. Disassembly Manipulation
  39. Commenting in IDA
  40. Basic Code Transformations
  41. Basic Data Transformations
  42. Summary
  43. 8. Datatypes and Data Structures
  44. Creating IDA Structures
  45. Using Structure Templates
  46. Importing New Structures
  47. Using Standard Structures
  48. IDA TIL Files
  49. C++ Reversing Primer
  50. Summary
  51. 9. Cross-References and Graphing
  52. IDA Graphing
  53. Summary
  54. 10. The Many Faces of IDA
  55. Using IDA’s Batch Mode
  56. Summary
  57. III. Advanced IDA Usage
  58. 11. Customizing IDA
  59. Additional IDA Configuration Options
  60. Summary
  61. 12. Library Recognition Using FLIRT Signatures
  62. Applying FLIRT Signatures
  63. Creating FLIRT Signature Files
  64. Summary
  65. 13. Extending IDA’s Knowledge
  66. Augmenting Predefined Comments with loadint
  67. Summary
  68. 14. Patching Binaries and Other IDA Limitations
  69. IDA Output Files and Patch Generation
  70. Summary
  71. IV. Extending IDA’s Capabilities
  72. 15. IDA Scripting
  73. The IDC Language
  74. Associating IDC Scripts with Hotkeys
  75. Useful IDC Functions
  76. IDC Scripting Examples
  77. IDAPython
  78. IDAPython Scripting Examples
  79. Summary
  80. 16. The IDA Software Development Kit
  81. The IDA Application Programming Interface
  82. Summary
  83. 17. The IDA Plug-in Architecture
  84. Building Your Plug-ins
  85. Installing Plug-ins
  86. Configuring Plug-ins
  87. Extending IDC
  88. Plug-in User Interface Options
  89. Scripted Plug-ins
  90. Summary
  91. 18. Binary Files and IDA Loader Modules
  92. Manually Loading a Windows PE File
  93. IDA Loader Modules
  94. Writing an IDA Loader Using the SDK
  95. Alternative Loader Strategies
  96. Writing a Scripted Loader
  97. Summary
  98. 19. IDA Processor Modules
  99. The Python Interpreter
  100. Writing a Processor Module Using the SDK
  101. Building Processor Modules
  102. Customizing Existing Processors
  103. Processor Module Architecture
  104. Scripting a Processor Module
  105. Summary
  106. V. Real-World Applications
  107. 20. Compiler Personalities
  108. RTTI Implementations
  109. Locating main
  110. Debug vs. Release Binaries
  111. Alternative Calling Conventions
  112. Summary
  113. 21. Obfuscated Code Analysis
  114. Anti–Dynamic Analysis Techniques
  115. Static De-obfuscation of Binaries Using IDA
  116. Virtual Machine-Based Obfuscation
  117. Summary
  118. 22. Vulnerability Analysis
  119. After-the-Fact Vulnerability Discovery with IDA
  120. IDA and the Exploit-Development Process
  121. Analyzing Shellcode
  122. Summary
  123. 23. Real-World IDA Plug-ins
  124. IDAPython
  125. collabREate
  126. ida-x86emu
  127. Class Informer
  128. MyNav
  129. IdaPdf
  130. Summary
  131. VI. The IDA Debugger
  132. 24. The IDA Debugger
  133. Basic Debugger Displays
  134. Process Control
  135. Automating Debugger Tasks
  136. Summary
  137. 25. Disassembler/Debugger Integration
  138. IDA Databases and the IDA Debugger
  139. Debugging Obfuscated Code
  140. IdaStealth
  141. Dealing with Exceptions
  142. Summary
  143. 26. Additional Debugger Features
  144. Debugging with Bochs
  145. Appcall
  146. Summary
  147. A. Using IDA Freeware 5.0
  148. Using IDA Freeware
  149. B. IDC/SDK Cross-Reference
  150. Index
  151. About the Author

Chapter 20. Compiler Personalities

image with no caption

At this point, if we have done our job properly, you now possess the essential skills to use IDA effectively and, more important, to bend it to your will. The next step, young grasshopper, is to learn to react to the ninja stars that binaries (as opposed to IDA) will throw at you. Depending on your motives for staring at assembly language, you may be very familiar with what you are looking at, or you may never know what you are going to be faced with. If you happen to spend all of your time examining code that was compiled using gcc on a Linux platform, you may become quite familiar with the style of code that it generates. On the other hand, if someone dropped a debug version of a program compiled using Microsoft Visual C++ (VC++) in your lap, you might be completely baffled. Malware analysts in particular are faced with a wide variety of code to examine. Setting aside the topic of obfuscation for the moment, malware analysts are likely to see code created using Visual Basic, Delphi, and Visual C/C++; machine language blobs embedded within documents; and more all in the same afternoon.

In this chapter we will take a brief look at some of the ways that compilers differ as viewed through the IDA looking glass. The intent is not to delve into why compilers differ; rather, we hope to cover some of the ways that those differences manifest themselves in disassembly listings and how you may resolve those differences. Among other things, the compiler and associated options used to build a particular piece of software constitute one data point in profiling the author of that software.

While a wide variety of compilers are available for a wide variety of languages, in this chapter we will primarily utilize compiled C code for our examples, as a large number of C compilers are available for a large number of platforms.

Jump Tables and Switch Statements

The C switch statement is a frequent target for compiler optimizations. The goal of these optimizations is to match the switch variable to a valid case label in the most efficient manner possible. The means by which this is achieved typically depends on the nature of the switch statement’s case labels. When the case labels are widely spread, as in the following example, most compilers generate code to perform a binary search [140]to match the switch variable against one of the cases.

switch (value) {
   case 1:
      //code executed when value == 1
      break;
   case 211:
      //code executed when value == 211
      break;
   case 295:
      //code executed when value == 295
      break;
   case 462:
      //code executed when value == 462
      break;
   case 1093:
      //code executed when value == 1093
      break;
   case 1839:
      //code executed when value == 1839
      break;
}

When case labels are closely clustered, preferably sequentially as shown here, compilers generally resolve the switch variable by performing a table lookup[141] to match the switch variable to the address of its associated case.

switch (value) {
   case 1:
      //code executed when value == 1
      break;
   case 2:
      //code executed when value == 2
      break;
   case 3:
      //code executed when value == 3
      break;
   case 4:
      //code executed when value == 4
      break;
   case 5:
      //code executed when value == 5
      break;
   case 6:
      //code executed when value == 6
      break;
}

A compiled example of a switch statement that matches the switch variable against the consecutive cases 1 through 12 is shown here:

.text:00401155         mov     edx, [ebp+arg_0]
 .text:00401158         cmp     edx, 0Ch        ; switch 13 cases
  .text:0040115B         ja    loc_4011F1      ; default
  .text:0040115B                                   ; jumptable 00401161 case 0
  .text:00401161         jmp     ds:off_401168[edx*4] ; switch jump
  .text:00401161 ; ---------------------------------------------------------------
 .text:00401168 off_401168 dd offset loc_4011F1
  ; DATA XREF: sub_401150+11↑r
  .text:00401168         dd offset loc_40119C ; jump table for switch statement
  .text:00401168         dd offset loc_4011A1
  .text:00401168         dd offset loc_4011A6
  .text:00401168         dd offset loc_4011AB
  .text:00401168         dd offset loc_4011B3
  .text:00401168         dd offset loc_4011BB
  .text:00401168         dd offset loc_4011C3
  .text:00401168         dd offset loc_4011CB
  .text:00401168         dd offset loc_4011D3
  .text:00401168         dd offset loc_4011DB
  .text:00401168         dd offset loc_4011E3
  .text:00401168         dd offset loc_4011EB
  .text:0040119C ; ---------------------------------------------------------------
  .text:0040119C
  .text:0040119C loc_40119C:                 ; CODE XREF: sub_401150+11↑j
  .text:0040119C                             ; DATA XREF: sub_401150:off_401168↑o
 .text:0040119C         mov     eax, [ebp+arg_4] ; jumptable 00401161 case 1

This example was compiled using the Borland command-line compiler, which IDA well understands. The comments, which IDA inserted during the analysis phase, demonstrate that IDA has a clear understanding that this is a switch statement. In this example we note that IDA recognizes the switch test , the jump table , and individual cases by value within the code.

As a side note on the use of jump tables to resolve switch cases, note that the table in the previous example contains 13 entries, while the switch statement is known to test cases 1 through 12 only. In this case, the compiler elected to include an entry for case 0 rather than treating 0 as a special case. The destination for case 0 is the same as the destination for every other value outside the range of 1 to 12 .

A final implementation note concerns the nature of the test performed on the switch variable. For readers less familiar with the x86 instruction set, the test and the associated jump in the succeeding line may appear only to exclude values larger than 12 while failing to account for negative values. If true, this could be disastrous, as using a negative index into the jump table might lead to unintended consequences. Fortunately, the ja (jump above) instruction treats comparisons as if they were performed on unsigned values; thus −1 (0xFFFFFFFF) would be seen as 4294967295, which is much larger than 12 and therefore excluded from the valid range for indexing the jump table.

The same source code compiled using Microsoft Visual C++ results in the disassembly listing shown here:

.text:004013D5             mov     ecx, [ebp+var_8]
  .text:004013D8        sub     ecx, 1
  .text:004013DB              mov     [ebp+var_8], ecx
  .text:004013DE              cmp     [ebp+var_8], 0Bh ; switch 12 cases
  .text:004013E2             ja      loc_40146E      ; jumptable 004013EB default case
  .text:004013E8             mov     edx, [ebp+var_8]
  .text:004013EB             jmp     ds:off_401478[edx*4] ; switch jump
  .text:004013F2
 .text:004013F2 loc_4013F2:                          ; DATA XREF:
  .text:off_401478?o
  .text:004013F2             mov     eax, [ebp+arg_4] ; jumptable 004013EB 
case 0
  ... ; REMAINDER OF FUNCTION EXCLUDED FOR BREVITY
  .text:00401477            retn
  .text:00401477 sub_4013B0 endp
  .text:00401477 ; -------------------------------------------------------------
 .text:00401478 off_401478 dd offset 
loc_4013F2  ; DATA XREF: sub_4013B0+3B↓r
  .text:00401478            dd offset loc_4013FA  ; jump table for switch statement
  .text:00401478            dd offset loc_401402
  .text:00401478            dd offset loc_40140A
  .text:00401478            dd offset loc_401415
  .text:00401478            dd offset loc_401420
  .text:00401478            dd offset loc_40142B
  .text:00401478            dd offset loc_401436
  .text:00401478            dd offset loc_401441
  .text:00401478            dd offset loc_40144C
  .text:00401478            dd offset loc_401458
  .text:00401478            dd offset loc_401464

Several differences are apparent when comparing this code with the code generated by the Borland compiler. One obvious difference is that the jump table has been relocated to space immediately following the function containing the switch statement (as opposed to being embedded within the function itself in the case of the Borland code). Other than providing a cleaner separation of code and data, relocating the jump table in this manner has little effect on the behavior of the program. Despite the different layout of the code, IDA remains capable of annotating the key features of the switch statement, including the number of cases and the code blocks associated with each case.

A few of the implementation details of the switch statement include the fact that the switch variable (var_8 in this case) is decremented to shift the range of valid values to 0 through 11 , allowing the variable to be used directly as an index into the jump table without the need to create a dummy slot for the unused case 0. As a result, the first entry (or zero index entry) in the jump table actually refers to the code for switch case 1.

Rounding out our comparison of switch statements is the following code generated by gcc:

.text:004011FA          cmp     [ebp+arg_0], 0Ch ; switch 13 cases
  .text:004011FE              ja  loc_40129D      ; jumptable 00401210 case 0
  .text:00401204              mov     eax, [ebp+arg_0]
  .text:00401207              shl     eax, 2
  .text:0040120A         mov     eax,  ds:off_402010[eax]
  .text:00401210         jmp     eax             ; switch jump
  .text:00401212
  .text:00401212  loc_401212:                          ; DATA XREF:
  .rdata:off_402010 o
  .text:00401212        mov     eax, [ebp+arg_4] ; jumptable 00401210 case 1
  ... ; REMAINDER OF .text SECTION EXCLUDED FOR BREVITY
 .rdata:00402010 off_402010  dd offset 
loc_40129D   ; DATA XREF: sub_4011ED+1D↑r
  .rdata:00402010            dd offset 
loc_401212   ; jump table for switch statement
  .rdata:00402010            dd offset loc_40121D
  .rdata:00402010            dd offset loc_401225
  .rdata:00402010            dd offset loc_40122D
  .rdata:00402010            dd offset loc_40123C
  .rdata:00402010            dd offset loc_40124B
  .rdata:00402010            dd offset loc_40125A
  .rdata:00402010            dd offset loc_401265
  .rdata:00402010            dd offset loc_401270
  .rdata:00402010            dd offset loc_40127B
  .rdata:00402010            dd offset loc_401287
  .rdata:00402010            dd offset loc_401293

This code bears some similarities to the Borland code as seen by the comparison to 12 , the jump table that contains 13 entries, and the use of a pointer to the default case in the case 0 slot of the jump table. As in the Borland code, the address for the case 1 handler can be found at index 1 into the jump table. Notable differences between the gcc code and previous examples include a different style of executing the jump and the fact that the jump table is stored in the read-only data (.rdata) section of the binary, providing a logical separation between the code associated with the switch statement and the data required to implement the switch statement. As in the other two examples, IDA is able to locate and annotate the key elements of the switch statement.

One of the points we are making here is that there is no single correct way to compile source to assembly. Familiarity with code generated by a specific compiler in no way guarantees that you will recognize high-level constructs compiled using an entirely different compiler (or even different versions of the same compiler family). More important, do not assume that something is not a switch statement simply because IDA fails to add comments to that effect. Like you, IDA is more familiar with the output of some compilers than others. Rather than relying entirely on IDA’s analysis capabilities to recognize commonly used code and data constructs, you should always be prepared to utilize your own skills—your familiarity with a given assembly language, your knowledge of compilers, and your research skills—to properly interpret a disassembly.



[140] For you algorithmic analysis fans, this means that the switch variable is matched after at most log-2N operations, where N is the number of cases contained in the switch statement.

[141] Again for those analyzing algorithms at home, the use of a table lookup allows the target case to be found in a single operation, which you may recall from your algorithms class is also called constant time or O(1).