The IDA Pro Book, 2nd Edition, by Chris Eagle. Published by No Starch Press, 2011.

Chapter 22. Vulnerability Analysis

Before we get too far into this chapter, we need to make one thing clear: IDA is not a vulnerability discovery tool. There, we said it; what a relief! IDA seems to have attained mystical qualities in some people’s minds. All too often people seem to have the impression that merely opening a binary with IDA will reveal all the secrets of the universe, that the behavior of a piece of malware will be fully explained to them in comments automatically generated by IDA, that vulnerabilities will be highlighted in red, and that IDA will automatically generate exploit code if you right-click while standing on one foot in some obscure Easter egg–activation sequence.

While IDA is certainly a very capable tool, without a clever user sitting at the keyboard (and perhaps a handy collection of scripts and plug-ins), it is really only a disassembler/debugger. As a static-analysis tool, it can only facilitate your attempts to locate software vulnerabilities. Ultimately, whether IDA makes your search for vulnerabilities easier depends on your skills and how you apply them. Based on our experience, IDA is not the optimal tool for locating new vulnerabilities,[186] but when used in conjunction with a debugger, it is one of the best tools available for assisting in exploit development once a vulnerability has been discovered.

Over the past several years, IDA has taken on a new role in discovering existing vulnerabilities. Initially, it may seem unusual to search for known vulnerabilities until we stop to consider exactly what is known about these vulnerabilities and exactly who knows it. In the closed-source, binary-only software world, vendors frequently release software patches without disclosing exactly what has been patched and why. By performing differential analysis between new patched versions of a piece of software and old unpatched versions of the same software, it is possible to isolate the areas that have changed within a binary. Under the assumption that these changes were made for a reason, such differential-analysis techniques actually help to shine a spotlight on what were formerly vulnerable code sequences. With the search thus narrowed, anyone with the requisite skills can develop an exploit for use against unpatched systems. In fact, given Microsoft’s well-known Patch Tuesday cycle of publishing updates, large numbers of security researchers prepare to sit down and do just that once every month.
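
The core idea of differential analysis can be sketched in a few lines of Python. This is only an illustration, not how any particular diffing tool works: it assumes the bytes of each function have already been extracted from both versions into dictionaries keyed by function name, and every name and byte string below is made up for the example.

```python
import hashlib

def digest(code):
    """Return a fingerprint of a function's raw bytes."""
    return hashlib.sha256(code).hexdigest()

def changed_functions(old, new):
    """Compare two {name: bytes} maps and list what the patch touched."""
    report = {"modified": [], "added": [], "removed": []}
    for name in sorted(set(old) | set(new)):
        if name not in new:
            report["removed"].append(name)
        elif name not in old:
            report["added"].append(name)
        elif digest(old[name]) != digest(new[name]):
            report["modified"].append(name)
    return report

# Hypothetical function bytes for an unpatched and a patched build.
unpatched = {
    "parse_header": b"\x55\x89\xe5\x83\xec\x40",
    "copy_field":   b"\x55\x89\xe5\xe8\x10\x00\x00\x00",
}
patched = {
    "parse_header": b"\x55\x89\xe5\x83\xec\x40",
    "copy_field":   b"\x55\x89\xe5\x6a\x40\xe8\x10\x00\x00\x00",
    "check_length": b"\x55\x89\xe5\x5d\xc3",
}

result = changed_functions(unpatched, patched)
print(result)
```

A byte-for-byte comparison like this is deliberately naive. Recompiling a program usually shifts addresses and register choices even in functions whose logic is unchanged, so practical comparisons tend to work on function structure rather than raw bytes.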

Considering that entire books exist on the topic,[187] there is no way that we can do justice to vulnerability analysis in a single chapter in a book dedicated to IDA. What we will do is assume that the reader is familiar with some of the basic concepts of software vulnerabilities, such as buffer overflows, and discuss some of the ways that IDA may be used to hunt down, analyze, and ultimately develop exploits for those vulnerabilities.

Discovering New Vulnerabilities with IDA

Vulnerability researchers take many different approaches to discovering new vulnerabilities in software. When source code is available, it may be possible to utilize any of a growing number of automated source code–auditing tools to highlight potential problem areas within a program. In many cases, such automated tools will only point out the low-hanging fruit, while discovery of deeper vulnerabilities may require extensive manual auditing.

Tools for performing automated auditing of binaries offer many of the same reporting capabilities offered by automated source-auditing tools. A clear advantage of automated binary analysis is that no access to the application source code is required. Therefore, it is possible to perform automated analysis of closed-source, binary-only programs. Veracode[188] is an example of a company that offers a subscription-based service in which users may submit binary files for analysis by Veracode’s proprietary binary-analysis tools. While there is no guarantee that such tools can find any or all vulnerabilities within a binary, these technologies bring binary analysis within reach of the average person seeking some measure of confidence that the software she uses is free from vulnerabilities.

Whether auditing at the source or binary level, basic static-analysis techniques include auditing for the use of problematic functions such as strcpy and sprintf, auditing the use of buffers returned by dynamic memory-allocation routines such as malloc and VirtualAlloc, and auditing the handling of user-supplied input received via functions such as recv, read, fgets, and many other similar functions. Locating such calls within a database is not difficult. For example, to track down all calls to strcpy, we could perform the following steps:

  1. Find the strcpy function.

  2. Display all cross-references to the strcpy function by positioning the cursor on the strcpy label and then choosing View ▸ Open Subviews ▸ Cross References.

  3. Visit each cross-reference and analyze the parameters provided to strcpy to determine whether a buffer overflow may be possible.

Step 3 may require a substantial amount of code and data-flow analysis to understand all potential inputs to the function call. Hopefully, the complexity of such a task is clear. Step 1, although it seems straightforward, may require a little effort on your part. Locating strcpy may be as easy as using the Jump ▸ Jump to Address command (G) and entering strcpy as the address to jump to. In Windows PE binaries or statically linked ELF binaries, this is usually all that is needed. However, with other binaries, extra steps may be required. In a dynamically linked ELF binary, using the Jump command may not take you directly to the desired function. Instead, it is likely to take you to an entry in the extern section (which is involved in the dynamic-linking process). An IDA representation of the strcpy entry in an extern section is shown here:

extern:804DECC          extrn strcpy:near     ; CODE XREF: _strcpy↑j
extern:804DECC                                ; DATA XREF: .got:off_804D5E4↑o

To confuse matters, this location does not appear to be named strcpy at all (it is, but the name is indented), and the only code cross-reference to the location is a jump cross-reference from a function that appears to be named _strcpy, while a data cross-reference is also made to this location from the .got section. The referencing function is actually named .strcpy, which is not at all obvious from the display. In this case, IDA has replaced the dot character with an underscore because IDA does not consider dots to be valid identifier characters by default. Double-clicking the code cross-reference takes us to the program’s procedure linkage table (.plt) entry for strcpy, as shown here:

.plt:08049E90 _strcpy    proc near               ; CODE XREF: decode+5F↓p
.plt:08049E90                                    ; extract_int_argument+24↓p ...
.plt:08049E90            jmp     ds:off_804D5E4
.plt:08049E90 _strcpy    endp

If instead we follow the data cross-reference, we end up at the corresponding .got entry for strcpy shown here:

.got:0804D5E4 off_804D5E4     dd offset strcpy        ; DATA XREF: _strcpy↑r

In the .got entry, we encounter another data cross-reference to the .strcpy function in the .plt section. In practice, following the data cross-references is the most reliable means of navigating from the extern section to the .plt section. In dynamically linked ELF binaries, functions are called indirectly through the procedure linkage table. Now that we have reached the .plt, we can bring up the cross-references to _strcpy (actually .strcpy) and begin to audit each call (of which there are at least two in this example).
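
The navigation just described can be modeled outside IDA as a short walk over a table of data cross-references. The Python sketch below is a toy model rather than IDA code: the addresses come from the listings above, but the segment bounds, the dictionaries, and the helper names are assumptions made for the illustration.

```python
BADADDR = -1  # stand-in for IDA's "no such address" value

# Toy database built from the listings above (segment bounds are assumed).
SEGMENTS = [
    (".plt",   0x08049E00, 0x0804A000),
    (".got",   0x0804D500, 0x0804D700),
    ("extern", 0x0804DE00, 0x0804DF00),
]
NAMES = {"strcpy": 0x0804DECC}
# target address -> addresses holding a data reference to it
DATA_XREFS_TO = {
    0x0804DECC: [0x0804D5E4],   # .got slot points at the extern entry
    0x0804D5E4: [0x08049E90],   # .plt stub reads the .got slot
}

def seg_name(ea):
    """Return the name of the segment containing ea, or None."""
    for name, start, end in SEGMENTS:
        if start <= ea < end:
            return name
    return None

def first_data_xref_to(ea):
    """Return the first address that references ea as data."""
    refs = DATA_XREFS_TO.get(ea, [])
    return refs[0] if refs else BADADDR

def callable_address(fname):
    """Walk extern -> .got -> .plt and return the .plt stub address."""
    ea = NAMES.get(fname, BADADDR)
    if ea == BADADDR or seg_name(ea) != "extern":
        return ea  # unknown names and non-extern names pass through unchanged
    for expected in (".got", ".plt"):
        ea = first_data_xref_to(ea)
        if ea == BADADDR or seg_name(ea) != expected:
            return BADADDR
    return ea

print(hex(callable_address("strcpy")))
```

Checking the segment name at each hop keeps the walk from wandering off into an unrelated reference when a binary does not follow the expected extern, .got, .plt layout.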

This process can become tedious when we have a list of several common functions whose calls we wish to locate and audit. At this point it may be useful to develop a script that can automatically locate and comment all interesting function calls for us. With comments in place, we can perform simple searches to move from one audit location to another. The foundation for such a script is a function that can reliably locate another function so that we can locate all cross-references to that function. With the understanding of ELF binaries gained in the preceding discussion, the IDC function in Example 22-1 takes a function name as an input argument and returns an address suitable for cross-reference iteration.

Example 22-1. Finding a function’s callable address

static getFuncAddr(fname) {
   auto func = LocByName(fname);
   if (func != BADADDR) {
      auto seg = SegName(func);
      //what segment did we find it in?
      if (seg == "extern") { //Likely an ELF if we are in "extern"
         //First (and only) data xref should be from got
         func = DfirstB(func);
         if (func != BADADDR) {
            seg = SegName(func);
            if (seg != ".got") return BADADDR;
            //Now, first (and only) data xref should be from plt
            func = DfirstB(func);
            if (func != BADADDR) {
               seg = SegName(func);
               if (seg != ".plt") return BADADDR;
            }
         }
      }
      else if (seg != ".text") {
         //otherwise, if the name was not in the .text section, then we
         // don't have an algorithm for finding it automatically
         func = BADADDR;
      }
   }
   return func;
}

Using the supplied return address, it is now possible to track down all of the references to any function whose use we want to audit. The IDC function in Example 22-2 leverages the getFuncAddr function from the preceding example to obtain a function address and add comments at all calls to the function.

Example 22-2. Flagging calls to a designated function

static flagCalls(fname) {
   auto func, xref;
   //get the callable address of the named function
   func = getFuncAddr(fname);
   if (func != BADADDR) {
      //Iterate through calls to the named function, and add a comment
      //at each call
      for (xref = RfirstB(func); xref != BADADDR; xref = RnextB(func, xref)) {
         if (XrefType() == fl_CN || XrefType() == fl_CF) {
            MakeComm(xref, "*** AUDIT HERE ***");
         }
      }
      //Iterate through data references to the named function, and add a
      //comment at reference
      for (xref = DfirstB(func); xref != BADADDR; xref = DnextB(func, xref)) {
         if (XrefType() == dr_O) {
            MakeComm(xref, "*** AUDIT HERE ***");
         }
      }
   }
}

Once the desired function’s address has been located, two loops are used to iterate over cross-references to the function. In the first loop, a comment is inserted at each location that calls the function of interest. In the second loop, additional comments are inserted at each location that takes the address of the function (use of an offset cross-reference type). The second loop is required in order to track down calls of the following style:

.text:000194EA                 mov     esi, ds:strcpy
.text:000194F0                 push    offset loc_40A006
.text:000194F5                 add     edi, 160h
.text:000194FB                 push    edi
.text:000194FC                 call    esi

In this example, the compiler has cached the address of the strcpy function in the ESI register in order to make use of a faster means of calling strcpy later in the program. The call instruction shown here is faster to execute because it is both smaller (2 bytes) and requires no additional operations to resolve the target of the call, since the address is already held in the ESI register. A compiler may choose to generate this type of code when one function makes several calls to another function.

Given the indirect nature of the call in this example, the flagCalls function in our example may see only the data cross-reference to strcpy while failing to see the call to strcpy because the call instruction does not reference strcpy directly. In practice, however, IDA possesses the capability to perform some limited data-flow analysis in cases such as these and is likely to generate the disassembly shown here:

.text:000194EA                 mov     esi, ds:strcpy
.text:000194F0                 push    offset loc_40A006
.text:000194F5                 add     edi, 160h
.text:000194FB                 push    edi
.text:000194FC                 call    esi ; strcpy

Note that the call instruction has been annotated with a comment indicating which function IDA believes is being called. In addition to inserting the comment, IDA adds a code cross-reference from the point of the call to the function being called. This benefits the flagCalls function, because in this case the call instruction will be found and annotated via a code cross-reference.
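
The kind of limited propagation described here can be illustrated with a toy model. The Python sketch below is not IDA's algorithm. It assumes a straight-line instruction sequence supplied as (address, mnemonic, operands) tuples, ignores control flow and calling conventions entirely, and uses an invented function name, resolve_indirect_calls.

```python
def resolve_indirect_calls(instructions):
    """Track 'mov reg, ds:symbol' and resolve later 'call reg' targets."""
    regs = {}       # register -> symbol currently cached in it
    resolved = []   # (address, symbol) for each resolved indirect call
    for addr, mnem, operands in instructions:
        ops = [op.strip() for op in operands.split(",")] if operands else []
        if mnem == "mov" and len(ops) == 2 and ops[1].startswith("ds:"):
            # The register now holds the address of a named function.
            regs[ops[0]] = ops[1][3:]
        elif mnem == "call" and ops and ops[0] in regs:
            resolved.append((addr, regs[ops[0]]))
        elif ops and mnem not in ("push", "cmp", "test", "call"):
            # Any other write to a register invalidates what we knew about it.
            regs.pop(ops[0], None)
    return resolved

# The instruction sequence from the listing above.
listing = [
    (0x194EA, "mov",  "esi, ds:strcpy"),
    (0x194F0, "push", "offset loc_40A006"),
    (0x194F5, "add",  "edi, 160h"),
    (0x194FB, "push", "edi"),
    (0x194FC, "call", "esi"),
]

for addr, target in resolve_indirect_calls(listing):
    print("%X: call resolves to %s" % (addr, target))
```

If ESI were overwritten between the mov and the call, this model would forget the cached symbol and report nothing, which is the conservative choice for an auditing aid.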

To finish up our example script, we need a main function that invokes flagCalls for all of the functions that we are interested in auditing. A simple example to annotate calls to some of the functions mentioned earlier in this section is shown here:

static main() {
   flagCalls("strcpy");
   flagCalls("strcat");
   flagCalls("sprintf");
   flagCalls("gets");
}

After running this script, we can move from one interesting call to the next by searching for the inserted comment text, *** AUDIT HERE ***. Of course this still leaves a lot of work to be done from an analysis perspective, since the mere fact that a program calls strcpy does not make that program exploitable. This is where data-flow analysis comes into play. In order to understand whether a particular call to strcpy is exploitable or not, you must determine what parameters are being passed in to strcpy and evaluate whether those parameters can be manipulated to your advantage or not.

Data-flow analysis is a far more complex task than simply finding calls to problem functions. In order to track the flow of data in a static-analysis environment, a thorough understanding of the instruction set being used is required. Your static-analysis tools need to understand where registers may have been assigned values and how those values may have changed and propagated to other registers. Further, your tools need a means for determining the sizes of source and destination buffers being referenced within the program, which in turn requires the ability to understand the layout of stack frames and global variables as well as the ability to deduce the size of dynamically allocated memory blocks. And, of course, all of this is being attempted without actually running the program.

An interesting example of what can be accomplished with creative scripting comes in the form of the BugScam[189] scripts created by Halvar Flake. BugScam utilizes techniques similar to the preceding examples to locate calls to problematic functions and takes the additional step of performing rudimentary data-flow analysis at each function call. The result of BugScam’s analysis is an HTML report of potential problems in a binary. A sample report table generated as a result of a sprintf analysis is shown here:

Address    Severity    Description
8048c03    5           The maximum expansion of the data appears to be larger than the target buffer; this might be the cause of a buffer overrun! Maximum Expansion: 1053. Target Size: 1036.

In this case, BugScam was able to determine the size of the input and output buffers, which, when combined with the format specifiers contained in the format string, were used to determine the maximum size of the generated output.
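
To make the idea of a maximum expansion concrete, the following Python sketch computes an upper bound on the number of bytes a sprintf call could write. It is not BugScam's algorithm and does not attempt to reproduce the figures in the report above. It assumes the sizes of the buffers behind each %s argument are already known, handles only %s, %d, and %%, and treats %d as a 32-bit integer; the format string and sizes are hypothetical.

```python
import re

MAX_INT_CHARS = 11  # "-2147483648": longest decimal form of a 32-bit int

def max_expansion(fmt, string_arg_sizes):
    """Upper bound on bytes sprintf writes, including the trailing NUL.

    string_arg_sizes lists the buffer size of each %s argument in order;
    a buffer of N bytes holds at most N - 1 characters before its NUL.
    """
    sizes = iter(string_arg_sizes)
    total = 0
    for spec, literal in re.findall(r"%(.)|([^%]+)", fmt):
        if literal:
            total += len(literal)
        elif spec == "s":
            total += next(sizes) - 1
        elif spec == "d":
            total += MAX_INT_CHARS
        elif spec == "%":
            total += 1
        else:
            raise ValueError("unsupported specifier %" + spec)
    return total + 1  # terminating NUL

# Hypothetical call: sprintf(target, "name=%s id=%d", source, n)
source_size = 256   # assumed size of the buffer behind %s
target_size = 128   # assumed size of the destination buffer
needed = max_expansion("name=%s id=%d", [source_size])
if needed > target_size:
    print("possible overrun: max expansion %d, target size %d"
          % (needed, target_size))
```

The hard part in practice is not this arithmetic but recovering the buffer sizes and the format string from the binary, which is where the data-flow analysis comes in.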

Developing scripts of this nature requires an in-depth understanding of various exploit classes in order to develop an algorithm that can be applied generically across a large body of binaries. Lacking such knowledge, we can still develop scripts (or plug-ins) that answer simple questions for us faster than we can find the answers manually.

As a final example, consider the task of locating all functions that contain stack-allocated buffers, since these are the functions that might be susceptible to stack-based buffer-overflow attacks. Rather than manually scrolling through a database, we can develop a script to analyze the stack frame of each function, looking for variables that occupy large amounts of space. The Python function in Example 22-3 iterates through the defined members of a given function’s stack frame in search of variables whose size is larger than a specified minimum size.

Example 22-3. Scanning for stack-allocated buffers

def findStackBuffers(func_addr, minsize):
   prev_idx = -1
   frame = GetFrame(func_addr)
   if frame is None or frame == -1: return   #bad function (no frame)
   idx = 0
   prev = None
   while idx < GetStrucSize(frame):
      member = GetMemberName(frame, idx)
      if member is not None:
         if prev_idx != -1:
            #compute distance from previous field to current field
            delta = idx - prev_idx
            if delta >= minsize:
               Message("%s: possible buffer %s: %d bytes\n" % \
                       (GetFunctionName(func_addr), prev, delta))
         prev_idx = idx
         prev = member
         idx = idx + GetMemberSize(frame, idx)
      else:
         idx = idx + 1

This function locates all the variables in a stack frame using repeated calls to GetMemberName for all valid offsets within the stack frame. The size of a variable is computed as the difference between the starting offsets of two successive variables. If the size exceeds a threshold size (minsize), then the variable is reported as a possible stack buffer. The index into the structure is moved along by either 1 byte when no member is defined at the current offset or by the size of any member found at the current offset. The GetMemberSize function may seem like a more suitable choice for computing the size of each stack variable; however, this is true only if the variable has been sized properly by either IDA or the user. Consider the following stack frame:

.text:08048B38 sub_8048B38     proc near
.text:08048B38
.text:08048B38 var_818         = byte ptr -818h
.text:08048B38 var_418         = byte ptr -418h
.text:08048B38 var_C           = dword ptr -0Ch
.text:08048B38 arg_0           = dword ptr  8

Using the displayed byte offsets, we can compute that there are 1,024 bytes from the start of var_818 to the start of var_418 (818h - 418h = 400h) and 1,036 bytes between the start of var_418 and the start of var_C (418h - 0Ch = 40Ch). However, the stack frame might be expanded to show the following layout:

-00000818 var_818         db ?
-00000817                 db ? ; undefined
-00000816                 db ? ; undefined
...
-0000041A                 db ? ; undefined
-00000419                 db ? ; undefined
-00000418 var_418         db 1036 dup(?)
-0000000C var_C           dd ?

Here, var_418 has been collapsed into an array, while var_818 appears to be only a single byte (with 1,023 undefined bytes filling the space between var_818 and var_418). For this stack layout, GetMemberSize will report 1 byte for var_818 and 1,036 bytes for var_418, which understates the space available to var_818. A call to findStackBuffers(0x08048B38, 16) produces the following output, regardless of whether var_818 is defined as a single byte or an array of 1,024 bytes:

sub_8048B38: possible buffer var_818: 1024 bytes
sub_8048B38: possible buffer var_418: 1036 bytes
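
The claim that the result does not depend on how var_818 is sized can be checked with a small model that runs outside IDA. The Python sketch below mirrors the loop in findStackBuffers but replaces the IDA calls with a dictionary. The frame offsets are assumptions of the model: they start at 0 for the lowest address (var_818), and the saved frame pointer and return address are represented by the hypothetical entries " s" and " r".

```python
def find_stack_buffers(members, frame_size, minsize):
    """members maps a start offset to (name, declared_size)."""
    found = []
    prev_idx, prev = -1, None
    idx = 0
    while idx < frame_size:
        if idx in members:
            name, size = members[idx]
            # Size of the previous variable is the gap up to this one.
            if prev_idx != -1 and idx - prev_idx >= minsize:
                found.append((prev, idx - prev_idx))
            prev_idx, prev = idx, name
            idx += size      # skip over the member's declared extent
        else:
            idx += 1         # undefined byte, advance one offset
    return found

def make_frame(var_818_size):
    """Frame of sub_8048B38 with var_818 declared at the given size."""
    return {
        0x000: ("var_818", var_818_size),
        0x400: ("var_418", 1036),
        0x80C: ("var_C", 4),
        0x818: (" s", 4),    # assumed saved frame pointer slot
        0x81C: (" r", 4),    # assumed return address slot
        0x820: ("arg_0", 4),
    }

FRAME_SIZE = 0x824
as_byte = find_stack_buffers(make_frame(1), FRAME_SIZE, 16)
as_array = find_stack_buffers(make_frame(1024), FRAME_SIZE, 16)
for name, size in as_byte:
    print("possible buffer %s: %d bytes" % (name, size))
```

Both declarations of var_818 lead to the same two candidates because the reported size comes from the distance to the next defined member, not from the declared size.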

Creating a main function that iterates through all functions in a database (see Chapter 15) and calls findStackBuffers for each function yields a script that quickly points out the use of stack buffers within a program. Of course, determining whether any of those buffers can be overflowed requires additional (usually manual) study of each function. The tedious nature of static analysis is precisely the reason that fuzz testing is so popular.



[186] In general, far more vulnerabilities are discovered through fuzz testing than through static analysis.

[187] For example, see Jon Erickson’s Hacking: The Art of Exploitation, 2nd Edition (http://nostarch.com/hacking2.htm).