Table of Contents for
The IDA Pro Book, 2nd Edition

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition The IDA Pro Book, 2nd Edition by Chris Eagle Published by No Starch Press, 2011
  1. Cover
  2. The IDA Pro Book
  3. PRAISE FOR THE FIRST EDITION OF THE IDA PRO BOOK
  4. Acknowledgments
  5. Introduction
  6. I. Introduction to IDA
  7. 1. Introduction to Disassembly
  8. The What of Disassembly
  9. The Why of Disassembly
  10. The How of Disassembly
  11. Summary
  12. 2. Reversing and Disassembly Tools
  13. Summary Tools
  14. Deep Inspection Tools
  15. Summary
  16. 3. IDA Pro Background
  17. Obtaining IDA Pro
  18. IDA Support Resources
  19. Your IDA Installation
  20. Thoughts on IDA’s User Interface
  21. Summary
  22. II. Basic IDA Usage
  23. 4. Getting Started with IDA
  24. IDA Database Files
  25. Introduction to the IDA Desktop
  26. Desktop Behavior During Initial Analysis
  27. IDA Desktop Tips and Tricks
  28. Reporting Bugs
  29. Summary
  30. 5. IDA Data Displays
  31. Secondary IDA Displays
  32. Tertiary IDA Displays
  33. Summary
  34. 6. Disassembly Navigation
  35. Stack Frames
  36. Searching the Database
  37. Summary
  38. 7. Disassembly Manipulation
  39. Commenting in IDA
  40. Basic Code Transformations
  41. Basic Data Transformations
  42. Summary
  43. 8. Datatypes and Data Structures
  44. Creating IDA Structures
  45. Using Structure Templates
  46. Importing New Structures
  47. Using Standard Structures
  48. IDA TIL Files
  49. C++ Reversing Primer
  50. Summary
  51. 9. Cross-References and Graphing
  52. IDA Graphing
  53. Summary
  54. 10. The Many Faces of IDA
  55. Using IDA’s Batch Mode
  56. Summary
  57. III. Advanced IDA Usage
  58. 11. Customizing IDA
  59. Additional IDA Configuration Options
  60. Summary
  61. 12. Library Recognition Using FLIRT Signatures
  62. Applying FLIRT Signatures
  63. Creating FLIRT Signature Files
  64. Summary
  65. 13. Extending IDA’s Knowledge
  66. Augmenting Predefined Comments with loadint
  67. Summary
  68. 14. Patching Binaries and Other IDA Limitations
  69. IDA Output Files and Patch Generation
  70. Summary
  71. IV. Extending IDA’s Capabilities
  72. 15. IDA Scripting
  73. The IDC Language
  74. Associating IDC Scripts with Hotkeys
  75. Useful IDC Functions
  76. IDC Scripting Examples
  77. IDAPython
  78. IDAPython Scripting Examples
  79. Summary
  80. 16. The IDA Software Development Kit
  81. The IDA Application Programming Interface
  82. Summary
  83. 17. The IDA Plug-in Architecture
  84. Building Your Plug-ins
  85. Installing Plug-ins
  86. Configuring Plug-ins
  87. Extending IDC
  88. Plug-in User Interface Options
  89. Scripted Plug-ins
  90. Summary
  91. 18. Binary Files and IDA Loader Modules
  92. Manually Loading a Windows PE File
  93. IDA Loader Modules
  94. Writing an IDA Loader Using the SDK
  95. Alternative Loader Strategies
  96. Writing a Scripted Loader
  97. Summary
  98. 19. IDA Processor Modules
  99. The Python Interpreter
  100. Writing a Processor Module Using the SDK
  101. Building Processor Modules
  102. Customizing Existing Processors
  103. Processor Module Architecture
  104. Scripting a Processor Module
  105. Summary
  106. V. Real-World Applications
  107. 20. Compiler Personalities
  108. RTTI Implementations
  109. Locating main
  110. Debug vs. Release Binaries
  111. Alternative Calling Conventions
  112. Summary
  113. 21. Obfuscated Code Analysis
  114. Anti–Dynamic Analysis Techniques
  115. Static De-obfuscation of Binaries Using IDA
  116. Virtual Machine-Based Obfuscation
  117. Summary
  118. 22. Vulnerability Analysis
  119. After-the-Fact Vulnerability Discovery with IDA
  120. IDA and the Exploit-Development Process
  121. Analyzing Shellcode
  122. Summary
  123. 23. Real-World IDA Plug-ins
  124. IDAPython
  125. collabREate
  126. ida-x86emu
  127. Class Informer
  128. MyNav
  129. IdaPdf
  130. Summary
  131. VI. The IDA Debugger
  132. 24. The IDA Debugger
  133. Basic Debugger Displays
  134. Process Control
  135. Automating Debugger Tasks
  136. Summary
  137. 25. Disassembler/Debugger Integration
  138. IDA Databases and the IDA Debugger
  139. Debugging Obfuscated Code
  140. IdaStealth
  141. Dealing with Exceptions
  142. Summary
  143. 26. Additional Debugger Features
  144. Debugging with Bochs
  145. Appcall
  146. Summary
  147. A. Using IDA Freeware 5.0
  148. Using IDA Freeware
  149. B. IDC/SDK Cross-Reference
  150. Index
  151. About the Author

Chapter 9. Cross-References and Graphing

image with no caption

Some of the more common questions asked while reverse engineering a binary are along the lines of “Where is this function called from?” and “What functions access this data?” These and other similar questions seek to catalog the references to and from various resources in a program. Two examples serve to show the usefulness of such questions.

Consider the case in which you have located a function containing a stack-allocated buffer that can be overflowed, possibly leading to exploitation of the program. Since the function may be buried deep within a complex application, your next step might be to determine exactly how the function can be reached. The function is useless to you unless you can get it to execute. This leads to the question “What functions call this vulnerable function?” as well as additional questions regarding the nature of the data that those functions may pass to the vulnerable function. This line of reasoning must continue as you work your way back up potential call chains to find one that you can influence to properly exploit the overflow that you have discovered.

In another case, consider a binary that contains a large number of ASCII strings, at least one of which you find suspicious, such as “Executing Denial of Service attack!” Does the presence of this string indicate that the binary actually performs a Denial of Service attack? No, it simply indicates that the binary happens to contain that particular ASCII sequence. You might infer that the message is displayed somehow just prior to launching an attack; however, you need to find the related code in order to verify your suspicions. Here the answer to the question “Where is this string referenced?” would help you to quickly track down the program location(s) that make use of the string. From there, perhaps it can assist you in locating any actual Denial of Service attack code.

IDA helps to answer these types of questions through its extensive cross-referencing features. IDA provides a number of mechanisms for displaying and accessing cross-reference data, including graph-generation capabilities that provide a highly visual representation of the relationships between code and data. In this chapter we discuss the types of cross-reference information that IDA makes available, the tools for accessing cross-reference data, and how to interpret that data.

Cross-References

We begin our discussion by noting that cross-references within IDA are often referred to simply as xrefs. Within this text, we will use xref only where it is used to refer to the content of an IDA menu item or dialog. In all other cases we will stick to the term cross-reference.

There are two basic categories of cross-references in IDA: code cross-references and data cross-references. Within each category, we will detail several different types of cross-references. Associated with each cross-reference is the notion of a direction. All cross-references are made from one address to another address. The from and to addresses may be either code or data addresses. If you are familiar with graph theory, you may choose to think of addresses as nodes in a directed graph and cross-references as the edges in that graph. Figure 9-1 provides a quick refresher on graph terminology. In this simple graph, three nodes are connected by two directed edges .

Basic graph components

Figure 9-1. Basic graph components

Note that nodes may also be referred to as vertices. Directed edges are drawn using arrows to indicate the allowed direction of travel across the edge. In Figure 9-1, it is possible to travel from the upper node to either of the lower nodes, but it is not possible to travel from either of the lower nodes to the upper node.

Code cross-references are a very important concept, as they facilitate IDA’s generation of control flow graphs and function call graphs, each of which we discuss later in the chapter.

Before we dive into the details of cross-references, it is useful to understand how IDA displays cross-reference information in a disassembly listing. Figure 9-2 shows the header line for a disassembled function (sub_401000) containing a cross-reference as a regular comment (right side of the figure).

A basic cross-reference

Figure 9-2. A basic cross-reference

The text CODE XREF indicates that this is a code cross-reference rather than a data cross-reference (DATA XREF). An address follows, _main+2A in this case, indicating the address from which the cross-reference originates. Note that this is a more descriptive form of address than .text:0040154A, for example. While both forms represent the same program location, the format used in the cross-reference offers the additional information that the cross-reference is being made from within the function named _main, specifically 0x2A (42) bytes into the _main function. An up or down arrow will always follow the address, indicating the relative direction to the referencing location. In Figure 9-2, the down arrow indicates that _main+2A lies at a higher address than sub_401000, and thus you would need to scroll down to reach it. Similarly, an up arrow indicates that a referencing location lies at a lower memory address, requiring that you scroll up to reach it. Finally, every cross-reference comment contains a single-character suffix to identify the type of cross-reference that is being made. Each suffix is described later as we detail all of IDA’s cross-reference types.

Code Cross-References

A code cross-reference is used to indicate that an instruction transfers or may transfer control to another instruction. The manner in which instructions transfer control is referred to as a flow within IDA. IDA distinguishes among three basic flow types: ordinary, jump, and call. Jump and call flows are further divided according to whether the target address is a near or far address. Far addresses are encountered only in binaries that make use of segmented addresses. In the discussion that follows, we make use of the disassembled version of the following program:

int read_it;            //integer variable read in main
int write_it;           //integer variable written 3 times in main
int ref_it;             //integer variable whose address is taken in main

void callflow() {}      //function called twice from main

int main() {
   int *p = &ref_it;    //results in an "offset" style data reference
   *p = read_it;        //results in a "read" style data reference
   write_it = *p;       //results in a "write" style data reference
   callflow();          //results in a "call" style code reference
   if (read_it == 3) {  //results in "jump" style code reference
      write_it = 2;     //results in a "write" style data reference
   }
   else {               //results in an "jump" style code reference
      write_it = 1;     //results in a "write" style data reference
   }
   callflow();          //results in an "call" style code reference
}

The program contains operations that will exercise all of IDA’s cross-referencing features, as noted in the comment text.

An ordinary flow is the simplest flow type, and it represents sequential flow from one instruction to another. This is the default execution flow for all nonbranching instructions such as ADD. There are no special display indicators for ordinary flows other than the order in which instructions are listed in the disassembly. If instruction A has an ordinary flow to instruction B, then instruction B will immediately follow instruction A in the disassembly listing. In the following listing, every instruction other than and has an associated ordinary flow to its immediate successor:

Example 9-1. Cross-reference sources and targets

.text:00401010 _main           proc near
  .text:00401010
  .text:00401010 p               = dword ptr −4
  .text:00401010
  .text:00401010                 push    ebp
  .text:00401011                 mov     ebp, esp
  .text:00401013                 push    ecx
  .text:00401014                mov     [ebp+p], offset ref_it
  .text:0040101B                 mov     eax, [ebp+p]
  .text:0040101E                mov     ecx, read_it
  .text:00401024                 mov     [eax], ecx
  .text:00401026                 mov     edx, [ebp+p]
  .text:00401029                 mov     eax, [edx]
  .text:0040102B                mov     write_it, eax
  .text:00401030                call    callflow
  .text:00401035                cmp     read_it, 3
  .text:0040103C                 jnz     short loc_40104A
  .text:0040103E                mov     write_it, 2
  .text:00401048                jmp     short loc_401054

 .text:0040104A ; -------------------------------------------------------------
  .text:0040104A
  .text:0040104A loc_40104A:                         ; CODE XREF: _main+2C↑j
  .text:0040104A                mov     write_it, 1
  .text:00401054
  .text:00401054 loc_401054:                         ; CODE XREF: _main+38↑j
  .text:00401054                call    callflow
  .text:00401059                 xor     eax, eax
    .text:0040105B                 mov     esp, ebp
  .text:0040105D                 pop     ebp
  .text:0040105E                retn
  .text:0040105E _main           endp

Instructions used to invoke functions, such as the x86 call instructions at , are assigned a call flow, indicating transfer of control to the target function. In most cases, an ordinary flow is also assigned to call instructions, as most functions return to the location that follows the call. If IDA believes that a function does not return (as determined during the analysis phase), then calls to that function will not have an ordinary flow assigned. Call flows are noted by the display of cross-references at the target function (the destination address of the flow). The resulting disassembly of the callflow function is shown here:

.text:00401000 callflow        proc near               ; CODE XREF: _main+20↓p
.text:00401000                                         ; _main:loc_401054↓p
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 pop     ebp
.text:00401004                 retn
.text:00401004 callflow        endp

In this example, two cross-references are displayed at the address of callflow to indicate that the function is called twice. The address displayed in the cross-references is displayed as an offset into the calling function unless the calling address has an associated name, in which case the name is used. Both forms of addresses are used in the cross-references shown here. Cross-references resulting from function calls are distinguished through use of the p suffix (think P for Procedure).

A jump flow is assigned to each unconditional and conditional branch instruction. Conditional branches are also assigned ordinary flows to account for control flow when the branch is not taken. Unconditional branches have no associated ordinary flow because the branch is always taken in such cases. The dashed line break at is a display device used to indicate that an ordinary flow does not exist between two adjacent instructions. Jump flows are associated with jump-style cross-references displayed at the target of the jump, as shown at . As with call-style cross-references, jump cross-references display the address of the referring location (the source of the jump). Jump cross-references are distinguished by the use of a j suffix (think J for Jump).

Data Cross-References

Data cross-references are used to track the manner in which data is accessed within a binary. Data cross-references can be associated with any byte in an IDA database that is associated with a virtual address (in other words, data cross-references are never associated with stack variables). The three most commonly encountered types of data cross-references are used to indicate when a location is being read, when a location is being written, and when the address of a location is being taken. The global variables associated with the previous example program are shown here, as they provide several examples of data cross-references.

.data:0040B720 read_it       dd ?                    ; DATA XREF: _main+E↑r
.data:0040B720                                       ; _main+25↑r
.data:0040B724 write_it      dd ?                    ; DATA XREF: _main+1B↑w
.data:0040B724                                      ; _main+2E↑w ...
.data:0040B728 ref_it        db    ? ;               ; DATA XREF: _main+4↑o
.data:0040B729               db    ? ;
.data:0040B72A               db    ? ;
.data:0040B72B               db    ? ;

A read cross-reference is used to indicate that the contents of a memory location are being accessed. Read cross-references can originate only from an instruction address but may refer to any program location. The global variable read_it is read at locations marked in Example 9-1. The associated cross-reference comments shown in this listing indicate exactly which locations in main are referencing read_it and are recognizable as read cross-references based on the use of the r suffix. The first read performed on read_it is a 32-bit read into the ECX register, which leads IDA to format read_it as a dword (dd). In general IDA takes as many cues as it possibly can in order to determine the size and/or type of variables based on how they are accessed and how they are used as parameters to functions.

The global variable write_it is referenced at the locations marked in Example 9-1. Associated write cross-references are generated and displayed as comments for the write_it variable, indicating the program locations that modify the contents of the variable. Write cross-references utilize the w suffix. Here again, IDA has determined the size of the variable based on the fact that the 32-bit EAX register is copied into write_it. Note that the list of cross-references displayed at write_it terminates with an ellipsis ( above), indicating that the number of cross-references to write_it exceeds the current display limit for cross-references. This limit can be modified through the Number of displayed xrefs setting on the Cross-references tab in the Options ▸ General dialog. As with read cross-references, write cross-references can originate only from a program instruction but may reference any program location. Generally speaking, a write cross-reference that targets a program instruction byte is indicative of self-modifying code, which is usually considered bad form and is frequently encountered in the de-obfuscation routines used in malware.

The third type of data cross-reference, an offset cross-reference, indicates that the address of a location is being used (rather than the content of the location). The address of global variable ref_it is taken at location in Example 9-1, resulting in the offset cross-reference comment at ref_it in the previous listing (suffix o). Offset cross-references are commonly the result of pointer operations either in code or in data. Array access operations, for example, are typically implemented by adding an offset to the starting address of the array. As a result, the first address in most global arrays can often be recognized by the presence of an offset cross-reference. For this reason, most string data (strings being arrays of characters in C/C++) is the target of offset cross-references.

Unlike read and write cross-references, which can originate only from instruction locations, offset cross-references can originate from either instruction locations or data locations. An example of an offset that can originate from a program’s data section is any table of pointers (such as a vtable) that results in the generation of an offset cross-reference from each location within the table to the location being pointed to by those locations. You can see this if you examine the vtable for class SubClass from Chapter 8, whose disassembly is shown here:

.rdata:00408148 off_408148  dd offset SubClass::vfunc1
(void) ; DATA XREF: SubClass::SubClass(void)+12↑o
.rdata:0040814C          dd offset BaseClass::vfunc2(void)
.rdata:00408150          dd offset SubClass::vfunc3(void)
.rdata:00408154          dd offset BaseClass::vfunc4(void)
.rdata:00408158          dd offset SubClass::vfunc5(void)

Here you see that the address of the vtable is used in the function SubClass::SubClass(void), which is the class constructor. The header lines for function SubClass::vfunc3(void), shown here, show the offset cross-reference that links the function to a vtable.

.text:00401080 public: virtual void __thiscall SubClass::vfunc3(void) proc near
.text:00401080                                      ; DATA XREF: .rdata:00408150↓o

This example demonstrates one of the characteristics of C++ virtual functions that becomes quite obvious when combined with offset cross-references, namely that C++ virtual functions are never called directly and should never be the target of a call cross-reference. Instead, all C++ virtual functions should be referred to by at least one vtable entry and should always be the target of at least one offset cross-reference. Remember that overriding a virtual function is not mandatory. Therefore, a virtual function can appear in more than one vtable, as discussed in Chapter 8. Backtracking offset cross-references is one technique for easily locating C++ vtables in a program’s data section.

Cross-Reference Lists

With an understanding of what cross-references are, we can now discuss the manner in which you may access all of this data within IDA. As mentioned previously, the number of cross-reference comments that can be displayed at a given location is limited by a configuration setting that defaults to 2. As long as the number of cross-references to a location does not exceed this limit, then working with those cross-references is fairly straightforward. Mousing over the cross-reference text displays the disassembly of the source region in a tool tip–style display, while double-clicking the cross-reference address jumps the disassembly window to the source of the cross-reference.

There are two methods for viewing the complete list of cross-references to a location. The first method is to open a cross-references subview associated with a specific address. By positioning the cursor on an address that is the target of one or more cross-references and selecting View ▸ Open Subviews ▸ Cross-References, you can open the complete list of cross-references to a given location, as shown in Figure 9-3, which shows the complete list of cross-references to variable write_it.

Cross-reference display window

Figure 9-3. Cross-reference display window

The columns of the window indicate the direction (Up or Down) to the source of the cross-reference, the type of cross-reference (using the type suffixes discussed previously), the source address of the cross-reference, and the corresponding disassembled text at the source address, including any comments that may exist at the source address. As with other windows that display lists of addresses, double-clicking any entry repositions the disassembly display to the corresponding source address. Once opened, the cross-reference display window remains open and accessible via a title tab displayed along with every other open subview’s title tab above the disassembly area.

The second way to access a list of cross-references is to highlight a name that you are interested in learning about and choose Jump ▸ Jump to xref (hotkey ctrl-X) to open a dialog that lists every location that references the selected symbol. The resulting dialog, shown in Figure 9-4, is nearly identical in appearance to the cross-reference subview shown in Figure 9-3. In this case, the dialog was activated using the ctrl-X hotkey with the first instance of write_it (.text:0040102B) selected.

Jump to cross-reference dialog

Figure 9-4. Jump to cross-reference dialog

The primary difference in the two displays is behavioral. Being a modal dialog,[52] the display in Figure 9-4 has buttons to interact with and terminate the dialog. The primary purpose of this dialog is to select a referencing location and jump to it. Double-clicking one of the listed locations dismisses the dialog and repositions the disassembly window at the selected location. The second difference between the dialog and the cross-reference subview is that the former can be opened using a hotkey or context-sensitive menu from any instance of a symbol, while the latter can be opened only when you position the cursor on an address that is the target of a cross-reference and choose View ▸ Open Subviews ▸ Cross-References. Another way of thinking about it is that the dialog can be opened at the source of any cross-reference, while the subview can be opened only at the destination of the cross-reference.

An example of the usefulness of cross-reference lists might be to rapidly locate every location from which a particular function is called. Many people consider the use of the C strcpy[53] function to be dangerous. Using cross-references, locating every call to strcpy is as simple as finding any one call to strcpy, using the ctrl-X hotkey to bring up the cross-reference dialog, and working your way through every call cross-reference. If you don’t want to take the time to find strcpy used somewhere in the binary, you can even get away with adding a comment with the text strcpy in it and activating the cross-reference dialog using the comment.[54]

Function Calls

A specialized cross-reference listing dealing exclusively with function calls is available by choosing View ▸ Open Subviews ▸ Function Calls. Figure 9-5 shows the resulting dialog, which lists all locations that call the current function (as defined by the cursor location at the time the view is opened) in the upper half of the window and all calls made by the current function in the lower half of the window.

Function calls window

Figure 9-5. Function calls window

Here again, each listed cross-reference can be used to quickly reposition the disassembly listing to the corresponding cross-reference location. Restricting ourselves to considering function call cross-references allows us to think about more abstract relationships than simple mappings from one address to another and instead consider how functions relate to one another. In the next section, we show how IDA takes advantage of this by providing several types of graphs, all designed to assist you in interpreting a binary.



[52] A modal dialog must be closed before you can continue normal interaction with the underlying application. Modeless dialogs can remain open while you continue normal interaction with the application.

[53] The C strcpy function copies a source array of characters, up to and including the associated null termination character, to a destination array, with no checks whatsoever that the destination array is large enough to hold all of the characters from the source.

[54] When a symbol name appears in a comment, IDA treats that symbol just as if it was an operand in a disassembled instruction. Double-clicking the symbol repositions the disassembly window, and the right-click context-sensitive menu becomes available.