The IDA Pro Book, 2nd Edition
by Chris Eagle. Published by No Starch Press, 2011

The IDC Language

Unlike some other aspects of IDA, the IDC language is reasonably well covered in IDA’s help system. Topics available at the top level of the help system include IDC language, which covers the basics of IDC syntax, and Index of IDC functions, which provides an exhaustive list of built-in functions available to IDC programmers.

IDC is a scripting language that borrows most of its syntactic elements from C. Beginning with IDA 5.6, IDC actually takes on more of the flavor of C++ with the introduction of object-oriented features and exception handling. Because of its similarity to C and C++, we will describe IDC in terms of these languages and focus primarily on where IDC differs.

IDC Variables

IDC is a loosely typed language, meaning that variables have no explicit type. The three primary datatypes used in IDC are integers (IDA documentation uses the type name long), strings, and floating point values, with the overwhelming majority of operations taking place on integers and strings. Strings are treated as a native datatype in IDC, and there is no need to keep track of the space required to store a string or whether a string is null terminated or not. Beginning with IDA 5.6, IDC incorporates a number of additional variable types, including objects, references, and function pointers.

All variables must be declared prior to their use. IDC supports local variables and, since IDA 5.4, global variables as well. The IDC keyword auto is used to introduce a local variable declaration, and local variable declarations may include initial values. The following examples show legal IDC local variable declarations:

auto addr, reg, val;   // legal, multiple variables declared with no initializers
auto count = 0;        // declaration with initialization

IDC recognizes C-style multiline comments using /* */ and C++–style line-terminating comments using //. Also, note that several variables may be declared in a single statement and that all statements in IDC are terminated using a semicolon (as in C). IDC does not support C-style arrays (slices are introduced in IDA 5.6), pointers (though references are supported beginning with IDA 5.6), or complex datatypes such as structs and unions. Classes are introduced in IDA 5.6.

Global variable declarations are introduced using the extern keyword, and their declarations are legal both inside and outside of any function definition. It is not legal to provide an initial value when a global variable is declared. The following listing shows the declaration of two global variables.

extern outsideGlobal;

static main() {
   extern insideGlobal;
   outsideGlobal = "Global";
   insideGlobal = 1;
}

Global variables are allocated the first time they are encountered during an IDA session and persist as long as that session remains active, regardless of the number of databases that you may open and close.

IDC Expressions

With a few exceptions, IDC supports virtually all of the arithmetic and logical operators available in C, including the ternary operator (? :). Compound assignment operators of the form op= (+=, *=, >>=, and the like) are not supported. The comma operator is supported beginning with IDA 5.6. All integer operands are treated as signed values. This affects integer comparisons (which are always signed) and the right-shift operator (>>), which always performs an arithmetic shift with sign bit replication. If you require logical right shifts, you must implement them yourself by masking off the top bit of the result, as shown here:

result = (x >> 1) & 0x7fffffff;  //set most significant bit to zero
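
The one-bit idiom above can be extended to shifts of more than one bit. The following is a minimal sketch, not a built-in: the helper name lsr is hypothetical, and the code assumes that n lies between 0 and 31 and that the value of interest fits in 32 bits.

```idc
#include <idc.idc>

// Hypothetical helper: logical (zero-fill) right shift of x by n bits.
// Assumes 0 <= n <= 31 and a value that fits in 32 bits.
static lsr(x, n) {
   if (n == 0) {
      return x;
   }
   // Shift once and clear the replicated sign bit. The intermediate value
   // is now non-negative, so the remaining n - 1 shifts cannot replicate
   // a sign bit.
   return ((x >> 1) & 0x7fffffff) >> (n - 1);
}

static main() {
   Message("%x\n", lsr(0x80000000, 4));   // top four bits of the result are zero
}
```

Clearing the sign bit after the first shift keeps the rest of the computation in the non-negative range, where arithmetic and logical shifts agree.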

Because strings are a native type in IDC, some operations on strings take on a different meaning than they might in C. The assignment of a string operand into a string variable results in a string copy operation; thus there is no need for string copying or duplicating functions such as C’s strcpy and strdup. Also, the addition of two string operands results in the concatenation of the two operands. Thus "Hello" + "World" yields "HelloWorld", and there is no need for a concatenation function such as C’s strcat. Starting with IDA 5.6, IDC offers a slice operator for use with strings. Python programmers will be familiar with slices, which basically allow you to specify subsequences of array-like variables. Slices are specified using square brackets and a start (inclusive) and end (exclusive) index. At least one index is required. The following listing demonstrates the use of IDC slices.

auto str = "String to slice";
auto s1, s2, s3, s4;
s1 = str[7:9];     // "to"
s2 = str[:6];      // "String", omitting start index starts at 0
s3 = str[10:];     // "slice", omitting end index goes to end of string
s4 = str[5];       // "g", single element slice, similar to array element access

Note that while there are no array datatypes available in IDC, the slice operator effectively allows you to treat IDC strings as if they were arrays.

IDC Statements

As in C, all simple statements are terminated with a semicolon. The only C-style compound statement that IDC does not support is the switch statement. When using for loops, keep in mind that IDC does not support compound assignment operators, which may affect you if you wish to count by anything other than one, as shown here:

auto i;
for (i = 0; i < 10; i += 2) {}     // illegal, += is not supported
for (i = 0; i < 10; i = i + 2) {}  // legal

With IDA 5.6, IDC introduces try/catch blocks and the associated throw statement, which are syntactically similar to C++ exceptions.[99] IDA’s built-in help contains specifics on IDC’s exception-handling implementation.
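
As a rough illustration of the syntax, the following sketch throws and catches a value inside a single function. It assumes IDA 5.6 or later and assumes that a plain string is an acceptable value to throw; the function name risky is hypothetical. Consult the built-in help for what the catch variable holds when IDA itself raises a runtime error.

```idc
#include <idc.idc>

// Hypothetical function that signals failure by throwing a string.
static risky(n) {
   if (n < 0) {
      throw "negative argument";
   }
   return n * 2;
}

static main() {
   auto e;
   try {
      Message("%d\n", risky(4));
      Message("%d\n", risky(-1));   // throws, so this Message never prints
   }
   catch (e) {
      // e receives whatever value was thrown
      Message("caught: %s\n", e);
   }
}
```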

For compound statements, IDC utilizes the same bracing ({}) syntax and semantics as C. Within a braced block, it is permissible to declare new variables as long as the variable declarations are the first statements within the block. However, IDC does not rigorously enforce the scope of the newly introduced variables; such variables may be referenced beyond the block in which they were declared. Consider the following example:

if (1) {    //always true
   auto x;
   x = 10;
}
else {      //never executes
   auto y;
   y = 3;
}
Message("x = %d\n", x);   // x remains accessible after its block terminates
Message("y = %d\n", y);   // IDC allows this even though the else did not execute

The output statements (the Message function is analogous to C’s printf) will inform us that x = 10 and y = 0. Given that IDC does not strictly enforce the scope of x, it is not terribly surprising that we are allowed to print the value of x. What is somewhat surprising is that y is accessible at all, given that the block in which y is declared is never executed. This is simply a quirk of IDC. Note that while IDC may loosely enforce variable scoping within a function, variables declared within one function continue to remain inaccessible in any other function.

IDC Functions

IDC supports user-defined functions in standalone programs (.idc files) only. User-defined functions are not supported when using the IDC command dialog (see Using the IDC Command Dialog). IDC’s syntax for declaring user-defined functions is where it differs most from C. The static keyword is used to introduce a user-defined function, and the function’s parameter list consists solely of a comma-separated list of parameter names. The following listing details the basic structure of a user-defined function:

static my_func(x, y, z) {
   //declare any local variables first
   auto a, b, c;
   //add statements to define the function's behavior
   // ...
}

Prior to IDA 5.6, all function parameters are strictly call-by-value. Call-by-reference parameter passing was introduced with IDA 5.6. Interestingly, whether a parameter is passed using call-by-value or call-by-reference is determined by the manner in which the function is called, not the manner in which the function is declared. The unary & operator is used in a function call (not the function declaration) to denote that an argument is being passed by reference. The following examples show invocations of the my_func function from the previous listing making use of both call-by-value and call-by-reference parameter passing.

auto q = 0, r = 1, s = 2;
my_func(q, r, s);   //all three arguments passed using call-by-value
                    //upon return, q, r, and s hold 0, 1, and 2 respectively
my_func(q, &r, s);  //q and s passed call-by-value, r is passed call-by-reference
                    //upon return, q and s hold 0 and 2 respectively, but r may have
                    //changed. In this second case, any changes that my_func makes
                    //to its formal parameter y will be reflected in the caller as
                    //changes to r
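
To make the difference concrete, here is a small sketch with a function that actually modifies its parameter. The function name bump is hypothetical, and the by-reference call assumes IDA 5.6 or later.

```idc
#include <idc.idc>

// Hypothetical function that modifies its formal parameter.
static bump(v) {
   v = v + 1;
}

static main() {
   auto n = 5;
   bump(n);                          // call-by-value: only bump's copy changes
   Message("after bump(n):  %d\n", n);
   bump(&n);                         // call-by-reference: the caller's n changes
   Message("after bump(&n): %d\n", n);
}
```

The declaration of bump is identical in both cases; only the call site decides whether the caller's variable can be changed.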

Function declarations never indicate whether a function explicitly returns a value or what type of value is returned when a function does yield a result.

When you wish to return a value from a function, use a return statement to return the desired value. It is permissible to return entirely different datatypes from different paths of execution within a function. In other words, a function may return a string in some cases, while in other cases the same function may return an integer. As in C, use of a return statement within a function is optional. However, unlike C, any function that does not explicitly return a value implicitly returns the value zero.
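
The following sketch illustrates both points: one function that returns different datatypes on different paths, and one that has no return statement at all. The function names describe and no_result are hypothetical.

```idc
#include <idc.idc>

// Hypothetical function: returns a string for negative input and an
// integer otherwise.
static describe(n) {
   if (n < 0) {
      return "negative";
   }
   return n * 2;
}

// No return statement, so the caller receives zero.
static no_result() {
   Message("in no_result\n");
}

static main() {
   Message("%s\n", describe(-5));   // string result
   Message("%d\n", describe(21));   // integer result
   Message("%d\n", no_result());    // implicit return value of zero
}
```

Because the declaration gives no hint about the result type, the caller is responsible for knowing which datatype each path yields.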

As a final note, beginning with IDA 5.6, functions take a step closer to becoming first-class objects in IDC. It is now possible to pass function references as arguments to other functions and return function references as the result of a function. The following listing demonstrates the use of function parameters and functions as return values.

static getFunc() {
   return Message;  //return the built-in Message function as a result
}

static useFunc(func, arg) {  //func here is expected to be a function reference
   func(arg);
}

static main() {
   auto f = getFunc();
   f("Hello World\n");       //invoke the returned function f
   useFunc(f, "Print me\n"); //no need for & operator,
                             //functions always call-by-reference
}

IDC Objects

Another feature introduced in IDA 5.6 is the ability to define classes and, as a result, have variables that represent objects. In the discussion that follows, we assume that you have some familiarity with an object-oriented programming language such as C++ or Java.

IDC defines a root class named object from which all classes ultimately derive, and single inheritance is supported when creating new classes. IDC does not make use of access specifiers such as public and private; all class members are effectively public. Class declarations contain only the definitions of the class’s member functions. In order to create data members within a class, you simply create an assignment statement that assigns a value to the data member. The following listing will help to clarify.

class ExampleClass {
   ExampleClass(x, y) {   //constructor
      this.a = x;         //all ExampleClass objects have data member a
      this.b = y;         //all ExampleClass objects have data member b
   }
   ~ExampleClass() {      //destructor
   }
   foo(x) {
      this.a = this.a + x;
   }
   //...   other member functions as desired
};

static main() {
   ExampleClass ex;            //DON'T DO THIS!! This is not a valid variable declaration
   auto ex = ExampleClass(1, 2);   //reference variables are initialized by assigning
                                   //the result of calling the class constructor
   ex.foo(10);                 //dot notation is used to access members
   ex.z = "string";            //object ex now has a member z, BUT the class does not
}

For more information on IDC classes and their syntax, refer to the appropriate section within IDA’s built-in help file.

IDC Programs

For any scripting applications that require more than a few IDC statements, you are likely to want to create a standalone IDC program file. Among other things, saving your scripts as programs gives you some measure of persistence and portability.

IDC program files require you to make use of user-defined functions. At a minimum, you must define a function named main that takes no arguments. In most cases, you will also want to include the file idc.idc in order to pick up useful macro definitions that it contains. The following listing details the components of a minimal IDC program file:

#include <idc.idc>    // useful include directive
//declare additional functions as required
static main() {
   //do something fun here
}

IDC recognizes the following C-style preprocessor directives:

#include <file>

Includes the named file in the current file.

#define <name> [optional value]

Creates a macro named name and optionally assigns it the specified value. IDC predefines a number of macros that may be used to test various aspects of your script’s execution environment. These include _NT_, _LINUX_, _MAC_, _GUI_, and _TXT_ among others. See the Predefined symbols section of the IDA help file for more information on these and other symbols.

#ifdef <name>

Tests for the existence of the named macro and optionally processes any statements that follow if the named macro exists.

#else

Optionally used in conjunction with an #ifdef to provide an alternative set of statements to process in the event the named macro does not exist.

#endif

This is a required terminator for an #ifdef or #ifdef/#else block.

#undef <name>

Deletes the named macro.
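
The directives above might be combined as in the following sketch. The macro name VERBOSE is hypothetical, and the sketch assumes that conditional directives may appear inside a function body.

```idc
#include <idc.idc>

// Hypothetical macro; no value is needed for an #ifdef test.
#define VERBOSE

static main() {
#ifdef _NT_
   Message("Running under a Windows version of IDA\n");
#else
   Message("Running under a non-Windows version of IDA\n");
#endif
#ifdef VERBOSE
   Message("Verbose output enabled\n");
#endif
}
```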

Error Handling in IDC

No one is ever going to praise IDC for its error-reporting capabilities. There are two types of errors that you can expect to encounter when running IDC scripts: parsing errors and runtime errors.

Parsing errors are those errors that prevent your program from ever being executed and include such things as syntax errors, references to undefined variables, and supplying an incorrect number of arguments to a function. In some cases, error messages correctly identify both the location and the type of an error (hello_world.idc,20: Missing semicolon), while in other cases, error messages offer no real assistance (Syntax error near: <END>). During the parsing phase, IDC reports only the first parsing error that it encounters. As a result, in a script with 15 syntax errors, it may take 15 attempts at running the script before you are informed of every error.

Runtime errors are generally encountered less frequently than parsing errors. When encountered, runtime errors cause a script to terminate immediately. One example of a runtime error results from an attempt to call an undefined function that for some reason is not detected when the script is initially parsed. Another problem arises with scripts that take an excessive amount of time to execute. Once a script is started, there is no easy way to terminate the script if it inadvertently ends up in an infinite loop or simply takes longer to execute than you are willing to wait. Once a script has executed for more than two to three seconds, IDA displays the dialog shown in Figure 15-4.

This dialog is the only means by which you can terminate a script that fails to terminate properly.

Figure 15-4. Script cancellation dialog

Debugging is another of IDC’s weak points. Other than liberal use of output statements, there is no way to debug IDC scripts. With the introduction of exception handling (try/catch) in IDA 5.6, it does become possible to build more robust scripts that can terminate or continue as gracefully as you choose.

Persistent Data Storage in IDC

Perhaps you are the curious type who, not trusting that we would provide sufficient coverage of IDA’s scripting capability, raced off to see what the IDA help system has to say on the subject. If so, welcome back, and if not, we appreciate you sticking with us this far. In any case, somewhere along the way you may have acquired knowledge that claims that IDC does in fact support arrays, in which case you must surely be questioning the quality of this book. We urge you to give us a chance to sort out this potential confusion.

As mentioned previously, IDC does not support arrays in the traditional sense of declaring a large block of storage and then using a subscript notation to access individual items within that block. However, IDA’s documentation on scripting does mention something called global persistent arrays. IDC global arrays are better thought of as persistent named objects. The objects just happen to be sparse arrays.[101] Global arrays are stored within an IDA database and are persistent across script invocations and IDA sessions. Data is stored in global arrays by specifying an index and a data value to be stored at the specified index in the array. Each element in an array can simultaneously hold one integer value and one string value. IDC’s global arrays provide no means for storing floating point values.

Note

For the overly curious, IDA’s internal mechanism for storing persistent arrays is called a netnode. While the array-manipulation functions described next provide an abstracted interface to netnodes, lower-level access to netnode data is available using the IDA SDK, which is discussed, along with netnodes, in Chapter 16.

All interaction with global arrays occurs through the use of IDC functions dedicated to array manipulation. Descriptions of these functions follow:

long CreateArray(string name)

This function creates a persistent object with the specified name. The return value is an integer handle required for all future access to the array. If the named object already exists, the return value is −1.

long GetArrayId(string name)

Once an array has been created, subsequent access to the array must be done through an integer handle, which can be obtained by looking up the array name. The return value for this function is an integer handle to be used for all future interaction with the array. If the named array does not exist, the return value is −1.

long SetArrayLong(long id, long idx, long value)

Stores an integer value into the array referred to by id at the position specified by idx. The return value is 1 on success or 0 on failure. The operation will fail if the array id is invalid.

long SetArrayString(long id, long idx, string str)

Stores a string value into the array referred to by id at the position specified by idx. The return value is 1 on success or 0 on failure. The operation will fail if the array id is invalid.

string or long GetArrayElement(long tag, long id, long idx)

While there are distinct functions for storing data into an array depending on the type of data to be stored, there is only one function for retrieving data from an array. This function retrieves either an integer or a string value from the specified index (idx) in the specified array (id). Whether an integer or a string is retrieved is determined by the value of the tag parameter, which must be one of the constants AR_LONG (to retrieve an integer) or AR_STR (to retrieve a string).

long DelArrayElement(long tag, long id, long idx)

Deletes the contents of the specified array location from the specified array. The value of tag determines whether the integer value or string value associated with the specified index is deleted.

void DeleteArray(long id)

Deletes the array referenced by id and all of its associated contents. Once an array has been created, it continues to exist, even after a script terminates, until a call is made to DeleteArray to remove the array from the database in which it was created.

long RenameArray(long id, string newname)

Renames the array referenced by id to newname. Returns 1 if successful or 0 if the operation fails.

Possible uses for global arrays include approximating global variables, approximating complex datatypes, and providing persistent storage across script invocations. Global variables for a script are simulated by creating a global array when the script begins and storing global values in the array. These global values are shared either by passing the array handle to functions requiring access to the values or by requiring any function that requires access to perform a name lookup for the desired array.

Values stored in an IDC global array persist for the lifetime of the database in which the script was executed. You may test for the existence of an array by examining the return value of the CreateArray function. If the values stored in an array are applicable only to a specific invocation of a script, then the array should be deleted before the script terminates. Deleting the array ensures that no global values carry over from one execution of a script to a subsequent execution of the same script.
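
As a minimal sketch of these functions working together, the following script keeps a counter that survives across invocations within the same database. The array name run_stats and the choice of index 0 are arbitrary assumptions made for illustration.

```idc
#include <idc.idc>

static main() {
   auto id, count;
   auto name = "run_stats";          // hypothetical array name
   id = GetArrayId(name);
   if (id == -1) {
      // first run against this database: create the array and seed the counter
      id = CreateArray(name);
      SetArrayLong(id, 0, 0);
   }
   count = GetArrayElement(AR_LONG, id, 0) + 1;
   SetArrayLong(id, 0, count);
   // the same index can also hold a string alongside the integer
   SetArrayString(id, 0, "times this script has run");
   Message("%s: %d\n", GetArrayElement(AR_STR, id, 0), count);
   // DeleteArray(id);   // uncomment if the values should not persist
}
```

Looking the array up by name first, and creating it only when the lookup fails, avoids having to interpret a failed CreateArray call on later runs.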



[101] Sparse arrays do not necessarily preallocate space for the entire array, nor are they limited to a particular maximum index. Instead, space for array elements is allocated on an as-needed basis when elements are added to the array.