The IDA Pro Book, 2nd Edition, by Chris Eagle. Published by No Starch Press, 2011.

Useful IDC Functions

At this point, you have all the information required to write well-formed IDC scripts. What you are lacking is the ability to perform any useful interaction with IDA itself. IDC provides a long list of built-in functions that offer many different ways to access a database. All of the functions are documented to some degree in the IDA help system under the topic Index of IDC functions. In most cases, the documentation is nothing more than relevant lines copied from the main IDC include file, idc.idc. Becoming comfortable with the rather terse documentation is one of the more frustrating aspects of learning IDC. In general, there is no easy way to answer the question “How do I do x in IDC?” The most common way to figure out how to do something is to browse the list of IDC functions looking for one that, based on its name, appears to do what you need. This presumes, of course, that the functions are named according to their purpose, but their purpose may not always be obvious. For example, in many cases, functions that retrieve information from the database are named GetXXX; in many other cases, however, the Get prefix is not used. Functions that change the database may be named SetXXX, MakeXXX, or something else entirely. In summary, if you want to use IDC, get used to browsing the list of functions and reading through their descriptions. If you find yourself at a complete loss, don’t be afraid to use the support forums at Hex-Rays.[102]

The intent of the remainder of this section is to point out some of the more useful (in our experience) IDC functions and group them into functional areas. Even if you intend to script in Python only, familiarity with the listed functions will be useful to you because IDAPython provides Python equivalents to each function listed here. We make no attempt to cover every IDC function, however, since they are already covered in the IDA help system.

Functions for Reading and Modifying Data

The following functions provide access to individual bytes, words, and double words in a database:

long Byte(long addr)

Reads a byte value from virtual address addr.

long Word(long addr)

Reads a word (2-byte) value from virtual address addr.

long Dword(long addr)

Reads a double word (4-byte) value from virtual address addr.

void PatchByte(long addr, long val)

Sets a byte value at virtual address addr.

void PatchWord(long addr, long val)

Sets a word value at virtual address addr.

void PatchDword(long addr, long val)

Sets a double word value at virtual address addr.

bool isLoaded(long addr)

Returns 1 if addr contains valid data, 0 otherwise.

Each of these functions takes the byte ordering (little-endian or big-endian) of the current processor module into account when reading and writing the database. The PatchXXX functions also trim the supplied value to an appropriate size by using only the proper number of low-order bytes according to the function called. For example, a call to PatchByte(0x401010, 0x1234) will patch location 0x401010 with the byte value 0x34 (the low-order byte of 0x1234). If an invalid address is supplied while reading the database with Byte, Word, and Dword, the values 0xFF, 0xFFFF, and 0xFFFFFFFF will be returned, respectively. Because there is no way to distinguish these error values from legitimate data stored in the database, you may wish to call isLoaded to determine whether an address in the database contains any data prior to attempting to read from that address.
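The truncation, byte-ordering, and error-value rules just described can be tried outside of IDA. The following plain-Python sketch is neither IDC nor IDAPython: FakeDatabase and its method names are hypothetical stand-ins that only mimic the behavior described above. It assumes a little-endian processor by default and, as a further assumption, that any unloaded byte reads as 0xFF.

```python
class FakeDatabase:
    """Toy model of database bytes; NOT an IDA API (all names are hypothetical)."""

    def __init__(self, big_endian=False):
        self.mem = {}                                  # address -> byte value
        self.order = "big" if big_endian else "little"

    def is_loaded(self, addr):
        # Mirrors isLoaded: 1 if the address holds a value, 0 otherwise
        return 1 if addr in self.mem else 0

    def _patch(self, addr, val, size):
        # Keep only the low-order `size` bytes, as the PatchXXX functions do
        val &= (1 << (8 * size)) - 1
        for i, b in enumerate(val.to_bytes(size, self.order)):
            self.mem[addr + i] = b

    def _read(self, addr, size):
        # Assumption: every unloaded byte reads as 0xFF
        raw = bytes(self.mem.get(addr + i, 0xFF) for i in range(size))
        return int.from_bytes(raw, self.order)

    def patch_byte(self, addr, val):
        self._patch(addr, val, 1)

    def patch_word(self, addr, val):
        self._patch(addr, val, 2)

    def byte(self, addr):
        return self._read(addr, 1)

    def word(self, addr):
        return self._read(addr, 2)

    def dword(self, addr):
        return self._read(addr, 4)


db = FakeDatabase()
db.patch_byte(0x401010, 0x1234)        # only the low-order byte is stored
print(hex(db.byte(0x401010)))          # 0x34

db.patch_byte(0x401020, 0xFF)          # a legitimate 0xFF value
# Both reads yield 0xFF; only is_loaded distinguishes data from an invalid address
print(db.byte(0x401020), db.is_loaded(0x401020))   # 255 1
print(db.byte(0x500000), db.is_loaded(0x500000))   # 255 0
```

The last two lines show why checking isLoaded first is worthwhile: the value read back is identical in both cases.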

Because of a quirk in refreshing IDA’s disassembly view, you may find that the results of a patch operation are not immediately visible. In such cases, scrolling away from the patched location and then scrolling back to the patched location generally forces the display to be updated properly.

User Interaction Functions

In order to perform any user interaction at all, you will need to familiarize yourself with IDC input/output functions. The following list summarizes some of IDC’s more useful interface functions:

void Message(string format, ...)

Prints a formatted message to the output window. This function is analogous to C’s printf function and accepts a printf-style format string.

void print(...)

Prints the string representation of each argument to the output window.

void Warning(string format, ...)

Displays a formatted message in a dialog.

string AskStr(string default, string prompt)

Displays an input dialog asking the user to enter a string value. Returns the user’s string or 0 if the dialog was canceled.

string AskFile(long doSave, string mask, string prompt)

Displays a file-selection dialog to simplify the task of choosing a file. New files may be created for saving data (doSave = 1), or existing files may be chosen for reading data (doSave = 0). The displayed list of files may be filtered according to mask (such as *.* or *.idc). Returns the name of the selected file or 0 if the dialog was canceled.

long AskYN(long default, string prompt)

Prompts the user with a yes or no question, highlighting a default answer (1 = yes, 0 = no, −1 = cancel). Returns an integer representing the selected answer.

long ScreenEA()

Returns the virtual address of the current cursor location.

bool Jump(long addr)

Jumps the disassembly window to the specified address.

Because IDC lacks any debugging facilities, you may find yourself using the Message function as your primary debugging tool. Several other AskXXX functions exist to handle more specialized input cases such as integer input. Please refer to the help system documentation for a complete list of available AskXXX functions. The ScreenEA function is very useful for picking up the current cursor location when you wish to create a script that tailors its behavior based on the location of the cursor. Similarly, the Jump function is useful when you have a script that needs to call the user’s attention to a specific location within the disassembly.

String-Manipulation Functions

Although simple string assignment and concatenation are taken care of with basic operators in IDC, more complex operations must be performed using available string-handling functions, some of which are detailed here:

string form(string format, ...) // pre IDA 5.6

Returns a new string formatted according to the supplied format string and values. This is roughly equivalent to C’s sprintf function.

string sprintf(string format, ...) // IDA 5.6+

With IDA 5.6, sprintf replaces form (see above).

long atol(string val)

Converts the decimal string val to its corresponding integer representation.

long xtol(string val)

Converts the hexadecimal string val (which may optionally begin with 0x) to its corresponding integer representation.

string ltoa(long val, long radix)

Returns a string representation of val in the specified radix (2, 8, 10, or 16).

long ord(string ch)

Returns the ASCII value of the one-character string ch.

long strlen(string str)

Returns the length of the provided string.

long strstr(string str, string substr)

Returns the index of substr within str or −1 if the substring is not found.

string substr(string str, long start, long end)

Returns the substring containing the characters from start through end-1 of str. Using slices (IDA 5.6+) this function is equivalent to str[start:end].

Recall that there is no character datatype in IDC, nor is there any array syntax. Lacking slices, if you want to iterate through the individual characters within a string, you must take successive one-character substrings for each character in the string.
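The one-character-substring technique is easy to picture with a small model. The sketch below is plain Python rather than IDC; its substr function is a hypothetical stand-in that copies only the start-through-end-1 behavior described above, and to_hex_bytes is an illustrative helper of our own.

```python
def substr(s, start, end):
    """Model of IDC's substr: the characters from start through end-1."""
    return s[start:end]


def chars(s):
    """Yield each character of s as a one-character string, one substr call each."""
    i = 0
    while i < len(s):
        yield substr(s, i, i + 1)
        i += 1


def to_hex_bytes(s):
    """Format the ASCII value of every character, as ord would report it."""
    return " ".join("%02X" % ord(ch) for ch in chars(s))


print(to_hex_bytes("IDA"))   # 49 44 41
```

An IDC script without slices would follow the same shape: a counter running from 0 to strlen(str) - 1, with substr(str, i, i + 1) producing the current character.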

File Input/Output Functions

The output window may not always be the ideal place to send the output of your scripts. For scripts that generate a large amount of text or scripts that generate binary data, you may wish to output to disk files instead. We have already discussed using the AskFile function to ask a user for a filename. However, AskFile returns only a string containing the name of a file; the file itself must still be opened before it can be read or written. IDC’s file-handling functions are detailed here:

long fopen(string filename, string mode)

Returns an integer file handle (or 0 on error) for use with all IDC file I/O functions. The mode parameter is similar to the modes used in C’s fopen (r to read, w to write, and so on).

void fclose(long handle)

Closes the file specified by the file handle from fopen.

long filelength(long handle)

Returns the length of the indicated file or −1 on error.

long fgetc(long handle)

Reads a single byte from the given file. Returns −1 on error.

long fputc(long val, long handle)

Writes a single byte to the given file. Returns 0 on success or −1 on error.

long fprintf(long handle, string format, ...)

Writes a formatted string to the given file.

long writestr(long handle, string str)

Writes the specified string to the given file.

string/long readstr(long handle)

Reads a string from the given file. This function reads all characters (including non-ASCII) up to and including the next line feed (ASCII 0xA) character. Returns the string on success or −1 on end of file.

long writelong(long handle, long val, long bigendian)

Writes a 4-byte integer to the given file using big-endian (bigendian = 1) or little-endian (bigendian = 0) byte order.

long readlong(long handle, long bigendian)

Reads a 4-byte integer from the given file using big-endian (bigendian = 1) or little-endian (bigendian = 0) byte order.

long writeshort(long handle, long val, long bigendian)

Writes a 2-byte integer to the given file using big-endian (bigendian = 1) or little-endian (bigendian = 0) byte order.

long readshort(long handle, long bigendian)

Reads a 2-byte integer from the given file using big-endian (bigendian = 1) or little-endian (bigendian = 0) byte order.

bool loadfile(long handle, long pos, long addr, long length)

Reads length number of bytes from position pos in the given file and writes those bytes into the database beginning at address addr.

bool savefile(long handle, long pos, long addr, long length)

Writes length number of bytes beginning at database address addr to position pos in the given file.
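The effect of the bigendian argument is easiest to see in the bytes that end up in the file. The following sketch models it in plain Python with the standard struct module; these writelong and readlong functions are hypothetical stand-ins operating on a Python file object, not the IDC functions themselves. Raising EOFError on a short read is a choice made for this model only.

```python
import io
import struct


def writelong(handle, val, bigendian):
    """Write val as a 4-byte integer in the requested byte order."""
    fmt = ">I" if bigendian else "<I"
    handle.write(struct.pack(fmt, val & 0xFFFFFFFF))


def readlong(handle, bigendian):
    """Read a 4-byte integer in the requested byte order."""
    data = handle.read(4)
    if len(data) < 4:
        raise EOFError("fewer than 4 bytes available")   # model-only behavior
    fmt = ">I" if bigendian else "<I"
    return struct.unpack(fmt, data)[0]


buf = io.BytesIO()                    # in-memory stand-in for a disk file
writelong(buf, 0x11223344, 1)         # big-endian:    11 22 33 44
writelong(buf, 0x11223344, 0)         # little-endian: 44 33 22 11
print(buf.getvalue().hex())           # 1122334444332211

buf.seek(0)
print(hex(readlong(buf, 0)))          # 0x44332211 (big-endian bytes read as little-endian)
```

Reading with a byte order that differs from the one used for writing silently produces a byte-swapped value, so a script should use the same setting on both sides.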

Manipulating Database Names

The need to manipulate named locations arises fairly often in scripts. The following IDC functions are available for working with named locations in an IDA database:

string Name(long addr)

Returns the name associated with the given address or returns the empty string if the location has no name. This function does not return user-assigned names when the names are marked as local.

string NameEx(long from, long addr)

Returns the name associated with addr. Returns the empty string if the location has no name. This function returns user-defined local names if from is any address within a function that also contains addr.

bool MakeNameEx(long addr, string name, long flags)

Assigns the given name to the given address. The name is created with attributes specified in the flags bitmask. These flags are described in the help file documentation for MakeNameEx and are used to specify attributes such as whether the name is local or public or whether it should be listed in the names window.

long LocByName(string name)

Returns the address of the location with the given name. Returns BADADDR (−1) if no such name exists in the database.

long LocByNameEx(long funcaddr, string localname)

Searches for the given local name within the function containing funcaddr. Returns BADADDR (−1) if no such name exists in the given function.

Functions Dealing with Functions

Many scripts are designed to perform analysis of functions within a database. IDA assigns disassembled functions a number of attributes, such as the size of the function’s local variable area or the size of the function’s arguments on the runtime stack. The following IDC functions can be used to access information about functions within a database.

long GetFunctionAttr(long addr, long attrib)

Returns the requested attribute for the function containing the given address. Refer to the IDC help documentation for a list of attribute constants. As an example, to find the ending address of a function, use GetFunctionAttr(addr, FUNCATTR_END);.

string GetFunctionName(long addr)

Returns the name of the function that contains the given address or an empty string if the given address does not belong to a function.

long NextFunction(long addr)

Returns the starting address of the next function following the given address. Returns −1 if there are no more functions in the database.

long PrevFunction(long addr)

Returns the starting address of the nearest function that precedes the given address. Returns −1 if no function precedes the given address.

Use the LocByName function to find the starting address of a function given the function’s name.
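NextFunction and PrevFunction lend themselves to a simple enumeration loop. The sketch below models that loop in plain Python over a sorted list of function start addresses; next_function, prev_function, and all_functions are hypothetical stand-ins, and the strictly-greater and strictly-less comparisons are our reading of "following" and "precedes" in the descriptions above.

```python
import bisect

BADADDR = -1


def next_function(starts, addr):
    """Start of the first function beginning after addr, or BADADDR."""
    i = bisect.bisect_right(starts, addr)
    return starts[i] if i < len(starts) else BADADDR


def prev_function(starts, addr):
    """Start of the nearest function beginning before addr, or BADADDR."""
    i = bisect.bisect_left(starts, addr)
    return starts[i - 1] if i > 0 else BADADDR


def all_functions(starts):
    """Visit every function by feeding each result back into next_function."""
    func = next_function(starts, 0)
    while func != BADADDR:
        yield func
        func = next_function(starts, func)


starts = [0x401000, 0x401050, 0x4010A0]   # made-up function start addresses
print([hex(f) for f in all_functions(starts)])
# ['0x401000', '0x401050', '0x4010a0']
```

Inside the loop body, a real script would typically call GetFunctionName or GetFunctionAttr on each address it visits.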

Code Cross-Reference Functions

Cross-references were covered in Chapter 9. IDC offers functions for accessing cross-reference information associated with any instruction. Deciding which functions meet the needs of your scripts can be a bit confusing. It requires you to understand whether you are interested in following the flows leaving a given address or whether you are interested in iterating over all of the locations that refer to a given address. Functions for performing both of these operations are described here. Several of these functions are designed to support iteration over a set of cross-references. Such functions support the notion of a sequence of cross-references and require a current cross-reference in order to return a next cross-reference. Examples of using cross-reference iterators are provided in Enumerating Cross-References.

long Rfirst(long from)

Returns the first location to which the given address transfers control. Returns BADADDR (−1) if the given address refers to no other address.

long Rnext(long from, long current)

Returns the next location to which the given address (from) transfers control, given that current has already been returned by a previous call to Rfirst or Rnext. Returns BADADDR if no more cross-references exist.

long XrefType()

Returns a constant indicating the type of the last cross-reference returned by a cross-reference lookup function such as Rfirst. For code cross-references, these constants are fl_CN (near call), fl_CF (far call), fl_JN (near jump), fl_JF (far jump), and fl_F (ordinary sequential flow).

long RfirstB(long to)

Returns the first location that transfers control to the given address. Returns BADADDR (−1) if there are no references to the given address.

long RnextB(long to, long current)

Returns the next location that transfers control to the given address (to), given that current has already been returned by a previous call to RfirstB or RnextB. Returns BADADDR if no more cross-references to the given location exist.

Each time a cross-reference function is called, an internal IDC state variable is set that indicates the type of the last cross-reference that was returned. If you need to know what type of cross-reference you have received, then you must call XrefType prior to calling another cross-reference lookup function.
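The first/next iteration pattern and the "last type" state variable can be modeled in a few lines. The following plain-Python sketch is not IDC: FakeXrefs is a hypothetical class whose rfirst, rnext, and xref_type methods merely imitate the behavior described above, and the addresses in the table are made up.

```python
BADADDR = -1


class FakeXrefs:
    """Toy cross-reference table; NOT an IDA API (all names are hypothetical)."""

    def __init__(self, table):
        # table maps a source address to an ordered list of (target, type) pairs
        self.table = table
        self.last_type = None          # models the internal "last xref type" state

    def rfirst(self, frm):
        refs = self.table.get(frm, [])
        if not refs:
            return BADADDR
        self.last_type = refs[0][1]
        return refs[0][0]

    def rnext(self, frm, current):
        refs = self.table.get(frm, [])
        targets = [target for target, _ in refs]
        if current not in targets:
            return BADADDR
        i = targets.index(current) + 1
        if i >= len(refs):
            return BADADDR
        self.last_type = refs[i][1]
        return refs[i][0]

    def xref_type(self):
        return self.last_type


def flows_from(xrefs, frm):
    """Collect (target, type) for every code cross-reference leaving frm."""
    found = []
    target = xrefs.rfirst(frm)
    while target != BADADDR:
        # Read the type now; the next lookup overwrites the stored state
        found.append((target, xrefs.xref_type()))
        target = xrefs.rnext(frm, target)
    return found


# A conditional jump: ordinary flow to the next instruction plus a near jump
xrefs = FakeXrefs({0x401000: [(0x401005, "fl_F"), (0x401200, "fl_JN")]})
print(flows_from(xrefs, 0x401000))
# [(4198405, 'fl_F'), (4198912, 'fl_JN')]
```

The same loop shape applies to RfirstB and RnextB when iterating over the locations that refer to an address rather than the flows leaving it.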

Data Cross-Reference Functions

The functions for accessing data cross-reference information are very similar to the functions used to access code cross-reference information. These functions are described here:

long Dfirst(long from)

Returns the first location that the given address refers to as data. Returns BADADDR (−1) if the given address refers to no other addresses.

long Dnext(long from, long current)

Returns the next location that the given address (from) refers to as data, given that current has already been returned by a previous call to Dfirst or Dnext. Returns BADADDR if no more cross-references exist.

long XrefType()

Returns a constant indicating the type of the last cross-reference returned by a cross-reference lookup function such as Dfirst. For data cross-references, these constants include dr_O (offset taken), dr_W (data write), and dr_R (data read).

long DfirstB(long to)

Returns the first location that refers to the given address as data. Returns BADADDR (−1) if there are no references to the given address.

long DnextB(long to, long current)

Returns the next location that refers to the given address (to) as data, given that current has already been returned by a previous call to DfirstB or DnextB. Returns BADADDR if no more cross-references to the given location exist.

As with code cross-references, if you need to know what type of cross-reference you have received, then you must call XrefType prior to calling another cross-reference lookup function.

Database Manipulation Functions

A number of functions exist for formatting the contents of a database. Here are descriptions of a few of these functions:

void MakeUnkn(long addr, long flags)

Undefines the item at the specified address. The flags (see the IDC documentation for MakeUnkn) dictate whether subsequent items will also be undefined and whether any names associated with undefined items will be deleted. The related function MakeUnknown allows you to undefine large blocks of data.

long MakeCode(long addr)

Converts the bytes at the specified address into an instruction. Returns the length of the instruction or 0 if the operation fails.

bool MakeByte(long addr)

Converts the item at the specified address into a data byte. MakeWord and MakeDword are also available.

bool MakeComm(long addr, string comment)

Adds a regular comment at the given address.

bool MakeFunction(long begin, long end)

Converts the range of instructions from begin to end into a function. If end is specified as BADADDR (−1), IDA attempts to automatically identify the end of the function by locating the function’s return instruction.

bool MakeStr(long begin, long end)

Creates a string of the current string type (as returned by GetStringType), spanning the bytes from begin to end - 1. If end is specified as BADADDR, IDA attempts to automatically identify the end of the string.

Many other MakeXXX functions exist that offer behavior similar to the functions just described. Please refer to the IDC documentation for a full list of these functions.

Database Search Functions

The majority of IDA’s search capabilities are accessible in IDC in the form of various FindXXX functions, some of which are described here. The flags parameter used in the FindXXX functions is a bitmask that specifies the behavior of the find operation. Three of the more useful flags are SEARCH_DOWN, which causes the search to scan toward higher addresses; SEARCH_NEXT, which skips the current occurrence in order to search for the next occurrence; and SEARCH_CASE, which causes binary and text searches to be performed in a case-sensitive manner.

long FindCode(long addr, long flags)

Searches for an instruction from the given address.

long FindData(long addr, long flags)

Searches for a data item from the given address.

long FindBinary(long addr, long flags, string binary)

Searches for a sequence of bytes from the given address. The binary string specifies a sequence of hexadecimal byte values. If SEARCH_CASE is not specified and a byte value specifies an uppercase or lowercase ASCII letter, then the search will also match corresponding, complementary case values. For example, “41 42” will match “61 62” (and “61 42”) unless the SEARCH_CASE flag is set.

long FindText(long addr, long flags, long row, long column, string text)

Searches for a text string from the given column on the given line (row) at the given address. Note that the disassembly text at a given address may span several lines, hence the need to specify on which line the search should begin.

Also note that SEARCH_NEXT does not define the direction of search, which may be either up or down according to the SEARCH_DOWN flag. In addition, when SEARCH_NEXT is not specified, it is perfectly reasonable for a FindXXX function to return the same address that was passed in as the addr argument when the item at addr satisfies the search.
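The interplay of the three flags can be illustrated with a small model of a byte search. The sketch below is plain Python, not IDC: find_binary is a hypothetical stand-in that searches a bytes object in which offset 0 plays the role of address 0, and the numeric flag values are chosen for this model rather than taken from IDA's definitions.

```python
BADADDR = -1
SEARCH_DOWN, SEARCH_NEXT, SEARCH_CASE = 0x01, 0x02, 0x04   # model values only


def find_binary(data, addr, flags, binary):
    """Search data for the hex byte string binary, starting at addr."""
    pattern = bytes.fromhex(binary)        # "41 42" -> b"AB"; spaces are skipped
    if not flags & SEARCH_CASE:
        # Case-insensitive: fold ASCII letters on both sides before comparing
        data, pattern = data.lower(), pattern.lower()
    step = 1 if flags & SEARCH_DOWN else -1
    # SEARCH_NEXT skips the current position; it does not choose the direction
    pos = addr + step if flags & SEARCH_NEXT else addr
    last = len(data) - len(pattern)        # highest offset where a match can start
    if step < 0:
        pos = min(pos, last)
    while 0 <= pos <= last:
        if data[pos:pos + len(pattern)] == pattern:
            return pos
        pos += step
    return BADADDR


data = b"..ab..AB"
print(find_binary(data, 0, SEARCH_DOWN, "41 42"))                # 2
print(find_binary(data, 0, SEARCH_DOWN | SEARCH_CASE, "41 42"))  # 6
print(find_binary(data, 2, SEARCH_DOWN, "61 62"))                # 2
print(find_binary(data, 2, SEARCH_DOWN | SEARCH_NEXT, "61 62"))  # 6
```

The third call returns the starting address itself because the bytes there already satisfy the search; adding SEARCH_NEXT in the fourth call is what moves the search on to the following occurrence.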

Disassembly Line Components

From time to time it is useful to extract the text, or portions of the text, of individual lines in a disassembly listing. The following functions provide access to various components of a disassembly line:

string GetDisasm(long addr)

Returns disassembly text for the given address. The returned text includes any comments but does not include address information.

string GetMnem(long addr)

Returns the mnemonic portion of the instruction at the given address.

string GetOpnd(long addr, long opnum)

Returns the text representation of the specified operand at the specified address. Operands are numbered from zero beginning with the leftmost operand.

long GetOpType(long addr, long opnum)

Returns an integer representing the type for the given operand at the given address. Refer to the IDC documentation for GetOpType for a complete list of operand type codes.

long GetOperandValue(long addr, long opnum)

Returns the integer value associated with the given operand at the given address. The nature of the returned value depends on the type of the given operand as specified by GetOpType.

string CommentEx(long addr, long type)

Returns the text of any comment present at the given address. If type is 0, the text of the regular comment is returned. If type is 1, the text of the repeatable comment is returned. If no comment is present at the given address, an empty string is returned.



[102] The support forum is currently located at http://www.hex-rays.com/forum/