The IDA Pro Book, 2nd Edition, by Chris Eagle. Published by No Starch Press, 2011.

Creating FLIRT Signature Files

As we discussed previously, it is simply impractical for IDA to ship with signature files for every static library in existence. In order to provide IDA users with the tools and information necessary to create their own signatures, Hex-Rays distributes the Fast Library Acquisition for Identification and Recognition (FLAIR) tool set. The FLAIR tools are made available on your IDA distribution CD or via download from the Hex-Rays website[80] for authorized customers. Like several other IDA add-ons, the FLAIR tools are distributed in a Zip file. Hex-Rays does not necessarily release a new version of the FLAIR tools with each version of IDA, so you should use the most recent version of FLAIR that does not exceed your version of IDA.

Installation of the FLAIR utilities is a simple matter of extracting the contents of the associated Zip file, though we highly recommend that you create a dedicated flair directory as the destination because the Zip file is not organized with a top-level directory. Inside the FLAIR distribution you will find several text files that constitute the documentation for the FLAIR tools. Files of particular interest include these:

readme.txt

This is a top-level overview of the signature-creation process.

plb.txt

This file describes the use of the static library parser, plb.exe. Library parsers are discussed in more detail in Creating Pattern Files.

pat.txt

This file details the format of pattern files, which represent the first step in the signature-creation process. Pattern files are also described in Creating Pattern Files.

sigmake.txt

This file describes the use of sigmake.exe for generating .sig files from pattern files. Please refer to Creating Signature Files for more details.

Additional top-level content of interest includes the bin directory, which contains all of the FLAIR tools' executable files, and the startup directory, which contains pattern files for common startup sequences associated with various compilers and their associated output file types (PE, ELF, and so on). Prior to version 6.1, the FLAIR tools were available for Windows only; however, the resulting signature files may be used with all IDA variants (Windows, Linux, and OS X).

Signature-Creation Overview

The basic process for creating signature files does not seem complicated, as it boils down to four simple-sounding steps.

  1. Obtain a copy of the static library for which you wish to create a signature file.

  2. Utilize one of the FLAIR parsers to create a pattern file for the library.

  3. Run sigmake.exe to process the resulting pattern file and generate a signature file.

  4. Install the new signature file in IDA by copying it to <IDADIR>/sig.

Unfortunately, in practice, only the last step is as easy as it sounds. In the following sections, we discuss the first three steps in more detail.

Identifying and Acquiring Static Libraries

The first step in the signature-generation process is to locate a copy of the static library for which you wish to generate signatures. This can pose a bit of a challenge for a variety of reasons. The first obstacle is to determine which library you actually need. If the binary you are analyzing has not been stripped, you might be lucky enough to have actual function names available in your disassembly, in which case an Internet search will probably provide several pointers to likely candidates.

Stripped binaries are not quite as forthcoming regarding their origins. Lacking function names, you may find that a strings search yields strings unique enough to identify the library, such as the following, which is a dead giveaway:

OpenSSL 1.0.0b-fips 16 Nov 2010

Copyright notices and error strings are often sufficiently unique that once again you can use an Internet search to narrow your candidates. If you choose to run strings from the command line, remember to use the -a option to force strings to scan the entire binary; otherwise you may miss some potentially useful string data.

In the case of open source libraries, you are likely to find source code readily available. Unfortunately, while the source code may be useful in helping you understand the behavior of the binary, you cannot use it to generate your signatures. It might be possible to use the source to build your own version of the static library and then use that version in the signature-generation process. However, in all likelihood, variations in the build process will result in enough differences between the resulting library and the library you are analyzing that any signatures you generate will not be terribly accurate.

The best option is to attempt to determine the exact origin of the binary in question. By this we mean the exact operating system, operating system version, and distribution (if applicable). Given this information, the best option for creating signatures is to copy the libraries in question from an identically configured system. Naturally, this leads to the next challenge: Given an arbitrary binary, on what system was it created? A good first step is to use the file utility to obtain some preliminary information about the binary in question. In Chapter 2 we saw some sample output from file. In several cases, this output was sufficient to provide likely candidate systems. The following is just one example of very specific output from file:

$ file sample_file_1
sample_file_1: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD),
statically linked, for FreeBSD 8.0 (800107), stripped

In this case we might head straight to a FreeBSD 8.0 system and track down libc.a for starters. The following example is somewhat more ambiguous, however:

$ file sample_file_2
sample_file_2: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux),
statically linked, for GNU/Linux 2.6.32, stripped

We appear to have narrowed the source of the file to a Linux system, which, given the abundance of available Linux distributions, is not saying much. Turning to strings we find the following:

GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)

Here the search has been narrowed to Red Hat distributions (or derivatives) that shipped with gcc version 4.5.1. GCC tags such as this are not uncommon in binaries compiled using gcc, and fortunately for us, they survive the stripping process and remain visible to strings.
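The banner hunt described above can be sketched in a few lines of Python. This is only an illustration: the function names and the keyword list are hypothetical, and the scan imitates strings -a by looking at every byte of the file for runs of printable ASCII.

```python
import re

def printable_strings(data, min_len=4):
    """Yield runs of printable ASCII found anywhere in data (like strings -a)."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group().decode("ascii")

def find_banners(data, keywords=("GCC:", "OpenSSL", "Copyright")):
    """Return the strings that contain one of the (hypothetical) keywords."""
    return [s for s in printable_strings(data) if any(k in s for k in keywords)]

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        for banner in find_banners(handle.read()):
            print(banner)
```

Run against a stripped binary, a script like this prints only the candidate banners, which you can then feed to an Internet search. Extend the keyword tuple with whatever markers fit the libraries you suspect.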

Keep in mind that the file utility is not the be-all and end-all in file identification. The following output demonstrates a simple case in which file seems to know the type of the file being examined but for which the output is rather nonspecific.

$ file sample_file_3
sample_file_3: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), stripped

This example was taken from a Solaris 10 x86 system. Here again, the strings utility might be useful in pinpointing this fact.

Creating Pattern Files

At this point you should have one or more libraries for which you wish to create signatures. The next step is to create a pattern file for each library. Pattern files are created using an appropriate FLAIR parser utility. Like executable files, library files are built to various file format specifications. FLAIR provides parsers for several popular library file formats. As detailed in FLAIR’s readme.txt file, the following parsers can be found in FLAIR’s bin directory:

plb.exe/plb

Parser for OMF libraries (commonly used by Borland compilers)

pcf.exe/pcf

Parser for COFF libraries (commonly used by Microsoft compilers)

pelf.exe/pelf

Parser for ELF libraries (found on many Unix systems)

ppsx.exe/ppsx

Parser for Sony PlayStation PSX libraries

ptmobj.exe/ptmobj

Parser for TriMedia libraries

pomf166.exe/pomf166

Parser for Keil OMF 166 object files

To create a pattern file for a given library, specify the parser that corresponds to the library’s format, the name of the library you wish to parse, and the name of the resulting pattern file that should be generated. For a copy of libc.a from a FreeBSD 8.0 system, you might use the following:

$ ./pelf libc.a libc_FreeBSD80.pat
libc.a: skipped 1, total 1089

Here, the parser reports the file that was parsed (libc.a), the number of functions that were skipped (1),[81] and the number of signature patterns that were generated (1089). Each parser accepts a slightly different set of command-line options documented only through the parser’s usage statement. Executing a parser with no arguments displays the list of command-line options accepted by that parser. The plb.txt file contains more detailed information on the options accepted by the plb parser. This file is a good basic source of information, since other parsers accept many of the options it describes as well. In many cases, simply naming the library to be parsed and the pattern file to be generated is sufficient.

A pattern file is a text file that contains, one per line, the extracted patterns that represent functions within a parsed library. A few lines from the pattern file created previously are shown here:

57568B7C240C8B742410FC8B4C2414C1E902F3A775108B4C241483E103F3A675 1E A55D 003E :0000 _memcmp
0FBC442404740340C39031C0C3...................................... 00 0000 000D :0000 _ffs
57538B7C240C8B4C2410FC31C083F90F7E1B89FAF7DA83E20389CB29D389D1F3 12 9E31 0032 :0000 _bzero

The format of an individual pattern is described in FLAIR’s pat.txt file. In a nutshell, the first portion of a pattern lists the initial byte sequence of the function to a maximum of 32 bytes. Allowance is made for bytes that may vary as a result of relocation entries. Such bytes are displayed using two dots. Dots are also used to fill the pattern out to 64 characters[82] when a function is shorter than 32 bytes (as _ffs is in the previous code). Beyond the initial 32 bytes, additional information is recorded to provide more precision in the signature-matching process. Additional information encoded into each pattern line includes a CRC16[83] value computed over a portion of the function, the length of the function in bytes, and a list of symbol names referenced by the function. In general, longer functions that reference many other symbols yield more complex pattern lines. In the file libc_FreeBSD80.pat generated previously, some pattern lines exceed 20,000 characters in length.
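To make the layout of a pattern line concrete, here is a minimal Python sketch that splits a line into its fields and compares the leading bytes against raw code. It is not part of FLAIR, and the names PatternLine, parse_pattern_line, and lead_matches are hypothetical. It assumes that the two-digit field following the leading bytes is the number of bytes covered by the CRC16, which agrees with the sample lines shown earlier (for _memcmp, 0x20 + 0x1E = 0x3E). It handles only the simple ":offset name" pairs that appear in those samples; consult pat.txt for the full format.

```python
from dataclasses import dataclass, field

@dataclass
class PatternLine:
    lead: str            # 64 characters: hex digits, ".." for variable or absent bytes
    crc_length: int      # assumed: number of bytes covered by the CRC16
    crc16: int
    total_length: int    # length of the function in bytes
    public_names: list = field(default_factory=list)  # (offset, name) pairs

def parse_pattern_line(line):
    """Split one pattern line into its leading fields and public names."""
    fields = line.split()
    lead, crc_length, crc16, total_length = fields[:4]
    rest = fields[4:]
    names = []
    # Public names follow as ":<hex offset> <name>" pairs.
    while len(rest) >= 2 and rest[0].startswith(":"):
        names.append((int(rest[0][1:], 16), rest[1]))
        rest = rest[2:]
    return PatternLine(lead, int(crc_length, 16), int(crc16, 16),
                       int(total_length, 16), names)

def lead_matches(lead, code):
    """Compare the leading pattern with raw bytes; ".." matches anything."""
    for i in range(0, len(lead), 2):
        pair = lead[i:i + 2]
        if pair == "..":
            continue
        if i // 2 >= len(code) or code[i // 2] != int(pair, 16):
            return False
    return True

if __name__ == "__main__":
    sample = "0FBC442404740340C39031C0C3" + "." * 38 + " 00 0000 000D :0000 _ffs"
    parsed = parse_pattern_line(sample)
    print(parsed.public_names, parsed.total_length)
```

Note that this sketch checks only the first 32 bytes. Per the description above, a real match also takes the CRC16, the function length, and referenced names into account.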

Several third-party programmers have created utilities designed to generate patterns from existing IDA databases. One such utility is IDB_2_PAT,[84] an IDA plug-in written by J.C. Roberts that is capable of generating patterns for one or more functions in an existing database. Utilities such as these are useful if you expect to encounter similar code in additional databases and have no access to the original library files used to create the binary being analyzed.

Creating Signature Files

Once you have created a pattern file for a given library, the next step in the signature-creation process is to generate a .sig file suitable for use with IDA. The format of an IDA signature file is substantially different from that of a pattern file. Signature files utilize a proprietary binary format designed both to minimize the amount of space required to represent all of the information present in a pattern file and to allow for efficient matching of signatures against actual database content. A high-level description of the structure of a signature file is available on the Hex-Rays website.[85]

FLAIR’s sigmake utility is used to create signature files from pattern files. By splitting pattern generation and signature generation into two distinct phases, the signature-generation process is completely independent of the pattern-generation process, which allows for the use of third-party pattern generators. In its simplest form, signature generation takes place by using sigmake to parse a .pat file and create a .sig file, as shown here:

$ ./sigmake libssl.pat libssl.sig

If all goes well, a .sig file is generated and ready to install into <IDADIR>/sig. However, the process seldom runs that smoothly.

Note

The sigmake documentation file, sigmake.txt, recommends that signature filenames follow the MS-DOS 8.3 name-length convention. This is not a hard-and-fast requirement, however. When longer filenames are used, only the first eight characters of the base filename are displayed in the signature-selection dialog.

Signature generation is often an iterative process, as it is during this phase when collisions must be handled. A collision occurs anytime two functions have identical patterns. If collisions are not resolved in some manner, it is not possible to determine which function is actually being matched during the signature-application process. Therefore, sigmake must be able to resolve each generated signature to exactly one function name. When this is not possible, based on the presence of identical patterns for one or more functions, sigmake refuses to generate a .sig file and instead generates an exclusions file (.exc). A more typical first pass using sigmake and a new .pat file (or set of .pat files) might yield the following.

$ ./sigmake libc_FreeBSD80.pat libc_FreeBSD80.sig
libc_FreeBSD80.sig: modules/leaves: 1088/1024, COLLISIONS: 10
See the documentation to learn how to resolve collisions.

The documentation being referred to is sigmake.txt, which describes the use of sigmake and the collision-resolution process. In reality, each time sigmake is executed, it searches for a corresponding exclusions file that might contain information on how to resolve any collisions that sigmake may encounter while processing the named pattern file. In the absence of such an exclusions file, and when collisions occur, sigmake generates such an exclusions file rather than a signature file. In the previous example, we would find a newly created file named libc_FreeBSD80.exc. When first created, exclusions files are text files that detail the conflicts that sigmake encountered while processing the pattern file. The exclusions file must be edited to provide sigmake with guidance as to how it should resolve the conflicting patterns. The general process for editing an exclusions file follows.

When generated by sigmake, all exclusions files begin with the following lines:

;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

The intent of these lines is to remind you what to do to resolve collisions before you can successfully generate signatures. The most important thing to do is delete the four lines that begin with semicolons, or sigmake will fail to parse the exclusions file during subsequent execution. The next step is to inform sigmake of your desire for collision resolution. A few lines extracted from libc_FreeBSD80.exc appear here:

_index   00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_strchr  00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_rindex  00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_strrchr 00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_flsl    01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0
_fls     01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0

These lines detail three separate collisions. In this case, we are being told that the function index is indistinguishable from strchr, rindex has the same signature as strrchr, and flsl collides with fls. If you are familiar with any of these functions, this result may not surprise you, as the colliding functions are essentially identical (for example, index and strchr perform the same action).

In order to leave you in control of your own destiny, sigmake expects you to designate no more than one function in each group as the proper function for the associated signature. You select a function by prefixing the name with a plus character (+) if you want the name applied anytime the corresponding signature is matched in a database or a minus character (-) if you simply want a comment added to the database whenever the corresponding signature is matched. If you do not want any names applied when the corresponding signature is matched in a database, then you do not add any characters. The following listing represents one possible way to provide a valid resolution for the three collisions noted previously:

+_index   00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_strchr  00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_rindex  00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_strrchr 00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_flsl    01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0
-_fls     01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0

In this case we elect to use the name index whenever the first signature is matched, do nothing at all when the second signature is matched, and have a comment about fls added when the third signature is matched. The following points are useful when attempting to resolve collisions:

  1. To perform minimal collision resolution, simply delete the four commented lines at the beginning of the exclusions file.

  2. Never add a +/- to more than one function in a collision group.

  3. If a collision group contains only a single function, do not add a +/- in front of that function; simply leave it alone.

  4. Subsequent failures of sigmake cause data, including comment lines, to be appended to any existing exclusions file. This extra data should be removed and the original data corrected (if the data was correct, sigmake would not have failed a second time) before rerunning sigmake.
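Before rerunning sigmake, it can help to check an edited exclusions file against the points above. The following Python sketch is one hypothetical way to do so; check_exclusions is not a FLAIR tool. It assumes each entry has the four whitespace-separated fields seen in the excerpt, and it groups entries by their last three fields rather than relying on any particular separator between collision groups.

```python
from collections import defaultdict

def check_exclusions(text):
    """Return (groups, problems) for the text of an edited exclusions file."""
    groups = defaultdict(list)
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Leftover header or appended comment lines must be deleted.
            problems.append(f"line {number}: comment line still present")
            continue
        parts = line.split()
        if len(parts) < 4:
            problems.append(f"line {number}: unrecognized entry")
            continue
        name, key = parts[0], tuple(parts[1:4])
        marker = name[0] if name[0] in "+-" else ""
        groups[key].append((marker, name.lstrip("+-")))
    for members in groups.values():
        marked = [name for marker, name in members if marker]
        if len(marked) > 1:
            problems.append("more than one selection in group: " + ", ".join(marked))
        if len(members) == 1 and marked:
            problems.append("single-function group should not be marked: " + marked[0])
    return dict(groups), problems

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for problem in check_exclusions(handle.read())[1]:
            print(problem)
```

An empty problem list does not guarantee that sigmake will succeed, but it catches the most common editing mistakes before you invoke it again.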

Once you have made appropriate changes to your exclusions file, you must save the file and rerun sigmake using the same command-line arguments that you used initially. The second time through, sigmake should locate, and abide by, your exclusions file, resulting in the successful generation of a .sig file. Successful operation of sigmake is noted by the lack of error messages and the presence of a .sig file, as shown here:

$ ./sigmake libc_FreeBSD80.pat libc_FreeBSD80.sig

After a signature file has been successfully generated, you make it available to IDA by copying it to your <IDADIR>/sig directory. Then your new signatures are available using File ▸ Load File ▸ FLIRT Signature File.

Note that we have purposefully glossed over all of the options that can be supplied to both the pattern generators and sigmake. A rundown of available options is provided in plb.txt and sigmake.txt. The only option we will make note of is the -n option used with sigmake. This option allows you to embed a descriptive name inside a generated signature file. This name is displayed during the signature-selection process (see Figure 12-1), and it can be very helpful when sorting through the list of available signatures. The following command line embeds the name string “FreeBSD 8.0 C standard library” within the generated signature file:

$ ./sigmake -n"FreeBSD 8.0 C standard library" libc_FreeBSD80.pat libc_FreeBSD80.sig

As an alternative, library names can be specified using directives within exclusions files. However, since exclusions files may not be required in all signature-generation cases, the command-line option is generally more useful. For further details, please refer to sigmake.txt.
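If you build signatures regularly, the parse-then-sigmake sequence is easy to wrap in a small driver. The Python sketch below is only an illustration under several assumptions: the function names are hypothetical, the tools are assumed to live in the current directory as ./pelf and ./sigmake, and success is judged the way the text describes, by the presence of a .sig file, rather than by exit status.

```python
import subprocess
from pathlib import Path

def flair_commands(library, stem, description,
                   parser="./pelf", sigmake="./sigmake"):
    """Build the two command lines: pattern generation, then signature generation."""
    pat, sig = f"{stem}.pat", f"{stem}.sig"
    return [
        [parser, library, pat],
        # -n embeds a descriptive library name in the signature file.
        [sigmake, f"-n{description}", pat, sig],
    ]

def build_signature(library, stem, description):
    """Run both tools and report which output file appeared."""
    for command in flair_commands(library, stem, description):
        subprocess.run(command, check=False)
    sig, exc = Path(f"{stem}.sig"), Path(f"{stem}.exc")
    if sig.exists():
        return f"signature ready: copy {sig} to <IDADIR>/sig"
    if exc.exists():
        return f"collisions: edit {exc} and rerun sigmake"
    return "no output produced; check the tool messages"

if __name__ == "__main__":
    print(build_signature("libc.a", "libc_FreeBSD80",
                          "FreeBSD 8.0 C standard library"))
```

Because the arguments are passed as a list, no shell quoting is needed around the description. Remove any stale .sig file from an earlier run before trusting the result, since the sketch cannot tell old output from new.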

Startup Signatures

IDA also recognizes a specialized form of signatures called startup signatures. Startup signatures are applied when a binary is first loaded into a database in an attempt to identify the compiler that was used to create the binary. If IDA can identify the compiler used to build a binary, then additional signature files, associated with the identified compiler, are automatically loaded during the initial analysis of the binary.

Given that the compiler type is initially unknown when a file is first loaded, startup signatures are grouped by and selected according to the file type of the binary being loaded. For example, if a Windows PE binary is being loaded, then startup signatures specific to PE binaries are loaded in an effort to determine the compiler used to build the PE binary in question.

In order to generate startup signatures, sigmake processes patterns that describe the startup routine[86] generated by various compilers and groups the resulting signatures into a single type-specific signature file. The startup directory in the FLAIR distribution contains the startup patterns used by IDA, along with the script, startup.bat, used to create the corresponding startup signatures from those patterns. Refer to startup.bat for examples of using sigmake to create startup signatures for a specific file format.

In the case of PE files, you will notice several pe_*.pat files in the startup directory that describe startup patterns used by several popular Windows compilers, including pe_vc.pat for Visual Studio patterns and pe_gcc.pat for Cygwin/gcc patterns. If you wish to add startup patterns for PE files, you need to add them to one of the existing PE pattern files or create a new pattern file with a pe_ prefix so that the startup signature-generation script can find your patterns and incorporate them into the newly generated PE signatures.

One last note about startup patterns concerns their format, which unfortunately is slightly different from patterns generated for library functions. The difference lies in the fact that a startup pattern line is capable of relating the pattern to additional sets of signatures that should also be applied if a match against the pattern is made. Other than the example startup patterns included in the startup directory, the format of a startup pattern is not documented in any of the text files included with FLAIR.



[80] The current version is flair61.zip and is available here: http://www.hex-rays.com/idapro/ida/flair61.zip. A username and password supplied by Hex-Rays are required to access the download.

[81] The plb and pcf parsers may skip some functions depending on the command-line options supplied to the parsers and the structure of the library being parsed.

[82] At two characters per byte, 64 hexadecimal characters are required to display the contents of 32 bytes.

[83] This is a 16-bit cyclic redundancy check value. The CRC16 implementation utilized for pattern generation is included with the FLAIR tool distribution in the file crc16.cpp.

[86] The startup routine is generally designated as the program’s entry point. In a C/C++ program, the purpose of the startup routine is to initialize the program’s environment prior to passing control to the main function.