Creating FLIRT Signature Files

As we discussed previously, it is simply impractical for IDA to ship with signature files for every static library in existence. In order to provide IDA users with the tools and information necessary to create their own signatures, Hex-Rays distributes the Fast Library Acquisition for Identification and Recognition (FLAIR) tool set. The FLAIR tools are made available on your IDA distribution CD or via download from the Hex-Rays website^[80] for authorized customers. Like several other IDA add-ons, the FLAIR tools are distributed in a Zip file. Hex-Rays does not necessarily release a new version of the FLAIR tools with each version of IDA, so you should use the most recent version of FLAIR that does not exceed your version of IDA.

Installation of the FLAIR utilities is a simple matter of extracting the contents of the associated Zip file, though we highly recommend that you create a dedicated flair directory as the destination because the Zip file is not organized with a top-level directory. Inside the FLAIR distribution you will find several text files that constitute the documentation for the FLAIR tools. Files of particular interest include these:

readme.txt: This is a top-level overview of the signature-creation process.
plb.txt: This file describes the use of the static library parser, plb.exe. Library parsers are discussed in more detail in Creating Pattern Files.
pat.txt: This file details the format of pattern files, which represent the first step in the signature-creation process. Pattern files are also described in Creating Pattern Files.
sigmake.txt: This file describes the use of sigmake.exe for generating .sig files from pattern files. Please refer to Creating Signature Files for more details.

Additional top-level content of interest includes the bin directory, which contains all of the FLAIR tools' executable files, and the startup directory, which contains pattern files for common startup sequences associated with various compilers and their associated output file types (PE, ELF, and so on). Prior to version 6.1, the FLAIR tools were available for Windows only; however, the resulting signature files may be used with all IDA variants (Windows, Linux, and OS X).

Signature-Creation Overview

The basic process for creating signature files does not seem complicated, as it boils down to four simple-sounding steps.

Obtain a copy of the static library for which you wish to create a signature file.
Utilize one of the FLAIR parsers to create a pattern file for the library.
Run sigmake.exe to process the resulting pattern file and generate a signature file.
Install the new signature file in IDA by copying it to <IDADIR>/sig.

Unfortunately, in practice, only the last step is as easy as it sounds. In the following sections, we discuss the first three steps in more detail.

Identifying and Acquiring Static Libraries

The first step in the signature-generation process is to locate a copy of the static library for which you wish to generate signatures. This can pose a bit of a challenge for a variety of reasons. The first obstacle is to determine which library you actually need. If the binary you are analyzing has not been stripped, you might be lucky enough to have actual function names available in your disassembly, in which case an Internet search will probably provide several pointers to likely candidates.

Stripped binaries are not quite as forthcoming regarding their origins. Lacking function names, you may find that a strings search yields strings unique enough to identify the library, such as the following, which is a dead giveaway:

OpenSSL 1.0.0b-fips 16 Nov 2010

Copyright notices and error strings are often sufficiently unique that once again you can use an Internet search to narrow your candidates. If you choose to run strings from the command line, remember to use the -a option to force strings to scan the entire binary; otherwise you may miss some potentially useful string data.
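As a rough illustration of this kind of search, the following Python sketch scans an entire file image for runs of printable ASCII, much as strings -a does, and keeps only those that contain a keyword of interest. The function names and the default keyword list are our own choices for the example, and the definition of "printable" is simplified relative to the real strings utility.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_banners(data: bytes, keywords=("OpenSSL", "Copyright", "GCC:")) -> list:
    """Keep only the strings that mention one of the (hypothetical) keywords."""
    lowered = [k.lower() for k in keywords]
    return [s for s in ascii_strings(data) if any(k in s.lower() for k in lowered)]


if __name__ == "__main__":
    import sys

    # Read the whole file so that no section of the binary is skipped.
    with open(sys.argv[1], "rb") as handle:
        for banner in find_banners(handle.read()):
            print(banner)
```

Saved under a name of your choosing, the script takes the path of the binary as its only argument and prints each candidate string on its own line.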

In the case of open source libraries, you are likely to find source code readily available. Unfortunately, while the source code may be useful in helping you understand the behavior of the binary, you cannot use it to generate your signatures. It might be possible to use the source to build your own version of the static library and then use that version in the signature-generation process. However, in all likelihood, variations in the build process will result in enough differences between the resulting library and the library you are analyzing that any signatures you generate will not be terribly accurate.

The best approach is to determine the exact origin of the binary in question. By this we mean the exact operating system, operating system version, and distribution (if applicable). Given this information, the most reliable way to create signatures is to copy the libraries in question from an identically configured system. Naturally, this leads to the next challenge: Given an arbitrary binary, on what system was it created? A good first step is to use the file utility to obtain some preliminary information about the binary in question. In Chapter 2 we saw some sample output from file. In several cases, this output was sufficient to provide likely candidate systems. The following is just one example of very specific output from file:

$ file sample_file_1
sample_file_1: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD),
statically linked, for FreeBSD 8.0 (800107), stripped

In this case we might head straight to a FreeBSD 8.0 system and track down libc.a for starters. The following example is somewhat more ambiguous, however:

$ file sample_file_2
sample_file_2: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux),
statically linked, for GNU/Linux 2.6.32, stripped

We appear to have narrowed the source of the file to a Linux system, which, given the abundance of available Linux distributions, is not saying much. Turning to strings we find the following:

GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)

Here the search has been narrowed to Red Hat distributions (or derivatives) that shipped with gcc version 4.5.1. GCC tags such as this are not uncommon in binaries compiled using gcc, and fortunately for us, they survive the stripping process and remain visible to strings.
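If you collect such tags from many binaries, it can help to split them into their parts. The sketch below is one way to do so in Python; the regular expression is our own guess at the common layout of these tags, based on the example above, and it will not cover every vendor's format.

```python
import re

# Assumed layout: "GCC: (<vendor>) <version> [<yyyymmdd>] [(<package>)]"
GCC_TAG = re.compile(
    r"GCC: \(([^)]*)\) (\d+(?:\.\d+)+)(?: (\d{8}))?(?: \(([^)]*)\))?"
)


def parse_gcc_tag(text: str):
    """Return the fields of the first GCC tag found in text, or None."""
    match = GCC_TAG.search(text)
    if match is None:
        return None
    vendor, version, date, package = match.groups()
    return {"vendor": vendor, "version": version, "date": date, "package": package}


if __name__ == "__main__":
    print(parse_gcc_tag("GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)"))
```

For the tag shown earlier, the package field carries the distribution hint (Red Hat 4.5.1-4), while the version field gives the compiler release to look for.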

Keep in mind that the file utility is not the be-all and end-all in file identification. The following output demonstrates a simple case in which file seems to know the type of the file being examined but for which the output is rather nonspecific.

$ file sample_file_3
sample_file_3: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), stripped

This example was taken from a Solaris 10 x86 system. Here again, the strings utility might be useful in pinpointing this fact.
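Part of what file reports for an ELF binary can be read directly from the first few bytes of the file. The following Python sketch decodes the class, byte order, and OS/ABI bytes of the ELF identification array as defined by the ELF specification. We assume, without having checked the file source, that the parenthesized label in the output above corresponds to the OS/ABI byte; the function name and the abbreviated name table are ours.

```python
# A few OS/ABI values from the ELF specification (the table is not exhaustive).
OSABI_NAMES = {0: "SYSV", 3: "GNU/Linux", 6: "Solaris", 9: "FreeBSD"}


def describe_elf_ident(header: bytes) -> dict:
    """Decode selected e_ident bytes from the start of an ELF file."""
    if len(header) < 8 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    byte_order = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    osabi = OSABI_NAMES.get(header[7], "other (%d)" % header[7])
    return {"class": elf_class, "data": byte_order, "osabi": osabi}


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(describe_elf_ident(handle.read(16)))
```

An OS/ABI value of 0 is the generic default and says little about the originating system, which is consistent with the nonspecific output for sample_file_3.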

Creating Pattern Files

At this point you should have one or more libraries for which you wish to create signatures. The next step is to create a pattern file for each library. Pattern files are created using an appropriate FLAIR parser utility. Like executable files, library files are built to various file format specifications. FLAIR provides parsers for several popular library file formats. As detailed in FLAIR’s readme.txt file, the following parsers can be found in FLAIR’s bin directory:

plb.exe/plb: Parser for OMF libraries (commonly used by Borland compilers)
pcf.exe/pcf: Parser for COFF libraries (commonly used by Microsoft compilers)
pelf.exe/pelf: Parser for ELF libraries (found on many Unix systems)
ppsx.exe/ppsx: Parser for Sony PlayStation PSX libraries
ptmobj.exe/ptmobj: Parser for TriMedia libraries
pomf166.exe/pomf166: Parser for Keil OMF 166 object files

To create a pattern file for a given library, specify the parser that corresponds to the library’s format, the name of the library you wish to parse, and the name of the resulting pattern file that should be generated. For a copy of libc.a from a FreeBSD 8.0 system, you might use the following:

$ ./pelf libc.a libc_FreeBSD80.pat
libc.a: skipped 1, total 1089

Here, the parser reports the file that was parsed (libc.a), the number of functions that were skipped (1),^[81] and the number of signature patterns that were generated (1089). Each parser accepts a slightly different set of command-line options documented only through the parser’s usage statement. Executing a parser with no arguments displays the list of command-line options accepted by that parser. The plb.txt file contains more detailed information on the options accepted by the plb parser. This file is a good basic source of information, since other parsers accept many of the options it describes as well. In many cases, simply naming the library to be parsed and the pattern file to be generated is sufficient.

A pattern file is a text file that contains, one per line, the extracted patterns that represent functions within a parsed library. A few lines from the pattern file created previously are shown here:

57568B7C240C8B742410FC8B4C2414C1E902F3A775108B4C241483E103F3A675 1E A55D 003E :0000 _memcmp
0FBC442404740340C39031C0C3...................................... 00 0000 000D :0000 _ffs
57538B7C240C8B4C2410FC31C083F90F7E1B89FAF7DA83E20389CB29D389D1F3 12 9E31 0032 :0000 _bzero

The format of an individual pattern is described in FLAIR’s pat.txt file. In a nutshell, the first portion of a pattern lists the initial byte sequence of the function to a maximum of 32 bytes. Allowance is made for bytes that may vary as a result of relocation entries. Such bytes are displayed using two dots. Dots are also used to fill the pattern out to 64 characters^[82] when a function is shorter than 32 bytes (as _ffs is in the previous code). Beyond the initial 32 bytes, additional information is recorded to provide more precision in the signature-matching process. Additional information encoded into each pattern line includes a CRC16^[83] value computed over a portion of the function, the length of the function in bytes, and a list of symbol names referenced by the function. In general, longer functions that reference many other symbols yield more complex pattern lines. In the file libc_FreeBSD80.pat generated previously, some pattern lines exceed 20,000 characters in length.
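To make the field layout concrete, here is a minimal Python sketch that splits a pattern line into its parts and tests a byte sequence against the leading bytes, treating dotted positions as wildcards. It handles only the layout visible in the sample lines above. The interpretation of the second field as the number of bytes covered by the CRC16 is our reading of pat.txt and should be treated as an assumption, and the class and function names are ours. Referenced symbol names and any trailing data are ignored.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    lead: str           # hex digits of the leading bytes; ".." marks a variable byte
    crc_len: int        # second field (assumed: number of bytes covered by the CRC16)
    crc16: int          # third field: the CRC16 value
    total_len: int      # fourth field: length of the function in bytes
    public_names: list  # (offset, name) pairs taken from ":oooo name" fields


def parse_pattern_line(line: str) -> Pattern:
    fields = line.split()
    lead, crc_len, crc16, total_len = fields[:4]
    rest = fields[4:]
    names = []
    # Collect ":offset name" pairs and stop at the first field of any other kind.
    while len(rest) >= 2 and rest[0].startswith(":"):
        names.append((int(rest[0][1:], 16), rest[1]))
        rest = rest[2:]
    return Pattern(lead, int(crc_len, 16), int(crc16, 16), int(total_len, 16), names)


def lead_matches(lead: str, data: bytes) -> bool:
    """Compare data against the leading bytes, skipping dotted (variable) positions."""
    for i in range(0, len(lead), 2):
        pair = lead[i:i + 2]
        if "." in pair:
            continue
        index = i // 2
        if index >= len(data) or data[index] != int(pair, 16):
            return False
    return True
```

Applied to the _ffs line, the parsed total length is 0x0D (13), which agrees with the 13 bytes that appear before the dots begin.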

Several third-party programmers have created utilities designed to generate patterns from existing IDA databases. One such utility is IDB_2_PAT,^[84] an IDA plug-in written by J.C. Roberts that is capable of generating patterns for one or more functions in an existing database. Utilities such as these are useful if you expect to encounter similar code in additional databases and have no access to the original library files used to create the binary being analyzed.

Creating Signature Files

Once you have created a pattern file for a given library, the next step in the signature-creation process is to generate a .sig file suitable for use with IDA. The format of an IDA signature file is substantially different from that of a pattern file. Signature files utilize a proprietary binary format designed both to minimize the amount of space required to represent all of the information present in a pattern file and to allow for efficient matching of signatures against actual database content. A high-level description of the structure of a signature file is available on the Hex-Rays website.^[85]

FLAIR’s sigmake utility is used to create signature files from pattern files. By splitting pattern generation and signature generation into two distinct phases, the signature-generation process is completely independent of the pattern-generation process, which allows for the use of third-party pattern generators. In its simplest form, signature generation takes place by using sigmake to parse a .pat file and create a .sig file, as shown here:

$ ./sigmake libssl.pat libssl.sig

If all goes well, a .sig file is generated and ready to install into <IDADIR>/sig. However, the process seldom runs that smoothly.

Note

The sigmake documentation file, sigmake.txt, recommends that signature filenames follow the MS-DOS 8.3 name-length convention. This is not a hard-and-fast requirement, however. When longer filenames are used, only the first eight characters of the base filename are displayed in the signature-selection dialog.

Signature generation is often an iterative process, because this is the phase in which collisions must be handled. A collision occurs anytime two functions have identical patterns. If collisions are not resolved in some manner, it is not possible to determine which function is actually being matched during the signature-application process. Therefore, sigmake must be able to resolve each generated signature to exactly one function name. When this is not possible because two or more functions share an identical pattern, sigmake refuses to generate a .sig file and instead generates an exclusions file (.exc). A more typical first pass using sigmake and a new .pat file (or set of .pat files) might yield the following:

$ ./sigmake libc_FreeBSD80.pat libc_FreeBSD80.sig
libc_FreeBSD80.sig: modules/leaves: 1088/1024, COLLISIONS: 10
See the documentation to learn how to resolve collisions.

The documentation being referred to is sigmake.txt, which describes the use of sigmake and the collision-resolution process. In reality, each time sigmake is executed, it searches for a corresponding exclusions file that might contain information on how to resolve any collisions that sigmake may encounter while processing the named pattern file. In the absence of such an exclusions file, and when collisions occur, sigmake generates such an exclusions file rather than a signature file. In the previous example, we would find a newly created file named libc_FreeBSD80.exc. When first created, exclusions files are text files that detail the conflicts that sigmake encountered while processing the pattern file. The exclusions file must be edited to provide sigmake with guidance as to how it should resolve the conflicting patterns. The general process for editing an exclusions file follows.

When generated by sigmake, all exclusions files begin with the following lines:

;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

The intent of these lines is to remind you what to do to resolve collisions before you can successfully generate signatures. The most important thing to do is delete the four lines that begin with semicolons, or sigmake will fail to parse the exclusions file during subsequent execution. The next step is to tell sigmake how you want each collision resolved. A few lines extracted from libc_FreeBSD80.exc appear here:

_index   00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_strchr  00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_rindex  00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_strrchr 00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_flsl    01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0
_fls     01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0

These lines detail three separate collisions. In this case, we are being told that the function index is indistinguishable from strchr, rindex has the same signature as strrchr, and flsl collides with fls. If you are familiar with any of these functions, this result may not surprise you, as the colliding functions are essentially identical (for example, index and strchr perform the same action).
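With a large exclusions file it can be easier to see the collision groups programmatically. The following Python sketch groups the lines of an exclusions file by the three fields that follow each name. How sigmake itself decides that two functions collide is not described here, so grouping on these displayed fields is an assumption made for illustration, and the function name is ours.

```python
from collections import defaultdict


def group_collisions(lines) -> list:
    """Group exclusions-file lines of the form 'name len crc pattern' by pattern."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and the comment header
        name, crc_len, crc16, lead = line.split()
        # Ignore any +/- marker that has already been added to the name.
        groups[(lead, crc_len, crc16)].append(name.lstrip("+-"))
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for names in group_collisions(handle):
            print(" <-> ".join(names))
```

Run against the six lines above, the sketch reports the same three pairs: index with strchr, rindex with strrchr, and flsl with fls.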

In order to leave you in control of your own destiny, sigmake expects you to designate no more than one function in each group as the proper function for the associated signature. You select a function by prefixing the name with a plus character (+) if you want the name applied anytime the corresponding signature is matched in a database or a minus character (-) if you simply want a comment added to the database whenever the corresponding signature is matched. If you do not want any names applied when the corresponding signature is matched in a database, then you do not add any characters. The following listing represents one possible way to provide a valid resolution for the three collisions noted previously:

+_index   00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_strchr  00 0000 538B4424088A4C240C908A1838D974074084DB75F531C05BC3..............
_rindex  00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_strrchr 00 0000 538B5424088A4C240C31C0908A1A38D9750289D04284DB75F35BC3..........
_flsl    01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0
-_fls     01 EF04 5531D289E58B450885C0741183F801B201740AD1E883C20183F80175F65D89D0

In this case we elect to use the name index whenever the first signature is matched, do nothing at all when the second signature is matched, and have a comment about fls added when the third signature is matched. The following points are useful when attempting to resolve collisions:

To perform minimal collision resolution, simply delete the four commented lines at the beginning of the exclusions file.
Never add a +/- to more than one function in a collision group.
If a collision group contains only a single function, do not add a +/- in front of that function; simply leave it alone.
Subsequent failures of sigmake cause data, including comment lines, to be appended to any existing exclusions file. This extra data should be removed and the original data corrected (if the data was correct, sigmake would not have failed a second time) before rerunning sigmake.
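The editing steps above are mechanical enough to script. The following Python sketch removes the semicolon-prefixed lines and adds a + or - marker to the names you choose. It is only an illustration under our own naming (resolve_exclusions, keep, comment); it does not check that at most one function per collision group is marked, so that rule remains your responsibility.

```python
def resolve_exclusions(text: str, keep=frozenset(), comment=frozenset()) -> str:
    """Return exclusions-file text with comment lines removed and markers added.

    keep:    names to prefix with '+' (apply the name on a match)
    comment: names to prefix with '-' (add only a comment on a match)
    """
    out = []
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # sigmake will not read the file while these lines remain
        fields = line.split()
        name = fields[0] if fields else ""
        if name in keep:
            line = "+" + line
        elif name in comment:
            line = "-" + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    import sys

    path = sys.argv[1]
    with open(path) as handle:
        original = handle.read()
    # Example choices only; substitute the names appropriate to your library.
    with open(path, "w") as handle:
        handle.write(resolve_exclusions(original, keep={"_index"}, comment={"_fls"}))
```

Calling the function with empty keep and comment sets performs the minimal resolution described in the first point: only the header lines are removed.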

Once you have made appropriate changes to your exclusions file, you must save the file and rerun sigmake using the same command-line arguments that you used initially. The second time through, sigmake should locate, and abide by, your exclusions file, resulting in the successful generation of a .sig file. Successful operation of sigmake is noted by the lack of error messages and the presence of a .sig file, as shown here:

$ ./sigmake libc_FreeBSD80.pat libc_FreeBSD80.sig

After a signature file has been successfully generated, you make it available to IDA by copying it to your <IDADIR>/sig directory. Then your new signatures are available using File ▸ Load File ▸ FLIRT Signature File.

Note that we have purposefully glossed over all of the options that can be supplied to both the pattern generators and sigmake. A rundown of available options is provided in plb.txt and sigmake.txt. The only option we will make note of is the -n option used with sigmake. This option allows you to embed a descriptive name inside a generated signature file. This name is displayed during the signature-selection process (see Figure 12-1), and it can be very helpful when sorting through the list of available signatures. The following command line embeds the name string “FreeBSD 8.0 C standard library” within the generated signature file:

$ ./sigmake -n"FreeBSD 8.0 C standard library" libc_FreeBSD80.pat libc_FreeBSD80.sig

As an alternative, library names can be specified using directives within exclusion files. However, since exclusion files may not be required in all signature-generation cases, the command-line option is generally more useful. For further details, please refer to sigmake.txt.
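For readers who repeat this process for many libraries, the command lines used in this section can be assembled by a small helper. The Python sketch below builds the parser and sigmake argument lists and copies a finished signature file into the sig directory. It uses only the invocations shown in this section; the function names, and the assumption that the tools sit in the current directory, are ours.

```python
import shutil
from pathlib import Path


def flair_commands(parser: str, library: str, stem: str, title: str = "") -> list:
    """Build argument lists for the parser run and the sigmake run."""
    pat, sig = stem + ".pat", stem + ".sig"
    sigmake = ["./sigmake"]
    if title:
        # Equivalent to -n"title" typed at a shell prompt; the shell strips the quotes.
        sigmake.append("-n" + title)
    sigmake += [pat, sig]
    return [["./" + parser, library, pat], sigmake]


def install_signature(sig_file: str, ida_dir: str) -> Path:
    """Copy a generated .sig file into <IDADIR>/sig and return its new path."""
    dest = Path(ida_dir) / "sig"
    if not dest.is_dir():
        raise FileNotFoundError("no sig directory under " + str(ida_dir))
    return Path(shutil.copy2(sig_file, dest))


if __name__ == "__main__":
    for argv in flair_commands("pelf", "libc.a", "libc_FreeBSD80",
                               "FreeBSD 8.0 C standard library"):
        print(argv)
```

Each argument list can be handed to subprocess.run. Because sigmake may produce an exclusions file instead of a signature file, it is prudent to confirm that the .sig file exists before calling install_signature.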

Startup Signatures

IDA also recognizes a specialized form of signatures called startup signatures. Startup signatures are applied when a binary is first loaded into a database in an attempt to identify the compiler that was used to create the binary. If IDA can identify the compiler used to build a binary, then additional signature files, associated with the identified compiler, are automatically loaded during the initial analysis of the binary.

Given that the compiler type is initially unknown when a file is first loaded, startup signatures are grouped by and selected according to the file type of the binary being loaded. For example, if a Windows PE binary is being loaded, then startup signatures specific to PE binaries are loaded in an effort to determine the compiler used to build the PE binary in question.

In order to generate startup signatures, sigmake processes patterns that describe the startup routine^[86] generated by various compilers and groups the resulting signatures into a single type-specific signature file. The startup directory in the FLAIR distribution contains the startup patterns used by IDA, along with the script, startup.bat, used to create the corresponding startup signatures from those patterns. Refer to startup.bat for examples of using sigmake to create startup signatures for a specific file format.

In the case of PE files, you will notice several pe_*.pat files in the startup directory that describe startup patterns used by several popular Windows compilers, including pe_vc.pat for Visual Studio patterns and pe_gcc.pat for Cygwin/gcc patterns. If you wish to add additional startup patterns for PE files, you need to add them to one of the existing PE pattern files or create a new pattern file with a pe_ prefix in order for the startup signature-generation script to properly find your patterns and incorporate them into the newly generated PE signatures.

One last note about startup patterns concerns their format, which unfortunately is slightly different from patterns generated for library functions. The difference lies in the fact that a startup pattern line is capable of relating the pattern to additional sets of signatures that should also be applied if a match against the pattern is made. Other than the example startup patterns included in the startup directory, the format of a startup pattern is not documented in any of the text files included with FLAIR.

^[80]The current version is flair61.zip and is available here: http://www.hex-rays.com/idapro/ida/flair61.zip. A username and password supplied by Hex-Rays are required to access the download.

^[81]The plb and pcf parsers may skip some functions depending on the command-line options supplied to the parsers and the structure of the library being parsed.

^[82]At two characters per byte, 64 hexadecimal characters are required to display the contents of 32 bytes.

^[83]This is a 16-bit cyclic redundancy check value. The CRC16 implementation utilized for pattern generation is included with the FLAIR tool distribution in the file crc16.cpp.

^[84]See http://www.openrce.org/downloads/details/26/IDB_2_PAT.

^[85]See http://www.hex-rays.com/idapro/flirt.htm.

^[86]The startup routine is generally designated as the program’s entry point. In a C/C++ program, the purpose of the startup routine is to initialize the program’s environment prior to passing control to the main function.
