Shellcode can be found in a variety of sources, including network traffic, web pages, media files, and malware. Because it is not always possible to create an environment with the correct version of the vulnerable program that the exploit targets, the malware analyst must try to reverse-engineer shellcode using only static analysis.
Malicious web pages typically use JavaScript to profile a user’s system and check for vulnerable versions of the browser and installed plug-ins. Shellcode is often stored as an encoded text string included with the script that triggers the exploit, and the JavaScript unescape function is typically used to convert that encoded text into a binary package suitable for execution.
The encoding understood by unescape treats the text %uXXYY as an encoded big-endian Unicode character, where XX and YY are hex values. On little-endian machines (such as x86), the byte sequence YY XX will be the result after decoding. For example, consider this text string:
%u1122%u3344%u5566%u7788%u99aa%ubbcc%uddee
It will be decoded to the following binary byte sequence:
22 11 44 33 66 55 88 77 aa 99 cc bb ee dd
A % symbol that is not immediately followed by the letter u is treated as a single encoded hex byte. For example, the text string %41%42%43%44 will be decoded to the binary byte sequence 41 42 43 44.
Both single- and double-byte encoded characters can be used within the same text string. This is a popular technique wherever JavaScript is used, including in PDF documents.
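The decoding rules above can be sketched in a few lines of Python for use during static analysis. This is a minimal illustration, not a reimplementation of a browser's unescape: the function name `decode_unescape` is hypothetical, and passing unescaped literal characters through as single Latin-1 bytes is an assumption made for simplicity.

```python
import re

# %uXXYY (four hex digits) or %XX (two hex digits).
_TOKEN = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def decode_unescape(text: str) -> bytes:
    """Decode %uXXYY and %XX tokens into the bytes seen on a little-endian host."""
    out = bytearray()
    pos = 0
    for match in _TOKEN.finditer(text):
        # Assumption: text between tokens is plain Latin-1 and is kept as-is.
        out += text[pos:match.start()].encode("latin-1")
        if match.group(1) is not None:
            # %uXXYY: a 16-bit value stored low byte first, giving YY XX.
            out += int(match.group(1), 16).to_bytes(2, "little")
        else:
            # %XX: a single byte.
            out.append(int(match.group(2), 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


if __name__ == "__main__":
    sample = "%u1122%u3344%u5566%u7788%u99aa%ubbcc%uddee"
    print(decode_unescape(sample).hex())
    print(decode_unescape("%41%42%43%44").hex())
```

The resulting bytes can be written to a file and loaded into a disassembler as a raw binary.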
Shellcode used within a malicious executable is usually easy to identify because either the entire program will be written using shellcode techniques as obfuscation, or a shellcode payload will be stored within the malware and injected into another process.
The shellcode payload is usually found by looking for the typical process-injection API calls discussed in Chapter 12: VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread. The buffer written into the other process probably contains shellcode if the malware launches a remote thread without applying relocation fix-ups or resolving external dependencies. This may be convenient for the malware writer, since shellcode can bootstrap itself and execute without help from the originating malware.
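As a first triage step, a sample can be checked for the names of these three API functions. The sketch below is a crude heuristic under stated assumptions: it only looks for the names as plain ASCII strings in the file, so it will miss malware that resolves imports dynamically or obfuscates its strings, and the helper names are hypothetical.

```python
# API names associated with the process-injection pattern described above.
INJECTION_APIS = (b"VirtualAllocEx", b"WriteProcessMemory", b"CreateRemoteThread")


def injection_apis_present(image: bytes) -> list[str]:
    """Return the injection-related API names found as ASCII strings in the file."""
    return [name.decode("ascii") for name in INJECTION_APIS if name in image]


def looks_like_injector(image: bytes) -> bool:
    """True only when all three API names are present."""
    return len(injection_apis_present(image)) == len(INJECTION_APIS)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print(injection_apis_present(data))
```

A positive result only tells you where to start looking; the buffer passed to WriteProcessMemory still has to be located and examined in a disassembler.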
Sometimes shellcode is stored unencoded within a media file. Disassemblers such as IDA Pro can load arbitrary binary files, including those suspected of containing shellcode. However, even if IDA Pro loads the file, it may not analyze the shellcode, because it does not know which bytes are valid code.
Finding shellcode usually means searching for the initial decoder that is likely present at the start of the shellcode. Useful opcodes to search for are listed in Table 19-2.
Table 19-2. Some Opcode Bytes to Search For
| Instruction type | Common opcodes |
|---|---|
| Call | 0xe8 |
| Unconditional jumps | 0xeb, 0xe9 |
| Loops | 0xe0, 0xe1, 0xe2 |
| Short conditional jumps | 0x70 through 0x7f |
Attempt to disassemble each instance of the opcodes listed in Table 19-2 in the loaded file. Any valid code should be immediately obvious. Just remember that the payload is likely encoded, so only the decoder will be visible at first.
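The search itself is easy to script. The following sketch lists every offset in a file that holds one of the standard x86 single-byte opcodes for the instruction classes in Table 19-2 (0xe8 for a near call, 0xeb and 0xe9 for unconditional jumps, 0xe0 through 0xe2 for loops, and 0x70 through 0x7f for short conditional jumps). The function name and the labels are illustrative; the script only proposes offsets to disassemble and will produce many false positives, since these byte values also occur in ordinary data.

```python
# Standard x86 one-byte opcodes for the instruction classes of interest.
CANDIDATE_OPCODES = {
    0xE8: "call rel32",
    0xE9: "jmp rel32",
    0xEB: "jmp rel8",
    0xE0: "loopne rel8",
    0xE1: "loope rel8",
    0xE2: "loop rel8",
}
# Short conditional jumps occupy the whole 0x70-0x7f range.
CANDIDATE_OPCODES.update({opcode: "jcc rel8" for opcode in range(0x70, 0x80)})


def find_candidate_offsets(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, label) for every byte matching a candidate opcode."""
    return [
        (offset, CANDIDATE_OPCODES[value])
        for offset, value in enumerate(data)
        if value in CANDIDATE_OPCODES
    ]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for offset, label in find_candidate_offsets(handle.read()):
            print(f"{offset:#010x}  {label}")
```

Each reported offset is a place to try defining code in the disassembler; runs of sensible instructions following an offset suggest a real decoder stub.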
If none of those searches work, there may still be embedded shellcode, because some file formats allow for encoded embedded data. For example, exploits targeting CVE-2010-0188, a critical vulnerability in Adobe Reader, use malformed TIFF images embedded within PDFs and stored as Base64-encoded strings, which may be zlib-compressed. When working with a particular file format, you will need to be familiar with that format and the kind of data it can contain in order to search for malicious content.
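Once a suspicious blob has been extracted from its container, the encoding layers have to be peeled off before any opcode search is meaningful. The sketch below is one way to do that with Python's standard library. It assumes the blob has already been carved out of the file, and it assumes a particular layering (an optional zlib layer on the outside, an optional Base64 layer inside); real samples may nest these differently, so treat `peel_layers` as a hypothetical starting point.

```python
import base64
import binascii
import zlib


def peel_layers(blob: bytes) -> bytes:
    """Strip an optional zlib layer, then an optional Base64 layer."""
    try:
        blob = zlib.decompress(blob)
    except zlib.error:
        pass  # Not zlib-compressed; keep the bytes unchanged.
    try:
        # validate=True rejects input containing non-Base64 characters.
        blob = base64.b64decode(blob, validate=True)
    except (binascii.Error, ValueError):
        pass  # Not Base64 text; keep the bytes unchanged.
    return blob


if __name__ == "__main__":
    payload = bytes.fromhex("9090eb05")
    wrapped = zlib.compress(base64.b64encode(payload))
    print(peel_layers(wrapped).hex())
```

The recovered bytes can then be saved and examined like any other raw binary suspected of containing shellcode.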