Lab 13-3 Solutions

Short Answers

Dynamic analysis might reveal some random-looking content that may be encoded. There are no recognizable strings in the program output, so nothing else suggests encoding.
Searching for xor instructions reveals six separate functions that may be associated with encoding, but the type of encoding is not immediately clear.
All three techniques (FindCrypt2, the PEiD KANAL plug-in, and the IDA Entropy Plugin) identify the Advanced Encryption Standard (AES) algorithm (the Rijndael algorithm), which is associated with all six of the XOR functions identified. The IDA Entropy Plugin also identifies a custom Base64 indexing string, which shows no evidence of association with xor instructions.
The malware uses AES and a custom Base64 cipher.
The key for AES is ijklmnopqrstuvwx. The key for the custom Base64 cipher is the index string:
```
CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/
```
The index string is sufficient for the custom Base64 implementation. For AES, variables other than the key may be needed to implement decryption, including the key-generation algorithm if one is used, the key size, the mode of operation, and the initialization vector if one is needed.
The malware establishes a reverse command shell with the incoming commands decoded using the custom Base64 cipher and the outgoing command-shell responses encrypted with AES.
See the detailed analysis for an example of how to decrypt content.

Detailed Analysis

Starting with basic dynamic analysis, we see that the malware tries to resolve the domain name www.practicalmalwareanalysis.com and to connect to that host on TCP port 8910. When we use Netcat to send some content over the connection, the malware responds with random-looking content but no recognizable strings. If we then close the socket from the Netcat side, we see a message like this:

```
ERROR: API    = ReadConsole.
   error code = 0.
   message    = The operation completed successfully.
```

Examining the output of strings, we see evidence related to all of the strings we have seen so far: www.practicalmalwareanalysis.com, ERROR: API = %s., error code = %d., message = %s., and ReadConsole. There are other relevant strings, like WriteConsole and DuplicateHandle, which may be part of error messages like the preceding ReadConsole error.
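The strings output mentioned above comes from a tool such as Sysinternals Strings or GNU strings. As a rough illustration of what such a tool does, the following minimal Python sketch extracts runs of printable ASCII characters at least four bytes long. It is an assumption-laden simplification: it ignores wide (UTF-16) strings, which real tools also report, and the file name in the commented usage line is hypothetical.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    return [m.group().decode('ascii') for m in pattern.finditer(data)]


# Hypothetical usage against the sample on disk:
# with open('Lab13-03.exe', 'rb') as f:
#     for s in ascii_strings(f.read()):
#         print(s)

sample = b'\x00\x01ERROR: API    = %s.\x00\xff\x10ReadConsole\x00ab\x00'
print(ascii_strings(sample))  # ['ERROR: API    = %s.', 'ReadConsole']
```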

The random content seen during dynamic analysis suggests that encoding is being used, although we can’t tell what is encoded. Certain strings suggest that the malware performs encryption, including Data not multiple of Block Size, Empty key, Incorrect key length, and Incorrect block length.

Examining the xor instructions and eliminating those associated with register clearing and library functions, we find six functions that contain suspect xor instructions. Given the large number of identified functions, let’s just label them for now and see how they correspond with the additional techniques we will apply. Table C-6 summarizes how we rename these functions in IDA Pro.
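The register-clearing filter described above can be approximated in a few lines. The sketch below assumes disassembly text in an IDA-like `address mnemonic operands` layout (a hypothetical input format, not an IDA API) and keeps only xor instructions whose two operands differ, because `xor reg, reg` with the same register simply zeroes it. Excluding xor instructions inside library functions still depends on the disassembler's library-function identification and is not shown.

```python
def suspect_xors(lines):
    """Yield listing lines containing an xor whose two operands differ.

    Assumes lines such as '00401000    xor     ecx, [ebp+var_4]  ; comment'
    (address, mnemonic, comma-separated operands, optional ';' comment).
    """
    for line in lines:
        parts = line.split(None, 2)  # address, mnemonic, operand text
        if len(parts) < 3 or parts[1].lower() != 'xor':
            continue
        operand_text = parts[2].split(';')[0]  # drop any trailing comment
        operands = [op.strip().lower() for op in operand_text.split(',')]
        # 'xor eax, eax' is register clearing, not encoding
        if len(operands) == 2 and operands[0] != operands[1]:
            yield line


# Illustrative (made-up) listing lines
listing = [
    '00401000    xor     eax, eax',
    '00401002    xor     ecx, [ebp+var_4]    ; comment',
    '00401005    mov     edx, 1',
    '00401008    xor     edx, 5Ah',
]
for line in suspect_xors(listing):
    print(line)
```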

Table C-6. Functions Containing Suspect xor Instructions

Assigned Function Name	Address of Function
`s_xor1`	00401AC2
`s_xor2`	0040223A
`s_xor3`	004027ED
`s_xor4`	00402DA8
`s_xor5`	00403166
`s_xor6`	00403990

Using the FindCrypt2 plug-in for IDA Pro, we find the constants shown in Example C-105.

Example C-105. FindCrypt2 output

```
40CB08: found const array Rijndael_Te0 (used in Rijndael)
40CF08: found const array Rijndael_Te1 (used in Rijndael)
40D308: found const array Rijndael_Te2 (used in Rijndael)
40D708: found const array Rijndael_Te3 (used in Rijndael)
40DB08: found const array Rijndael_Td0 (used in Rijndael)
40DF08: found const array Rijndael_Td1 (used in Rijndael)
40E308: found const array Rijndael_Td2 (used in Rijndael)
40E708: found const array Rijndael_Td3 (used in Rijndael)
Found 8 known constant arrays in total.
```

Example C-105 refers to Rijndael, the original name of the AES cipher. After looking at the cross-references, it is clear that s_xor2 and s_xor4 are connected with the encryption constants (_TeX), and s_xor3 and s_xor5 are connected with the decryption constants (_TdX).

The PEiD KANAL plug-in reveals AES constants in a nearby location. Example C-106 shows the output of the PEiD tool. The S and S-inv labels in PEiD’s output refer to the AES S-box and inverse S-box, substitution tables that are a basic component of many block ciphers.

Example C-106. PEiD KANAL output

```
RIJNDAEL [S] [char] :: 0000C908 :: 0040C908
RIJNDAEL [S-inv] [char] :: 0000CA08 :: 0040CA08
```
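FindCrypt2 and KANAL both work by searching the binary for known constant tables. A minimal sketch of the same idea follows, assuming that the first 16 bytes of the AES forward S-box and inverse S-box are enough of a signature for illustration; the real plug-ins match full tables and many more algorithms. The labels simply echo the KANAL output above.

```python
# First 16 bytes of the AES (Rijndael) forward S-box and inverse S-box
AES_SBOX_PREFIX = bytes.fromhex('637c777bf26b6fc53001672bfed7ab76')
AES_INV_SBOX_PREFIX = bytes.fromhex('52096ad53036a538bf40a39e81f3d7fb')

SIGNATURES = {
    'RIJNDAEL [S]': AES_SBOX_PREFIX,
    'RIJNDAEL [S-inv]': AES_INV_SBOX_PREFIX,
}


def find_constants(data):
    """Return (label, file_offset) for every signature occurrence, sorted by offset."""
    hits = []
    for label, sig in SIGNATURES.items():
        start = data.find(sig)
        while start != -1:
            hits.append((label, start))
            start = data.find(sig, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


blob = b'\x00' * 8 + AES_SBOX_PREFIX + b'\xff' * 4 + AES_INV_SBOX_PREFIX
print(find_constants(blob))  # [('RIJNDAEL [S]', 8), ('RIJNDAEL [S-inv]', 28)]
```

The offsets returned are file offsets. In the KANAL output above, file offset 0x0000C908 corresponds to virtual address 0x0040C908; converting offsets in general requires the PE section headers.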

Finally, the IDA Entropy Plugin shows areas of high entropy. First, an examination of regions of high 8-bit entropy (256-byte chunk size with a minimum entropy value of 7.9) highlights the area between 0x0040C900 and 0x0040CB00, the same area previously identified as the S-box region. Looking at regions of high 6-bit entropy (64-byte chunk size with a minimum entropy value of 5.95), we also find an area within the .data section between 0x004120A3 and 0x004120A7, as shown in Figure C-49. (A 256-byte chunk can reach at most 8 bits of entropy per byte, and a 64-byte chunk at most 6, which is why these thresholds pair with these chunk sizes.)

Figure C-49. IDA Entropy Plugin high 6-bit entropy findings
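The entropy values discussed here are Shannon entropy in bits per byte, computed over fixed-size chunks of bytes. The sketch below is an illustration, not the plug-in's code: it slides a 64-byte window one byte at a time (an assumed scanning strategy) and reports windows at or above 5.95. It also shows why the Base64 alphabet stands out: 64 distinct byte values in a 64-byte window give exactly log2(64) = 6 bits.

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte string, in bits per byte."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def high_entropy_windows(data, window=64, minimum=5.95, step=1):
    """Yield (offset, entropy) for each window whose entropy is >= minimum."""
    for off in range(0, len(data) - window + 1, step):
        e = shannon_entropy(data[off:off + window])
        if e >= minimum:
            yield off, e


tab = b'CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/'
print(shannon_entropy(tab))  # 6.0
for off, e in high_entropy_windows(b'\x00' * 10 + tab + b'\x00' * 10):
    print(off, round(e, 3))
```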

Looking at the high entropy areas shown in Figure C-49, we see a string starting at 0x004120A4 that contains all 64 Base64 characters:

```
CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/
```

Notice that this is not the standard Base64 alphabet: the uppercase A and B and the lowercase a and b have been moved from the front to the end of their respective uppercase and lowercase runs. This malware may use a custom Base64-encoding algorithm.
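The relationship between the two alphabets can be checked mechanically. Within the uppercase and lowercase runs, the custom string is the standard one rotated left by two positions, while the digits, +, and / are unchanged. A quick sketch (the `rotate` helper is introduced here for illustration):

```python
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'
CUSTOM = 'CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/'


def rotate(s, n):
    """Rotate a string left by n positions."""
    return s[n:] + s[:n]


# Rebuild the custom alphabet: rotate each letter run by 2, keep the rest
rebuilt = (rotate(string.ascii_uppercase, 2)
           + rotate(string.ascii_lowercase, 2)
           + string.digits + '+/')

print(rebuilt == CUSTOM)                   # True
print(sorted(CUSTOM) == sorted(STANDARD))  # True: same 64 symbols, new order
```

Because the custom string is a permutation of the standard alphabet, decoding only requires mapping each character back to its standard counterpart before a normal Base64 decode.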

Let’s review the relationship between the XOR-related functions we identified and other information we have collected. From the location of the Rijndael constants we’ve identified, it is clear that the s_xor2 and s_xor4 functions are related to AES encryption, and that the s_xor3 and s_xor5 functions are related to AES decryption.

The code inside the s_xor6 function is shown in Figure C-50.

Figure C-50. XOR encoding loop in s_xor6

The loop in Figure C-50 contains the xor instruction at ❶, which shows that s_xor6 performs XOR encoding. The variable arg_0 is a pointer to a source buffer that is being transformed, and arg_4 points to the buffer providing the XOR material. On each iteration, the pointers to the two buffers (arg_0 and arg_4) and the counter var_4 are updated, as shown by the three references at ❷.
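In Python terms, the loop in s_xor6 can be sketched roughly as follows. This is an interpretation of the figure, not decompiled code: arg_0 is modeled as a mutable buffer transformed in place, arg_4 as the buffer supplying the XOR material, and the default length of 16 bytes (one AES block) is an assumption. XORing one block with another is, for example, the chaining step of a mode such as CBC, which would be consistent with an AES connection, but the loop by itself does not prove that.

```python
def xor_in_place(dst, src, length=16):
    """XOR length bytes of src into dst, mirroring the s_xor6 loop.

    dst plays the role of arg_0 (buffer being transformed), src the role
    of arg_4 (XOR material), and i the role of the counter var_4.
    """
    for i in range(length):
        dst[i] ^= src[i]


block = bytearray(b'ABCDEFGHIJKLMNOP')
material = bytes(range(16))
xor_in_place(block, material)
xor_in_place(block, material)  # XOR is its own inverse
print(block == bytearray(b'ABCDEFGHIJKLMNOP'))  # True
```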

To determine if s_xor6 is related to the other encoding functions, we examine its cross-references. The function that calls s_xor6 starts at 0x0040352D. Figure C-51 shows a graph of the function cross-references from 0x0040352D.

Figure C-51. Relationship of encryption functions

From this graph, we see that s_xor6 is indeed related to the other AES encryption functions s_xor2 and s_xor4.

Although we have evidence that s_xor3 and s_xor5 are related to AES decryption, the relationship of these two functions to other functions is less clear. For example, when we look for the cross-reference to s_xor5, we see that the two locations from which s_xor5 is called (0x004037EE and 0x0040392D) appear to contain valid code, but the area is not defined as a function. This suggests that while AES code was linked to the malware, decryption is not used, and thus the decryption routines show up initially as dead code.

Having identified the function from which s_xor5 is called (0x00403745) as a decryption function, we re-create a graph that shows all of the functions called from 0x00403745 (which we rename s_AES_decrypt) and 0x0040352D (which we rename s_AES_encrypt), as shown in Figure C-52.

Figure C-52. Relationship of XOR functions to AES

This graph shows more clearly the relationship among all of the AES functions, and in it we can see that all XOR functions other than s_xor1 are related to the AES implementation.

Looking at s_xor1, we see several early branches in the code that occur when the arguments are incorrect, and luckily the malware still has the error messages present. These error messages include Empty key, Incorrect key length, and Incorrect block length, implying that this is the key initialization code.

To confirm that we’ve identified the key initialization code, we can try to find a connection between this function and the previously identified AES functions. Looking at the calling function for s_xor1, we see that just before s_xor1 is called, there is a reference to unk_412EF8. This offset is passed to the s_xor1 function using ECX. Looking at other references to unk_412EF8, we find that 0x401429 is one of the places where the offset of unk_412EF8 is loaded into ECX, just before the call to s_AES_encrypt. Because passing an object pointer in ECX is characteristic of the thiscall convention used for C++ member functions, unk_412EF8 is most likely a global C++ object representing the AES encryptor, and s_xor1 is the initialization function for that encryptor.

Looking back at s_xor1, we see that the Empty key message is issued after a test of the arg_0 parameter. From this, we can assume that the arg_0 parameter is the key. Looking at the parameter setup in main near the call to s_xor1 (at 0x401895), we can associate arg_0 with the string ijklmnopqrstuvwx, which is pushed on the stack. This string is the key used for AES in this malware.
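The key string also tells us the key size. AES supports 128-, 192-, and 256-bit keys (16, 24, or 32 bytes), so a check such as the Incorrect key length branch in s_xor1 presumably rejects other lengths. A quick check in Python (the `aes_key_bits` helper is hypothetical):

```python
VALID_AES_KEY_BYTES = {16: 128, 24: 192, 32: 256}


def aes_key_bits(key):
    """Return the AES key size in bits, or raise ValueError for invalid lengths."""
    try:
        return VALID_AES_KEY_BYTES[len(key)]
    except KeyError:
        raise ValueError('Incorrect key length: %d bytes' % len(key)) from None


print(aes_key_bits(b'ijklmnopqrstuvwx'))  # 128
```

The 16-character key therefore points to AES-128.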

Here’s a review of what we know about how AES is used in this malware:

s_AES_encrypt is used in the function at 0x0040132B. The encryption occurs between a call to ReadFile and a call to WriteFile.
s_xor1 is the AES initialization function that occurs once at the start of the process.
s_xor1 sets the AES key to ijklmnopqrstuvwx.

In addition to AES, we identified the possible use of a custom Base64 cipher with the use of the IDA Entropy Plugin (indicated in Figure C-49). Examining the references to the string CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/, we learn that this string is in the function at 0x0040103F. This function does the indexed lookup into the string, and the calling function (at 0x00401082) divides the string to be decoded into 4-byte chunks. The function at 0x00401082 then is the custom Base64 decode function, and we can see in the function that calls it (0x0040147C) that the decode function lies in between a ReadFile and a WriteFile. This is the same pattern we saw for the use of AES, but in a different function.
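The structure described above, a lookup helper plus a loop that consumes four characters at a time, can be sketched in Python as follows. The names `index_of` and `decode_custom_b64` are hypothetical stand-ins for the functions at 0x0040103F and 0x00401082. The sketch follows the general Base64 algorithm, not the malware's exact code; for example, how the malware handles invalid characters or unpadded input is unknown.

```python
CUSTOM = 'CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/'


def index_of(ch):
    """Stand-in for 0x0040103F: position of ch in the custom index string."""
    pos = CUSTOM.find(ch)
    if pos < 0:
        raise ValueError('invalid character %r' % ch)
    return pos


def decode_custom_b64(text):
    """Stand-in for 0x00401082: decode 4-character chunks into up to 3 bytes."""
    if len(text) % 4:
        raise ValueError('input length must be a multiple of 4')
    out = bytearray()
    for i in range(0, len(text), 4):
        chunk = text[i:i + 4]
        pad = chunk.count('=')
        value = 0
        for ch in chunk:
            # Each character carries 6 bits; '=' padding contributes zero bits
            value = (value << 6) | (0 if ch == '=' else index_of(ch))
        # 4 x 6 bits = 24 bits = 3 bytes, minus one byte per '=' character
        out += value.to_bytes(3, 'big')[:3 - pad]
    return bytes(out)


print(decode_custom_b64('BInaEi=='))  # b'dir\n'
```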

Before we can decrypt content, we need to determine the connection between the content and encoding algorithm. As we know, the AES encryption function is used by the function starting at 0x0040132B. Looking at the function that calls the function at 0x0040132B in Example C-107, we see that 0x0040132B is the start of a new thread created with the CreateThread shown at ❶, so we rename 0x0040132B to aes_thread.

Example C-107. Parameters to CreateThread for aes_thread

```
00401823                 mov     eax, [ebp+var_18]
00401826                 mov     [ebp+var_58], eax ❷
00401829                 mov     ecx, [ebp+arg_10]
0040182C                 mov     [ebp+var_54], ecx ❸
0040182F                 mov     edx, dword_41336C
00401835                 mov     [ebp+var_50], edx ❹
00401838                 lea     eax, [ebp+var_3C]
0040183B                 push    eax               ; lpThreadId
0040183C                 push    0                 ; dwCreationFlags
0040183E                 lea     ecx, [ebp+var_58]
00401841                 push    ecx               ; lpParameter
00401842                 push    offset aes_thread ; lpStartAddress
00401847                 push    0                 ; dwStackSize
00401849                 push    0                 ; lpThreadAttributes
0040184B                 call    ds:CreateThread ❶
```

The thread start function receives a single parameter, the address of var_58. Three values are stored into consecutive stack slots starting at var_58, effectively forming a small parameter structure, as follows:

var_18 is moved to var_58 at ❷.
arg_10 is moved to var_54 at ❸.
dword_41336C is moved to var_50 at ❹.
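Because var_58, var_54, and var_50 sit at increasing 4-byte offsets, the three stores above fill in a 12-byte block whose address becomes lpParameter. A ctypes sketch of that layout follows; the structure and field names are hypothetical labels, 32-bit fields assume the 32-bit process seen here, and what dword_41336C holds is not established in this analysis. The layout is consistent with aes_thread reading the handle for ReadFile from offset 0 and the handle for WriteFile from offset 4, as Example C-108 shows.

```python
import ctypes


class ThreadParams(ctypes.Structure):
    """Hypothetical layout of the block at var_58 passed to aes_thread."""
    _fields_ = [
        ('read_handle', ctypes.c_uint32),   # var_58 <- var_18
        ('write_handle', ctypes.c_uint32),  # var_54 <- arg_10
        ('extra', ctypes.c_uint32),         # var_50 <- dword_41336C
    ]


for name, _ in ThreadParams._fields_:
    print(name, getattr(ThreadParams, name).offset)
print(ctypes.sizeof(ThreadParams))
```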

In aes_thread (0x40132B), we see how the parameters are used. Example C-108 shows select portions of aes_thread with calls to ReadFile and WriteFile, and the origin of the handles passed to those functions.

Example C-108. Handles passed to ReadFile and WriteFile in aes_thread

```
0040137A         mov     eax, [ebp+arg_0]
0040137D         mov     [ebp+var_BE0], eax
...
004013A2         mov     ecx, [ebp+var_BE0]
004013A8         mov     edx, [ecx]
004013AA         push    edx ❶            ; hFile
004013AB         call    ds:ReadFile
...
0040144A         mov     eax, [ebp+var_BE0]
00401450         mov     ecx, [eax+4]
00401453         push    ecx ❷            ; hFile
00401454         call    ds:WriteFile
```

The value pushed for ReadFile at ❶ can be mapped back to var_58/var_18, as shown in Example C-107 at ❷. The value pushed for WriteFile in Example C-108 at ❷ can be mapped back to var_54/arg_10, as shown in Example C-107 at ❸.

Tracing the handle values back to their origin, we find first that var_58 and var_18 hold a handle to a pipe that is created earlier in the function that calls CreateThread (shown in Example C-107), and that this pipe is connected to the output of a command shell. The handle hSourceHandle is copied to the standard output and standard error of the command shell started by the CreateProcess call at 0x0040177B, as shown in Example C-109.

Example C-109. Connecting a pipe to shell output

```
00401748                 mov     ecx, [ebp+hSourceHandle]
0040174B                 mov     [ebp+StartupInfo.hStdOutput], ecx
0040174E                 mov     edx, [ebp+hSourceHandle]
00401751                 mov     [ebp+StartupInfo.hStdError], edx
```

The other handle, used by WriteFile in aes_thread (var_54/arg_10), can be traced to the parameter passed in from the _main function (0x00401879): a network socket that main has connected to the remote host with the connect call.

The aes_thread (0x0040132B) function reads the output of the launched command shell and encrypts it before writing it to the network socket.

The custom Base64-encoding function (0x00401082) is also used in a function (0x0040147C) that is started via its own thread. The tracing of inputs is very similar to the tracing of the inputs for the AES thread, with a mirror-image conclusion: the Base64 thread reads its input from the remote socket and, after decoding the data, sends the result to the input of the command shell.
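Both threads follow the same read, transform, write pattern, differing only in direction and transform. A minimal Python sketch of that pattern, with the transform passed in as a function (the `relay` name and the chunk size are assumptions; the malware's own buffer sizes and its handling of partial AES blocks are not covered here):

```python
import io


def relay(read, transform, write, chunk_size=1024):
    """Read chunks until EOF, transform each one, and write the result.

    aes_thread:    read = shell-output pipe, transform = AES encrypt, write = socket
    Base64 thread: read = socket, transform = custom Base64 decode, write = shell input
    """
    while True:
        data = read(chunk_size)
        if not data:  # EOF or closed connection
            break
        write(transform(data))


src = io.BytesIO(b'dir\n')
dst = io.BytesIO()
relay(src.read, bytes.upper, dst.write)
print(dst.getvalue())  # b'DIR\n'
```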

Modified Base64 Decoding

Having established the two types of encoding in this malware, let’s try to decrypt the content. Beginning with the custom Base64 encoding, we’ll assume that part of the captured network communication coming from the remote site is the string BInaEi==. Example C-110 shows a custom script for decoding data produced by modified Base64 implementations.

Example C-110. Custom Base64 decryption script

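```python
import base64

# Custom index string recovered from the malware; redefine for other variants
tab = 'CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/'
# Standard Base64 alphabet
b64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

ciphertext = 'BInaEi=='

s = ''
for ch in ciphertext:
    if ch in tab:
        # Replace each custom character with the standard one at the same index
        s += b64[tab.index(ch)]
    elif ch == '=':
        # Padding characters are the same in both alphabets
        s += '='

print(base64.b64decode(s).decode())
```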

Note

The code in Example C-110 is a generic script that can be repurposed for any custom Base64 implementation by redefining the tab variable.
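The same idea can be packaged as a pair of reusable functions built on str.translate, which also makes it possible to produce custom-encoded test input, such as a command to send to the malware from Netcat. The function names are hypothetical; only the two alphabets come from the analysis above.

```python
import base64

STANDARD = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
CUSTOM = 'CDEFGHIJKLMNOPQRSTUVWXYZABcdefghijklmnopqrstuvwxyzab0123456789+/'

TO_STANDARD = str.maketrans(CUSTOM, STANDARD)
TO_CUSTOM = str.maketrans(STANDARD, CUSTOM)


def custom_b64decode(text):
    """Translate to the standard alphabet, then decode ('=' is left untouched)."""
    return base64.b64decode(text.translate(TO_STANDARD))


def custom_b64encode(data):
    """Encode with standard Base64, then translate into the custom alphabet."""
    return base64.b64encode(data).decode('ascii').translate(TO_CUSTOM)


print(custom_b64decode('BInaEi=='))  # b'dir\n'
print(custom_b64encode(b'dir\n'))    # BInaEi==
```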

Using this script, we translate the string to see what command was sent to the command shell. The output in Example C-111 shows that the attacker is sending a request for a directory listing (dir).

Example C-111. Output of custom Base64 decryption script

```
$ python custom_b64_decrypt.py
dir
```

Decrypting AES

Translating the AES side of the command channel is slightly more challenging. For example, say that the malware sends the raw stream content shown in Example C-112.

Example C-112. AES-encrypted network content

```
00000000  37 f3 1f 04 51 20 e0 b5  86 ac b6 0f 65 20 89 92 7...Q .. ....e ..
00000010  4f af 98 a4 c8 76 98 a6  4d d5 51 8f a5 cb 51 c5 O....v.. M.Q...Q.
00000020  cf 86 11 0d c5 35 38 5c  9c c5 ab 66 78 40 1d df .....58\ ...fx@..
00000030  4a 53 f0 11 0f 57 6d 4f  b7 c9 c8 bf 29 79 2f c1 JS...WmO ....)y/.
00000040  ec 60 b2 23 00 7b 28 fa  4d c1 7b 81 93 bb ca 9e .`.#.{(. M.{.....
00000050  bb 27 dd 47 b6 be 0b 0f  66 10 95 17 9e d7 c4 8d .'.G.... f.......
00000060  ee 11 09 99 20 49 3b df  de be 6e ef 6a 12 db bd .... I;. ..n.j...
00000070  a6 76 b0 22 13 ee a9 38  2d 2f 56 06 78 cb 2f 91 .v."...8 -/V.x./.
00000080  af 64 af a6 d1 43 f1 f5  47 f6 c2 c8 6f 00 49 39 .d...C.. G...o.I9
```

The PyCrypto library (and PyCryptodome, its maintained, largely API-compatible successor) provides convenient cryptographic routines for dealing with data like this. Using the code shown in Example C-113, we can decrypt the content.

Example C-113. AES decryption script

```python
from Crypto.Cipher import AES  # provided by PyCrypto or PyCryptodome
import binascii

raw = ' 37 f3 1f 04 51 20 e0 b5  86 ac b6 0f 65 20 89 92 ' + \
      ' 4f af 98 a4 c8 76 98 a6  4d d5 51 8f a5 cb 51 c5 ' + \
      ' cf 86 11 0d c5 35 38 5c  9c c5 ab 66 78 40 1d df ' + \
      ' 4a 53 f0 11 0f 57 6d 4f  b7 c9 c8 bf 29 79 2f c1 ' + \
      ' ec 60 b2 23 00 7b 28 fa  4d c1 7b 81 93 bb ca 9e ' + \
      ' bb 27 dd 47 b6 be 0b 0f  66 10 95 17 9e d7 c4 8d ' + \
      ' ee 11 09 99 20 49 3b df  de be 6e ef 6a 12 db bd ' + \
      ' a6 76 b0 22 13 ee a9 38  2d 2f 56 06 78 cb 2f 91 ' + \
      ' af 64 af a6 d1 43 f1 f5  47 f6 c2 c8 6f 00 49 39 '  # ❶

ciphertext = binascii.unhexlify(raw.replace(' ', ''))  # ❷
# CBC with an all-zero IV, which old PyCrypto used when no IV was given;
# PyCryptodome generates a random IV instead, so the IV is passed explicitly
obj = AES.new(b'ijklmnopqrstuvwx', AES.MODE_CBC, b'\x00' * 16)  # ❸
print('Plaintext is:\n' + obj.decrypt(ciphertext).decode('latin-1'))  # ❹
```

The raw variable defined at ❶ contains the raw network content identified in Example C-112. The raw.replace call at ❷ removes the spaces from the raw string, and the binascii.unhexlify function turns the hex representation into a binary string. The AES.new call at ❸ creates a new AES object with the key ijklmnopqrstuvwx, the CBC mode of operation, and an all-zero initialization vector, which allows for the decrypt call at ❹.
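The role of the mode and the IV is easiest to see from the CBC decryption equations. With key K, ciphertext blocks C_1 through C_n, and the block decryption function D_K:

```latex
P_i = D_K(C_i) \oplus C_{i-1}, \qquad C_0 = \mathrm{IV}, \qquad i = 1, \dots, n
```

Only the first plaintext block depends directly on the IV, so a wrong IV garbles just the first 16 bytes while the remaining blocks still decrypt correctly. That symptom is a useful hint when the IV is unknown.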

The output of the AES script is shown in Example C-114. Note that this captured content was simply a command prompt.

Example C-114. AES decryption script output

```
$ python aes_decrypt.py
Plaintext is:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\user\Desktop\13_3_demo>
```

Crypto Pitfalls

The default use of the PyCrypto library routines worked successfully in Lab 13-3, but there are many potential pitfalls when implementing decryption routines yourself, including the following:

Block cipher algorithms have many possible modes of operation, such as Electronic Codebook (ECB), Cipher Block Chaining (CBC), and Cipher Feedback (CFB). Each mode requires a different set of steps between the encryption or decryption of each block, and some require an initialization vector in addition to a key. If you don’t match the implementation used, decryption may work only partially or not at all.
In this lab, the key was provided directly. A given implementation may have its own technique for generating a key given a user-provided or string-based password. In such cases, the key-generation algorithm will need to be identified and duplicated separately.
Within a standard algorithm, there may be options that must be specified correctly. For example, a single encryption algorithm may allow multiple key sizes, block sizes, rounds of encryption or decryption, and padding strategies.
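To see why the mode matters, the sketch below uses a deliberately toy "block cipher" (XOR with a 16-byte key, a hypothetical stand-in with no security value) so that it runs with only the standard library; substituting AES for the toy cipher would not change the mode logic. It encrypts the same plaintext under ECB and CBC, showing that ECB maps identical plaintext blocks to identical ciphertext blocks while CBC does not, and that decrypting with the wrong mode fails.

```python
BLOCK = 16


def toy_encrypt_block(key, block):
    """Hypothetical stand-in for a real block cipher: XOR with the key."""
    return bytes(b ^ k for b, k in zip(block, key))


toy_decrypt_block = toy_encrypt_block  # XOR is its own inverse


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key, data):
    return b''.join(toy_encrypt_block(key, p) for p in blocks(data))


def ecb_decrypt(key, data):
    return b''.join(toy_decrypt_block(key, c) for c in blocks(data))


def cbc_encrypt(key, iv, data):
    out, prev = [], iv
    for p in blocks(data):
        prev = toy_encrypt_block(key, xor(p, prev))  # chain with previous block
        out.append(prev)
    return b''.join(out)


def cbc_decrypt(key, iv, data):
    out, prev = [], iv
    for c in blocks(data):
        out.append(xor(toy_decrypt_block(key, c), prev))
        prev = c
    return b''.join(out)


key = b'ijklmnopqrstuvwx'
iv = bytes(BLOCK)
plaintext = b'A' * 32  # two identical blocks

ecb = ecb_encrypt(key, plaintext)
cbc = cbc_encrypt(key, iv, plaintext)
print(ecb[:16] == ecb[16:])                    # True: ECB leaks repetition
print(cbc[:16] == cbc[16:])                    # False
print(cbc_decrypt(key, iv, cbc) == plaintext)  # True
print(ecb_decrypt(key, cbc) == plaintext)      # False: wrong mode
```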
