Practical Malware Analysis by Andrew Honig, published by No Starch Press, 2012

Lab 20-3 Solutions

Short Answers

  1. Several strings that look like error messages (Error sending Http post, Error sending Http get, Error reading response, and so on) tell us that this program will be using HTTP GET and POST commands. We also see HTML paths (/srv.html, /put.html, and so on), which hint at the files that this malware will attempt to open.

  2. Several WS2_32 imports tell us that this program will be communicating over the network. An import to CreateProcess suggests that this program may launch another process.

  3. The function called at 0x4036F0 does not take any parameters other than the string, but ECX contains the this pointer for the object. We know the object that contains the function is an exception object because that object is later used as a parameter to the CxxThrowException function. We can tell from the context that the function at 0x4036F0 initializes an exception object, which stores a string that describes what caused the exception.

  4. The six entries of the switch table implement six different backdoor commands: NOOP, sleep, execute a program, download a file, upload a file, and survey the victim.

  5. The program implements a backdoor that uses HTTP as the command channel and has the ability to launch programs, download or upload a file, and collect information about the victim machine.

Detailed Analysis

When we look at the program’s strings, we see several that look like error messages, as shown in Example C-214.

Example C-214. Abbreviated listing of strings from Lab20-03.exe

Encoding Args Error
Beacon response Error
Caught exception during pollstatus: %s
Polling error
Arg parsing error
Error uploading file
Error downloading file
Error conducting machine survey
Create Process Failed
Failed to gather victim information
Config error
Caught exception in main: %s
Socket Connection Error
Host lookup failed.
Send Data Error
Error reading response
Error sending Http get
Error sending Http post

These error messages provide excellent insight into the program’s functionality. These messages tell us that the malware probably does the following:

  • Uses HTTP POST and GET commands

  • Sends a beacon to a remote machine

  • Polls a remote server for some reason (probably for commands to execute)

  • Uploads files

  • Downloads files

  • Creates additional processes

  • Conducts a machine survey

With just the information from these strings, we can guess that this program is a backdoor that uses HTTP GET and POST commands for command and control. It looks like the program supports uploading files, downloading files, creating a new process, and surveying the victim’s computer.
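A string survey like the one in Example C-214 can be approximated in a few lines of Python. This is only a sketch: the ascii_strings helper, the sample blob, and the keyword list are hypothetical, and a real strings tool would also handle Unicode strings.

```python
import re

def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical blob standing in for part of a binary's data section.
blob = b"\x00\x01Config error\x00\xff\x10Error sending Http post\x00ab\x00"

found = ascii_strings(blob)

# Keep only strings that look like error messages (the keyword list is an assumption).
keywords = ("Error", "error", "Failed")
messages = [s for s in found if any(k in s for k in keywords)]
print(messages)
```

Filtering on a few keywords is a crude heuristic, but it is often enough to surface the messages that hint at a program's functionality.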

When we open the program in IDA Pro, we see that its main method calls a function at 0x403BE0 and then returns. The function at 0x403BE0 contains the main program flow, so we will call it main2. It starts by creating a new object with the new operator and calling a function for the new object with config.dat as an argument to the function, as shown in Example C-215.

Example C-215. An object being created and used in main2

00403C03                 push    30h
00403C05                 mov     [ebp+var_4], ebx
00403C08                 call    ??2@YAPAXI@Z    ; operator new(uint)
00403C0D                 mov     ecx, eax
00403C0F                 add     esp, 4
00403C12                 mov     [ebp+var_14], ecx
00403C15                 cmp     ecx, ebx
00403C17                 mov     byte ptr [ebp+var_4], 1
00403C1B                 jz      short loc_403C2B
00403C1D                 push    offset FileName ; "config.dat"
00403C22                 call    sub_401EE0
00403C27                 mov     esi, eax

IDA Pro labels the new operator at 0x403C08, which returns a pointer to the new object in EAX. A pointer to the object is moved into ECX at 0x403C0D, where it is used as the this pointer to the function call at 0x403C22. This tells us that the function sub_401EE0 is a member function of the class of the object created at 0x403C08. For now, we’ll call this object firstObject. Example C-216 shows how it’s used in sub_401EE0.

Example C-216. The first function being called on firstObject

00401EF7                 mov     esi, ecx
00401EF9                 push    194h
00401EFE                 call    ??2@YAPAXI@Z    ; operator new(uint)
00401F03                 add     esp, 4
00401F06                 mov     [esp+14h+var_10], eax
00401F0A                 test    eax, eax
00401F0C                 mov     [esp+14h+var_4], 0
00401F14                 jz      short loc_401F24
00401F16                 mov     ecx, [esp+14h+arg_0]
00401F1A                 push    ecx
00401F1B                 mov     ecx, eax
00401F1D                 call    sub_403180

sub_401EE0 first stores the pointer to firstObject in ESI at 0x401EF7, and then creates another new object at 0x401EFE, which we’ll call secondObject. Then it calls a member function of secondObject at 0x401F1D. We need to keep analyzing before we can determine the purpose of these objects, so we now look at sub_403180, as shown in Example C-217.

Example C-217. An exception being created and thrown

00403199                 push    offset FileName ; "config.dat"
0040319E                 mov     dword ptr [esi], offset off_41015C
004031A4                 mov     byte ptr [esi+18Ch], 4Eh
004031AB                 call    ds:CreateFileA
004031B1                 mov     edi, eax
004031B3                 cmp     edi, 0FFFFFFFFh
004031B6                 jnz     short loc_4031D5
004031B8                 push    offset aConfigError ; "Config error"
004031BD                 lea     ecx, [esp+0BCh+var_AC]
004031C1                 call    sub_4036F0
004031C6                 lea     eax, [esp+0B8h+var_AC]
004031CA                 push    offset unk_411560
004031CF                 push    eax
004031D0                 call    __CxxThrowException@8 ; _CxxThrowException(x,x)

Based on the call to CreateFileA with the config.dat filename, we guess that this function reads the configuration file from disk, and we rename it setupConfig. The code in Example C-217 tries to open the config.dat file at 0x4031AB. If the file is opened successfully, the jump at 0x4031B6 is taken, and the remainder of the code in Example C-217 is skipped. If the file is not opened successfully, we see the string Config error passed as an argument to the function at 0x4036F0 in the call at 0x4031C1.

The function at 0x4036F0 takes the string as a parameter, but also uses ECX as the this pointer. The address of the object used as the this pointer, which lives on the stack at var_AC, is loaded into ECX at 0x4031BD. We later see that object passed to the CxxThrowException function at 0x4031CF, which tells us that the function at 0x4036F0 is a member function of an exception object. Based on the context in which sub_4036F0 is called, we can assume that the function is initializing an exception with the string Config error.

It’s important to recognize this pattern of an error string passed to a function, followed by a call to CxxThrowException, because similar code appears throughout this program. Each time we see this pattern, we can conclude that the function is initializing an exception, so we don’t need to waste time analyzing these functions.
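Because the pattern recurs, it can be spotted mechanically in a text export of the disassembly. The following sketch is one hypothetical way to do that: the find_exception_inits helper and its look-back window are assumptions made for illustration, and the listing text is copied from Example C-217.

```python
import re

LISTING = """\
004031B8 push    offset aConfigError ; "Config error"
004031BD lea     ecx, [esp+0BCh+var_AC]
004031C1 call    sub_4036F0
004031C6 lea     eax, [esp+0B8h+var_AC]
004031CA push    offset unk_411560
004031CF push    eax
004031D0 call    __CxxThrowException@8 ; _CxxThrowException(x,x)
"""

def find_exception_inits(listing, window=6):
    """Map each likely exception initializer to the error string it receives."""
    lines = listing.splitlines()
    results = {}
    for i, line in enumerate(lines):
        if "call" not in line or "CxxThrowException" not in line:
            continue
        func = message = None
        # Walk backward from the throw: first the nearest call to a sub_ function,
        # then the nearest quoted string pushed before that call.
        for prev in reversed(lines[max(0, i - window):i]):
            called = re.search(r"call\s+(sub_[0-9A-Fa-f]+)", prev)
            if called and func is None:
                func = called.group(1)
            quoted = re.search(r';\s*"([^"]*)"', prev)
            if quoted and func is not None and message is None:
                message = quoted.group(1)
        if func:
            results[func] = message
    return results

print(find_exception_inits(LISTING))
```

A heuristic like this only flags candidates for renaming; each hit still deserves a quick look in the disassembler.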

If we continue analyzing the function at 0x403180, we realize that it reads data from the configuration file config.dat and stores it in secondObject. We can now conclude that secondObject is an object to store and read configuration information, and we rename it configObject.

Now we return to sub_401EE0 to see if we can better determine how firstObject is used. After creating the configObject object, sub_401EE0 stores a bunch of information in firstObject, as shown in Example C-218.

Example C-218. Data being stored in firstObject

00401F2A    mov    [esi], eax
00401F2C    mov    dword ptr [esi+10h], offset aIndex_html ; "/index.html"
00401F33    mov    dword ptr [esi+14h], offset aInfo_html ; "/info.html"
00401F3A    mov    dword ptr [esi+18h], offset aResponse_html ; "/response.html"
00401F41    mov    dword ptr [esi+1Ch], offset aGet_html ; "/get.html"
00401F48    mov    dword ptr [esi+20h], offset aPut_html ; "/put.html"
00401F4F    mov    dword ptr [esi+24h], offset aSrv_html ; "/srv.html"
00401F56    mov    dword ptr [esi+28h], 544F4349h
00401F5D    mov    dword ptr [esi+2Ch], 41534744h
00401F64    mov    eax, esi

First, EAX, which holds a pointer to configObject, is stored at offset 0 of firstObject. Next, we see a series of hard-coded URL paths, then two hard-coded integers, and then the function returns a pointer to firstObject. We still can’t be completely sure what firstObject does, but it appears to store all of the program’s global data, so we’ll rename this object globalDataObject for now, until we can learn enough to give it a better name.
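When an object is initialized field by field like this, it can help to write the layout down as a table of offsets. The sketch below records the offsets and values from Example C-218 and shows the bytes of the two integers in memory order. The dword_as_ascii helper is hypothetical, and the fact that both constants consist of printable ASCII bytes does not by itself tell us how the program uses them.

```python
import struct

# Offsets and values as they appear in Example C-218. The description of
# offset 0 comes from the analysis; the listing does not name any field.
FIRST_OBJECT_FIELDS = {
    0x00: "pointer to configObject",
    0x10: "/index.html",
    0x14: "/info.html",
    0x18: "/response.html",
    0x1C: "/get.html",
    0x20: "/put.html",
    0x24: "/srv.html",
    0x28: 0x544F4349,
    0x2C: 0x41534744,
}

def dword_as_ascii(value):
    """Bytes of a 32-bit constant in little-endian (memory) order, as ASCII."""
    return struct.pack("<I", value).decode("ascii")

for offset, field in sorted(FIRST_OBJECT_FIELDS.items()):
    if isinstance(field, int):
        print(f"+0x{offset:02X}: 0x{field:08X} -> {dword_as_ascii(field)!r}")
    else:
        print(f"+0x{offset:02X}: {field}")
```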

We have now finished analyzing the first function called by main2. We have determined that it loads the configuration information from a file and initializes an object that stores the global data for the program. We can now return to main2, the remainder of which is shown in Example C-219.

Example C-219. Beacon and poll commands in the main2 function

00403C2D                 mov     ecx, esi
00403C2F                 mov     byte ptr [ebp+var_4], bl
00403C32                 call    sub_401F80
00403C37                 mov     edi, ds:Sleep
00403C3D loc_403C3D:
00403C3D                 mov     eax, [esi]
00403C3F                 mov     eax, [eax+190h]
00403C45                 lea     eax, [eax+eax*4]
00403C48                 lea     eax, [eax+eax*4]
00403C4B                 lea     ecx, [eax+eax*4]
00403C4E                 shl     ecx, 2
00403C51                 push    ecx             ; dwMilliseconds
00403C52                 call    edi ; Sleep
00403C54                 mov     ecx, esi
00403C56                 call    loc_402410
00403C5B                 inc     ebx
00403C5C                 jmp     short loc_403C3D

We see that this function calls sub_401F80 outside the loop, and then it calls sub_402410 and the Sleep function inside an infinite loop. From what we know about the program from the strings, we could guess that sub_401F80 sends a beacon to the remote machine and that sub_402410 polls the remote server. We’ll rename those functions maybe_beacon and maybe_poll. We see that maybe_beacon and maybe_poll are both passed our globalDataObject in the ECX register (at 0x403C2D and 0x403C54), which means that they are member functions of what we’ve called globalDataObject. Based on this realization, we’ll rename our object mainObject.
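The three lea instructions and the shl in Example C-219 are a compiler's way of multiplying without a mul instruction. The following sketch mirrors that sequence to show that the value read from offset 0x190 of configObject is multiplied by 500 before being passed to Sleep. The function name is hypothetical, and the sample inputs are arbitrary.

```python
def sleep_milliseconds(config_value):
    """Mirror the instructions at 0x403C45 through 0x403C4E with 32-bit wraparound."""
    mask = 0xFFFFFFFF
    eax = config_value & mask
    eax = (eax + eax * 4) & mask   # lea eax, [eax+eax*4]   -> value * 5
    eax = (eax + eax * 4) & mask   # lea eax, [eax+eax*4]   -> value * 25
    ecx = (eax + eax * 4) & mask   # lea ecx, [eax+eax*4]   -> value * 125
    ecx = (ecx << 2) & mask        # shl ecx, 2             -> value * 500
    return ecx

print(sleep_milliseconds(1))    # 500
print(sleep_milliseconds(120))  # 60000
```

In other words, the configuration value appears to express the polling interval in units of half a second.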

First, we’ll analyze maybe_beacon. We see that it creates another new object and calls sub_403D50, as shown in Example C-220.

Example C-220. First function call in the maybe_beacon function

00401FC8                 mov     eax, [esi]
00401FCA                 mov     edx, [eax+144h]
00401FD0                 add     eax, 104h
00401FD5                 push    edx             ; hostshort
00401FD6                 push    eax             ; char *
00401FD7                 call    sub_403D50

We see that IDA Pro has labeled some of the arguments to sub_403D50 because it knows they will be used as parameters to imported functions later. The most telling of these is hostshort, which tells us that it will be used as a parameter to the networking function htons. The values for these parameters are retrieved from our mainObject, which was stored in ESI.

We see that ESI is dereferenced at 0x401FC8 to obtain a pointer to configObject, which is stored at offset 0 in the mainObject. Next, the hostshort value is read from offset 0x144 of configObject at 0x401FCA, and the add at 0x401FD0 advances EAX by 0x104, so the char * argument points to offset 0x104 within configObject. This level of indirection is common in C++ programs. In a C program, these values would be stored as global data with offsets that are labeled and tracked by IDA Pro, but in C++ they are stored as offsets into objects that are harder to track.

In order to determine the data that will be pushed onto the stack, we would need to go back to the function that initializes configObject to see what is stored at offsets 0x104 and 0x144. In practice, it’s often easier to use dynamic analysis to determine those values, but without access to the command-and-control server, you may need to go back to configObject.
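To make the indirection concrete, the sketch below steps through the three instructions at the top of Example C-220 against a toy memory model. Everything about the model is hypothetical: the base addresses, the host string, and the port are placeholders, not values recovered from the sample. Only the offsets and the 0x194-byte object size come from the listings.

```python
import struct

CONFIG_BASE = 0x00500000   # hypothetical address of configObject
MAIN_BASE = 0x00600000     # hypothetical address of mainObject

# Toy configObject: 0x194 bytes, the size passed to operator new in Example C-216.
config = bytearray(0x194)
config[0x104:0x104 + 12] = b"example.com\x00"   # placeholder string
struct.pack_into("<I", config, 0x144, 80)        # placeholder port

memory = {MAIN_BASE: CONFIG_BASE}   # mainObject+0 holds the configObject pointer

def read_dword(address):
    """Read a 32-bit value from the toy memory model."""
    if address in memory:
        return memory[address]
    return struct.unpack_from("<I", config, address - CONFIG_BASE)[0]

esi = MAIN_BASE
eax = read_dword(esi)            # mov eax, [esi]       -> configObject
edx = read_dword(eax + 0x144)    # mov edx, [eax+144h]  -> hostshort
eax = eax + 0x104                # add eax, 104h        -> char * into configObject
pushed = [edx, eax]              # push edx; push eax

host_offset = pushed[1] - CONFIG_BASE
host = bytes(config[host_offset:config.index(0, host_offset)]).decode("ascii")
print(pushed[0], hex(host_offset), host)
```

Note that the add modifies EAX only after EDX has been loaded, so the two arguments are independent offsets from the same base pointer.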

Looking at sub_403D50, we see that it calls htons, socket, and connect to establish a connection to a remote socket. maybe_beacon then calls sub_402FF0, which contains the code shown in Example C-221.

Example C-221. Beginning of the victim survey function

0040301C    call    ds:GetComputerNameA
00403022    test    eax, eax
00403024    jnz     short loc_403043
00403026    push    offset aErrorConductin ; "Error conducting machine survey"
0040302B    lea     ecx, [esp+40h+var_1C]
0040302F    call    sub_403910
00403034    lea     eax, [esp+3Ch+var_1C]
00403038    push    offset unk_411150
0040303D    push    eax
0040303E    call    __CxxThrowException@8 ; _CxxThrowException(x,x)

We see from this code that the function is trying to obtain the computer’s hostname. If it fails to do so, it throws an exception with the error message “Error conducting machine survey.” This tells us that this function is conducting a survey of the victim’s machine.

The remainder of sub_402FF0 shows the malware gathering additional victim information. We can now rename sub_402FF0 to surveyVictim and move on.

Next, we analyze the next function called by maybe_beacon, sub_404ED0. From the error message, we can see that sub_404ED0 does an HTTP POST to the remote server. maybe_beacon then calls sub_404B10, which from the error messages we can see is checking the beacon response. Without going into too much detail, we can tell that maybe_beacon is, in fact, the beacon function and that it expects a specific beacon response in order for the program to continue running.

We return to main2 to check the maybe_poll (0x402410) function. We see that its first call is to sub_403D50, which we analyzed earlier and know initializes a connection to the command-and-control server. The maybe_poll function then calls sub_404CF0, which sends an HTTP GET in order to retrieve information from the remote server. It then calls sub_404B10, which retrieves the server’s response to the HTTP GET request. We then see two blocks of code that raise an exception if the response doesn’t meet certain formatting criteria.

Next, we come across a switch statement with six options, as shown in Example C-222.

Example C-222. switch statements inside the maybe_poll function

0040251F                 mov     al, [esi+4]
00402522                 add     eax, -61h       ; switch 6 cases
00402525                 cmp     eax, 5
00402528                 ja      short loc_40257D ; default
0040252A                 jmp     ds:off_4025C8[eax*4] ; switch jump

The value used for the switch decision is stored in [esi+4]. That value is loaded into EAX, and 0x61 is subtracted from it. If the result is greater than 5, the jump to the default case is taken and the switch table is not used. This ensures that the original value is between 0x61 and 0x66 (which represent the ASCII characters a through f). The value minus 0x61 is then used as an index into the switch table. IDA Pro has recognized and labeled the switch table.
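The bounds check relies on unsigned arithmetic: ja is an unsigned comparison, so a command byte below 0x61 wraps around to a very large number and also falls through to the default case. The following sketch reproduces the check under the assumption that the upper bytes of EAX are zero before the byte is loaded; the function name is hypothetical.

```python
def switch_index(command_byte):
    """Return the jump-table index, or None when the default case is taken."""
    # add eax, -61h with 32-bit wraparound (upper bytes of EAX assumed to be zero)
    eax = (command_byte - 0x61) & 0xFFFFFFFF
    # cmp eax, 5 followed by ja: an unsigned "greater than" test
    if eax > 5:
        return None
    # index used by jmp ds:off_4025C8[eax*4]
    return eax

print(switch_index(ord("a")))   # 0
print(switch_index(ord("f")))   # 5
print(switch_index(ord("g")))   # None
print(switch_index(ord("0")))   # None
```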

We click off_4025C8, which takes us to the six possible locations that we need to analyze. We’ll label these case_1 through case_6 and analyze them one at a time:

  • case_1 calls the delete operator and then immediately returns without actually doing anything. We’ll rename this case_doNothing.

  • case_2 calls atoi to parse a string into a number, and then calls the sleep function before returning. We’ll rename it case_sleep.

  • case_3 does some string parsing, and then calls CreateProcess. We’ll rename it case_ExecuteCommand.

  • case_4 calls CreateFile and writes the HTTP response received from the command-and-control server to disk. We’ll rename it case_downloadFile.

  • case_5 also calls CreateFile, but it uploads the data from the file to the remote server using an HTTP POST command. We’ll rename it case_uploadFile.

  • case_6 calls GetComputerName, GetUserName, GetVersionEx, and GetDefaultLCID, which together perform a survey of the victim’s machine and send the results back to the command-and-control server.
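The six cases can be summarized as a lookup from command character to handler. This sketch assumes that case_1 through case_6 follow the order of the switch table, so that a selects the first entry and f the last. The label for case_6 is a placeholder, since we did not rename it above, and the handler_for helper is hypothetical.

```python
# Command character -> label given to the handler in the analysis above.
HANDLERS = {
    "a": "case_doNothing",
    "b": "case_sleep",
    "c": "case_ExecuteCommand",
    "d": "case_downloadFile",
    "e": "case_uploadFile",
    "f": "case_6 (victim survey)",
}

def handler_for(command):
    """Return the handler label for a command character, or 'default'."""
    return HANDLERS.get(command, "default")

for command in "abcdefg":
    print(command, "->", handler_for(command))
```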

Overall, we have a backdoor program that reads a configuration file that determines the command-and-control server, sends a beacon to the command-and-control server, and implements several different functions based on the response from the command-and-control server.