Practical Malware Analysis by Andrew Honig. Published by No Starch Press, 2012.

Lab 5-1 Solutions

Short Answers

  1. DllMain is found at 0x1000D02E in the .text section.

  2. The import for gethostbyname is found at 0x100163CC in the .idata section.

  3. The gethostbyname import is called nine times by five different functions throughout the malware.

  4. A DNS request for pics.practicalmalwareanalysis.com will be made by the malware if the call to gethostbyname at 0x10001757 succeeds.

  5. IDA Pro has recognized 23 local variables for the function at 0x10001656.

  6. IDA Pro has recognized one parameter for the function at 0x10001656.

  7. The string \cmd.exe /c is located at 0x10095B34.

  8. That area of code appears to be creating a remote shell session for the attacker.

  9. The OS version is stored in the global variable dword_1008E5C4.

  10. The registry values located at HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WorkTime and WorkTimes are queried and sent over the remote shell connection.

  11. The PSLIST export sends a process listing across the network or finds a particular process name in the listing and gets information about it.

  12. GetSystemDefaultLangID, send, and sprintf are API calls made from sub_10004E79. This function could be renamed to something useful like GetSystemLanguage.

  13. DllMain calls strncpy, strnicmp, CreateThread, and strlen directly. At a depth of 2, it calls a variety of API calls, including Sleep, WinExec, gethostbyname, and many other networking function calls.

  14. The malware will sleep for 30 seconds.

  15. The arguments are 6, 1, and 2.

  16. These arguments correspond to three symbolic constants: IPPROTO_TCP, SOCK_STREAM, and AF_INET.

  17. The in instruction is used for virtual machine detection at 0x100061DB, and the value 0x564D5868 corresponds to the VMXh string. Using the cross-reference, we see the string Found Virtual Machine in the caller function.

  18. Random data appears to exist at 0x1001D988.

  19. If you run Lab05-01.py, the random data is unobfuscated to reveal a string.

  20. By pressing the A key on the keyboard, we can turn this into the readable string: xdoor is this backdoor, string decoded for Practical Malware Analysis Lab :)1234.

  21. The script works by XOR’ing 0x50 bytes of data with 0x55 and modifying the bytes in IDA Pro using PatchByte.

Detailed Analysis

Once we load the malicious DLL into IDA Pro, we are taken directly to DllMain at 0x1000D02E. (You may need to display line numbers in the graph view by using Options ▸ General and checking Line Prefixes, or you can toggle between the graph and traditional view by pressing the spacebar, which allows you to see the line numbers without changing the options.) DllMain is where we want to begin analysis, because all code that executes from the DllEntryPoint until DllMain has likely been generated by the compiler, and we don’t want to get bogged down analyzing compiler-generated code.

To answer questions 2 through 4, we begin by viewing the imports of this DLL, by selecting View ▸ Open Subviews ▸ Imports. In this list, we find gethostbyname and double-click it to see it in the disassembly. The gethostbyname import resides at location 0x100163CC in the .idata section of the binary.

To see the number of functions that call gethostbyname, we check its cross-references by pressing CTRL-X with the cursor on gethostbyname, which brings up the window shown in Figure C-12. The text “Line 1 of 18” at the bottom of the window reports 18 entries, but these represent only nine calls to gethostbyname, because some versions of IDA Pro double-count cross-references: type p is a reference because the import is being called, and type r is a “read” reference (since the instruction is call dword ptr [...] for an import, the CPU must read the import and then call into it). Examining the cross-reference list closely, you can see that gethostbyname is called by five separate functions.

Figure C-12. Cross-references to gethostbyname

We press G on the keyboard to quickly navigate to 0x10001757. Once at this location, we see the following code, which calls gethostbyname.

1000174E         mov     eax, off_10019040
10001753         add     eax, 0Dh 
10001756         push    eax
10001757         call    ds:gethostbyname

The gethostbyname function takes a single parameter, typically a string containing a domain name. Therefore, we need to work backward and figure out what is in EAX when gethostbyname is called. It appears that off_10019040 is moved into EAX. If we double-click that offset, we see the string [This is RDO]pics.practicalmalwareanalysis.com at that location.

As you can see at 0x10001753, the pointer into the string is advanced by 0xD bytes, which gets a pointer to the string pics.practicalmalwareanalysis.com in EAX for the call to gethostbyname. Figure C-13 shows the string in memory, and how adding 0xD to EAX advances the pointer to the location of the URL in memory. The call will perform a DNS request to get an IP address for the domain.

Figure C-13. Adjustment of the string pointer to access the URL
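
The pointer adjustment can be modeled outside IDA Pro. The following is a minimal Python sketch, not code from the malware: the variable names are illustrative, and the byte string is the one found at off_10019040. Slicing at 0xD plays the role of add eax, 0Dh.

```python
# Sketch only: models `add eax, 0Dh` as a slice on the string stored at off_10019040.
config = b"[This is RDO]pics.practicalmalwareanalysis.com"
TAG_LENGTH = 0xD  # number of bytes skipped by the add instruction

tag = config[:TAG_LENGTH]       # the bracketed marker that is skipped
hostname = config[TAG_LENGTH:]  # what EAX points to when gethostbyname is called

print(tag.decode("ascii"))       # [This is RDO]
print(hostname.decode("ascii"))  # pics.practicalmalwareanalysis.com
```

The bracketed marker is exactly 0xD (13) bytes long, which is why the adjusted pointer lands on the first character of the domain name.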

To answer questions 5 and 6, we press G on the keyboard to navigate to 0x10001656 in order to analyze sub_10001656. In Figure C-14, we see what IDA Pro has done to recognize and label the function’s local variables and parameters. The labeled local variables correspond to negative offsets, and we count 23 of them, most of which are prepended with var_. The freeware version of IDA Pro counts only 20 local variables, so the version you are using may detect a slightly different number of local variables. The parameters are labeled and referenced with positive offsets, and we see that IDA Pro has recognized one parameter for the function labeled arg_0.

Figure C-14. IDA Pro function layout—recognizing local variables and parameters

To answer questions 7 through 10, we begin by viewing the strings for this DLL by selecting View ▸ Open Subviews ▸ Strings. In this list, double-click \cmd.exe /c to see it in the disassembly. Notice that the string resides in the xdoors_d section of the PE file at 0x10095B34. On checking the cross-references to this string, we see that there is only one at 0x100101D0, where this string is pushed onto the stack.

Examining the graph view of this function shows a series of memcmp calls that compare strings such as cd, exit, install, inject, and uptime. We also see that a string reference earlier in the function, at 0x1001009D, contains the text This Remote Shell Session. Examining the function and the calls it makes shows a series of calls to recv and send. Using these three pieces of evidence, we can guess that we are looking at a remote shell session function.

The dword_1008E5C4 is a global variable that we can double-click (at 0x100101C8) to show its location in memory at 0x1008E5C4, within the .data section of the DLL. Checking the cross-references by pressing CTRL-X shows that it is referenced three times, but only one reference modifies dword_1008E5C4. The following listing shows how dword_1008E5C4 is modified.

10001673        call    sub_10003695
10001678        mov     dword_1008E5C4, eax

We see that EAX is moved into dword_1008E5C4, and that EAX is the return value from the function call made in the previous instruction. Therefore, we need to determine what that function returns. To do so, we examine sub_10003695 by double-clicking it and looking at the disassembly. The sub_10003695 function contains a call to GetVersionEx, which obtains information about the current version of the OS, as shown in the following listing.

100036AF        call    ds:GetVersionExA
100036B5        xor     eax, eax
100036B7        cmp     [ebp+VersionInformation.dwPlatformId], 2
100036BE        setz    al

The dwPlatformId is compared to the number 2 in order to determine how to set the AL register. AL will be set if the PlatformId is VER_PLATFORM_WIN32_NT. This is just a simple check to make sure that the OS is Windows 2000 or higher, and we can conclude that the global variable will typically be set to 1.
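
To make the xor/cmp/setz sequence concrete, here is a small Python sketch of the same logic. The function name is hypothetical; the only value taken from Windows itself is that VER_PLATFORM_WIN32_NT equals 2.

```python
# Sketch only: mirrors the instructions that follow the call to GetVersionExA.
VER_PLATFORM_WIN32_NT = 2  # documented value of the Windows constant


def is_nt_platform(platform_id):
    """Return the value left in EAX (hypothetical helper name)."""
    eax = 0                                   # xor eax, eax
    if platform_id == VER_PLATFORM_WIN32_NT:  # cmp [ebp+VersionInformation.dwPlatformId], 2
        eax = 1                               # setz al
    return eax


print(is_nt_platform(2))  # 1
print(is_nt_platform(1))  # 0
```

The returned value is what ends up stored in dword_1008E5C4 by the mov at 0x10001678.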

As previously discussed, the remote shell function at 0x1000FF58 contains a series of memcmp calls. At 0x10010452, we see the memcmp with robotwork, as follows:

10010444         push    9                       ; Size
10010446         lea     eax, [ebp+Dst]
1001044C         push    offset aRobotwork       ; "robotwork"
10010451         push    eax                     ; Buf1
10010452         call    memcmp
10010457         add     esp, 0Ch
1001045A         test    eax, eax
1001045C         jnz     short loc_10010468 
1001045E         push    [ebp+s]                 ; s
10010461         call    sub_100052A2 

The jnz at 0x1001045C will not be taken if the string matches robotwork, and the call at 0x10010461 will be executed. Examining sub_100052A2, we see that it queries the registry at HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WorkTime and WorkTimes, and then returns this information over the network socket that was passed to the function at 0x1001045E.
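
The comparison and branch can be approximated in Python. This is a sketch under stated assumptions: the memcmp below is a simplified stand-in for the C function, and handles_robotwork is a hypothetical name, not something found in the binary.

```python
# Sketch only: models memcmp(Dst, "robotwork", 9) followed by test eax, eax / jnz.
def memcmp(buf1, buf2, size):
    """Simplified stand-in for C memcmp: 0 if the first `size` bytes match."""
    a, b = buf1[:size], buf2[:size]
    return (a > b) - (a < b)


def handles_robotwork(received):
    """True when the jnz falls through and sub_100052A2 would be called."""
    return memcmp(received, b"robotwork", 9) == 0


print(handles_robotwork(b"robotwork"))  # True
print(handles_robotwork(b"uptime"))     # False
```

Because the size argument is 9, only the first nine bytes are compared, so a received buffer that merely begins with robotwork also takes this path.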

To answer question 11, we begin by viewing the exports for this DLL by selecting View ▸ Open Subviews ▸ Exports. We find PSLIST in this list and double-click it to move the cursor to 0x10007025, the start of the export’s code. This function appears to take one of two paths, depending on the result of sub_100036C3. The sub_100036C3 function checks to see if the OS version is Windows Vista/7 or XP/2003/2000. Both code paths use CreateToolhelp32Snapshot to help them grab a process listing, which we infer from the strings and API calls. Both code paths return the process listing over the socket using send.

To answer questions 12 and 13, we graph a function’s cross-references by selecting View ▸ Graphs ▸ Xrefs From when the cursor is on the function name of interest. We go to sub_10004E79 by pressing G on the keyboard and entering 0x10004E79.

Figure C-15 shows the result of graphing the cross-references for sub_10004E79. We see that this function calls GetSystemDefaultLangID and send. This information tells us that the function likely sends the language identifier over a network socket, so we can right-click the function name and give it a more meaningful name, such as send_languageID.

Note

Performing a quick analysis like this is an easy way to get a high-level overview of a binary. This approach is particularly handy when analyzing large binaries.

Figure C-15. Graph of cross-references from sub_10004E79

To determine how many Windows API functions DllMain calls directly, we scroll through the function and look for API calls, or select View ▸ Graphs ▸ User Xrefs Chart to open the dialog shown in Figure C-16.

The start and end addresses should both correspond to the start of DllMain, specifically 0x1000D02E. Because we care only about the cross-references from DllMain, we select a recursion depth of 1 to display only the functions that DllMain calls directly. Figure C-17 shows the resulting graph. (The API calls are seen in gray.) To see all functions called at a recursive depth of 2, follow the same steps and select a recursion depth of 2. The result will be a much larger graph, which even shows a recursive call back to DllMain.

Figure C-16. Dialog for setting a custom cross-reference graph from 0x1000D02E

Figure C-17. Cross-reference graph for DllMain with a recursive depth of 1
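
The idea of a recursion depth can be illustrated with a depth-limited walk over a call graph. The sketch below is not how IDA Pro builds its chart, and the graph is a toy: only the four API names called directly by DllMain and the three depth-2 API names come from this lab, while sub_hypothetical and the way the calls are wired together are assumptions made for illustration.

```python
from collections import deque

# Toy call graph. sub_hypothetical is an invented intermediate function.
TOY_GRAPH = {
    "DllMain": ["strncpy", "strnicmp", "CreateThread", "strlen", "sub_hypothetical"],
    "sub_hypothetical": ["Sleep", "WinExec", "gethostbyname", "DllMain"],
}


def xrefs_from(graph, start, max_depth):
    """Return {callee: depth} for functions reachable within max_depth calls."""
    depths = {}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the requested recursion depth
        for callee in graph.get(name, ()):
            # Skip the start node and anything already seen, so cycles terminate.
            if callee != start and callee not in depths:
                depths[callee] = depth + 1
                queue.append((callee, depth + 1))
    return depths


depth1 = xrefs_from(TOY_GRAPH, "DllMain", 1)
depth2 = xrefs_from(TOY_GRAPH, "DllMain", 2)
print(sorted(depth1))
print(sorted(name for name, d in depth2.items() if d == 2))
```

Tracking visited names is what keeps the walk from looping when a callee calls back into DllMain, as the depth-2 graph in IDA Pro shows can happen.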

As referenced in question 14, there is a call to Sleep at 0x10001358, as shown in the following listing. Sleep takes one parameter—the number of milliseconds to sleep—and we see it pushed on the stack as EAX.

10001341         mov     eax, off_10019020
10001346         add     eax, 0Dh
10001349         push    eax     ; Str
1000134A         call    ds:atoi
10001350         imul    eax, 3E8h
10001356         pop     ecx
10001357         push    eax     ; dwMilliseconds
10001358         call    ds:Sleep

Working backward, we see that EAX is multiplied by 0x3E8 (1000 in decimal): the result of the call to atoi is a number of seconds, and multiplying it by 1000 converts it to the milliseconds that Sleep expects. Again working backward, we also see that off_10019020 is moved into EAX. We can see what is at the offset by double-clicking it. This is a reference to the string [This is CTI]30.

Next, we see that 0xD is added to the offset, which causes EAX to point to 30 for the call to atoi, which will convert the string 30 into the number 30. Multiplying 30 by 1000, we get 30,000 milliseconds (30 seconds), and that is how long this program will sleep, provided the string is unchanged at execution time.
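
The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative, and int() stands in for the C atoi call.

```python
# Sketch only: reproduces the computation of the Sleep argument.
config = b"[This is CTI]30"              # string referenced by off_10019020

seconds_text = config[0xD:]               # add eax, 0Dh  -> b"30"
seconds = int(seconds_text.decode("ascii"))  # call ds:atoi  -> 30
milliseconds = seconds * 0x3E8            # imul eax, 3E8h -> 30000

print(seconds_text, seconds, milliseconds)  # b'30' 30 30000
```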

As referenced in question 15, a call to socket at 0x10001701 is shown in the left column of Table C-1. We see that 6, 1, and 2 are pushed onto the stack. These numbers correspond to symbolic constants that are described on the MSDN page for socket. Right-clicking each of the numbers and selecting Use Symbolic Constant presents a dialog listing all of the constants that IDA Pro has for a particular value. In this example, the number 2 corresponds to AF_INET, which is used for setting up an IPv4 socket; 1 stands for SOCK_STREAM, and 6 stands for IPPROTO_TCP. Therefore, this socket will be configured for TCP over IPv4 (commonly used for HTTP).

Table C-1. Applying Symbolic Constants for a Call to socket

Before symbolic constants          After symbolic constants
100016FB   push  6                 100016FB   push  IPPROTO_TCP
100016FD   push  1                 100016FD   push  SOCK_STREAM
100016FF   push  2                 100016FF   push  AF_INET
10001701   call  ds:socket         10001701   call  ds:socket
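
The same mapping can be cross-checked against Python's standard socket module, which exposes constants with these names. This is a sketch, assuming a mainstream platform such as Windows or Linux where the numeric values match those used by Winsock; the dictionary names are illustrative.

```python
import socket

# Values pushed before the call. Arguments are pushed right to left,
# so 6 (protocol) is pushed first and 2 (address family) last.
pushed = {"af": 2, "type": 1, "protocol": 6}

constants = {
    "af": socket.AF_INET,
    "type": socket.SOCK_STREAM,
    "protocol": socket.IPPROTO_TCP,
}

for argument, constant in constants.items():
    # int() is used because some of these constants are enum members.
    print(argument, pushed[argument], int(constant) == pushed[argument])
```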

To answer question 17, we search for the in instruction by selecting Search ▸ Text and entering in (we could also select Search ▸ Sequence of Bytes and search for ED, the opcode for the in instruction). If we check Find All Occurrences in the search dialog, either option will present a new window listing all matches. Scrolling through the results shows only one instance of the in instruction at 0x100061DB, as follows:

100061C7         mov     eax, 564D5868h ; "VMXh"
100061CC         mov     ebx, 0
100061D1         mov     ecx, 0Ah
100061D6         mov     edx, 5658h
100061DB         in      eax, dx

The mov instruction at 0x100061C7 moves 0x564D5868 into EAX. Right-clicking this value shows that it corresponds to the ASCII string VMXh, which confirms that this snippet of code is an anti-virtual machine technique being employed by the malware. (We discuss the specifics of this technique and others in Chapter 17.) Checking the cross-references to the function that executes this technique offers further confirmation when we see Found Virtual Machine in the code after a comparison.
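
The conversion that IDA Pro performs when displaying the value as characters can be reproduced in Python. This is a small sketch with illustrative variable names; it reads the immediate values from the listing above as big-endian ASCII bytes.

```python
# Sketch only: interprets the immediates from the listing as ASCII text.
magic = 0x564D5868  # value moved into EAX at 0x100061C7
port = 0x5658       # value moved into EDX at 0x100061D6

magic_text = magic.to_bytes(4, "big").decode("ascii")
port_text = port.to_bytes(2, "big").decode("ascii")

print(magic_text)  # VMXh
print(port_text)   # VX
```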

As referenced by question 18, we jump our cursor to 0x1001D988 using the G key. Here, we see what looks like random bytes of data and nothing readable. As suggested, we run the Python script provided by selecting File ▸ Script File and selecting the Python script, shown in the following listing.

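sea = ScreenEA()                        # address under the cursor

for i in range(0x00, 0x50):             # process 0x50 bytes
    b = Byte(sea + i)                   # read one byte from the IDA database
    decoded_byte = b ^ 0x55             # single-byte XOR with the key 0x55
    PatchByte(sea + i, decoded_byte)    # write the decoded byte back to the database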

First, the script calls ScreenEA to grab the current location of the cursor, for use as the base address when decoding the data. Next, it loops over offsets 0 through 0x4F (0x50 bytes in total) and reads the value of each byte using the call to Byte. It XORs each byte with 0x55. Finally, it patches the byte in the IDA Pro display without modifying the original file. You can easily customize this script for your own use.

After the script runs, we see that the data at 0x1001D988 has been changed to something more readable. We can turn this into an ASCII string by pressing the A key on the keyboard with the cursor at 0x1001D988. This reveals the string xdoor is this backdoor, string decoded for Practical Malware Analysis Lab :)1234.
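
The decoding step also works outside IDA Pro. The following standalone Python sketch shows the same single-byte XOR; xor_bytes is a hypothetical helper name, and because the raw bytes at 0x1001D988 are not reproduced here, the obfuscated buffer is regenerated from the decoded string purely for illustration.

```python
KEY = 0x55  # XOR key used by the lab script


def xor_bytes(data, key=KEY):
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


plaintext = (b"xdoor is this backdoor, string decoded for "
             b"Practical Malware Analysis Lab :)1234")

# Stand-in for the bytes at 0x1001D988 (derived here, not copied from the DLL).
obfuscated = xor_bytes(plaintext)
decoded = xor_bytes(obfuscated)

print(hex(len(plaintext)))       # 0x50
print(decoded.decode("ascii"))
```

The decoded string is 0x50 bytes long, which matches the loop bound in the lab script, and XOR being its own inverse is why one function serves for both encoding and decoding.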