Table of Contents for
Practical Malware Analysis

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Practical Malware Analysis by Andrew Honig Published by No Starch Press, 2012
  1. Cover
  2. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software
  3. Praise for Practical Malware Analysis
  4. Warning
  5. About the Authors
  6. About the Technical Reviewer
  7. About the Contributing Authors
  8. Foreword
  9. Acknowledgments
  10. Individual Thanks
  11. Introduction
  12. What Is Malware Analysis?
  13. Prerequisites
  14. Practical, Hands-On Learning
  15. What’s in the Book?
  16. 0. Malware Analysis Primer
  17. The Goals of Malware Analysis
  18. Malware Analysis Techniques
  19. Types of Malware
  20. General Rules for Malware Analysis
  21. I. Basic Analysis
  22. 1. Basic Static Techniques
  23. Antivirus Scanning: A Useful First Step
  24. Hashing: A Fingerprint for Malware
  25. Finding Strings
  26. Packed and Obfuscated Malware
  27. Portable Executable File Format
  28. Linked Libraries and Functions
  29. Static Analysis in Practice
  30. The PE File Headers and Sections
  31. Conclusion
  32. Labs
  33. 2. Malware Analysis in Virtual Machines
  34. The Structure of a Virtual Machine
  35. Creating Your Malware Analysis Machine
  36. Using Your Malware Analysis Machine
  37. The Risks of Using VMware for Malware Analysis
  38. Record/Replay: Running Your Computer in Reverse
  39. Conclusion
  40. 3. Basic Dynamic Analysis
  41. Sandboxes: The Quick-and-Dirty Approach
  42. Running Malware
  43. Monitoring with Process Monitor
  44. Viewing Processes with Process Explorer
  45. Comparing Registry Snapshots with Regshot
  46. Faking a Network
  47. Packet Sniffing with Wireshark
  48. Using INetSim
  49. Basic Dynamic Tools in Practice
  50. Conclusion
  51. Labs
  52. II. Advanced Static Analysis
  53. 4. A Crash Course in x86 Disassembly
  54. Levels of Abstraction
  55. Reverse-Engineering
  56. The x86 Architecture
  57. Conclusion
  58. 5. IDA Pro
  59. Loading an Executable
  60. The IDA Pro Interface
  61. Using Cross-References
  62. Analyzing Functions
  63. Using Graphing Options
  64. Enhancing Disassembly
  65. Extending IDA with Plug-ins
  66. Conclusion
  67. Labs
  68. 6. Recognizing C Code Constructs in Assembly
  69. Global vs. Local Variables
  70. Disassembling Arithmetic Operations
  71. Recognizing if Statements
  72. Recognizing Loops
  73. Understanding Function Call Conventions
  74. Analyzing switch Statements
  75. Disassembling Arrays
  76. Identifying Structs
  77. Analyzing Linked List Traversal
  78. Conclusion
  79. Labs
  80. 7. Analyzing Malicious Windows Programs
  81. The Windows API
  82. The Windows Registry
  83. Networking APIs
  84. Following Running Malware
  85. Kernel vs. User Mode
  86. The Native API
  87. Conclusion
  88. Labs
  89. III. Advanced Dynamic Analysis
  90. 8. Debugging
  91. Source-Level vs. Assembly-Level Debuggers
  92. Kernel vs. User-Mode Debugging
  93. Using a Debugger
  94. Exceptions
  95. Modifying Execution with a Debugger
  96. Modifying Program Execution in Practice
  97. Conclusion
  98. 9. OllyDbg
  99. Loading Malware
  100. The OllyDbg Interface
  101. Memory Map
  102. Viewing Threads and Stacks
  103. Executing Code
  104. Breakpoints
  105. Loading DLLs
  106. Tracing
  107. Exception Handling
  108. Patching
  109. Analyzing Shellcode
  110. Assistance Features
  111. Plug-ins
  112. Scriptable Debugging
  113. Conclusion
  114. Labs
  115. 10. Kernel Debugging with WinDbg
  116. Drivers and Kernel Code
  117. Setting Up Kernel Debugging
  118. Using WinDbg
  119. Microsoft Symbols
  120. Kernel Debugging in Practice
  121. Rootkits
  122. Loading Drivers
  123. Kernel Issues for Windows Vista, Windows 7, and x64 Versions
  124. Conclusion
  125. Labs
  126. IV. Malware Functionality
  127. 11. Malware Behavior
  128. Downloaders and Launchers
  129. Backdoors
  130. Credential Stealers
  131. Persistence Mechanisms
  132. Privilege Escalation
  133. Covering Its Tracks—User-Mode Rootkits
  134. Conclusion
  135. Labs
  136. 12. Covert Malware Launching
  137. Launchers
  138. Process Injection
  139. Process Replacement
  140. Hook Injection
  141. Detours
  142. APC Injection
  143. Conclusion
  144. Labs
  145. 13. Data Encoding
  146. The Goal of Analyzing Encoding Algorithms
  147. Simple Ciphers
  148. Common Cryptographic Algorithms
  149. Custom Encoding
  150. Decoding
  151. Conclusion
  152. Labs
  153. 14. Malware-Focused Network Signatures
  154. Network Countermeasures
  155. Safely Investigate an Attacker Online
  156. Content-Based Network Countermeasures
  157. Combining Dynamic and Static Analysis Techniques
  158. Understanding the Attacker’s Perspective
  159. Conclusion
  160. Labs
  161. V. Anti-Reverse-Engineering
  162. 15. Anti-Disassembly
  163. Understanding Anti-Disassembly
  164. Defeating Disassembly Algorithms
  165. Anti-Disassembly Techniques
  166. Obscuring Flow Control
  167. Thwarting Stack-Frame Analysis
  168. Conclusion
  169. Labs
  170. 16. Anti-Debugging
  171. Windows Debugger Detection
  172. Identifying Debugger Behavior
  173. Interfering with Debugger Functionality
  174. Debugger Vulnerabilities
  175. Conclusion
  176. Labs
  177. 17. Anti-Virtual Machine Techniques
  178. VMware Artifacts
  179. Vulnerable Instructions
  180. Tweaking Settings
  181. Escaping the Virtual Machine
  182. Conclusion
  183. Labs
  184. 18. Packers and Unpacking
  185. Packer Anatomy
  186. Identifying Packed Programs
  187. Unpacking Options
  188. Automated Unpacking
  189. Manual Unpacking
  190. Tips and Tricks for Common Packers
  191. Analyzing Without Fully Unpacking
  192. Packed DLLs
  193. Conclusion
  194. Labs
  195. VI. Special Topics
  196. 19. Shellcode Analysis
  197. Loading Shellcode for Analysis
  198. Position-Independent Code
  199. Identifying Execution Location
  200. Manual Symbol Resolution
  201. A Full Hello World Example
  202. Shellcode Encodings
  203. NOP Sleds
  204. Finding Shellcode
  205. Conclusion
  206. Labs
  207. 20. C++ Analysis
  208. Object-Oriented Programming
  209. Virtual vs. Nonvirtual Functions
  210. Creating and Destroying Objects
  211. Conclusion
  212. Labs
  213. 21. 64-Bit Malware
  214. Why 64-Bit Malware?
  215. Differences in x64 Architecture
  216. Windows 32-Bit on Windows 64-Bit
  217. 64-Bit Hints at Malware Functionality
  218. Conclusion
  219. Labs
  220. A. Important Windows Functions
  221. B. Tools for Malware Analysis
  222. C. Solutions to Labs
  223. Lab 1-1 Solutions
  224. Lab 1-2 Solutions
  225. Lab 1-3 Solutions
  226. Lab 1-4 Solutions
  227. Lab 3-1 Solutions
  228. Lab 3-2 Solutions
  229. Lab 3-3 Solutions
  230. Lab 3-4 Solutions
  231. Lab 5-1 Solutions
  232. Lab 6-1 Solutions
  233. Lab 6-2 Solutions
  234. Lab 6-3 Solutions
  235. Lab 6-4 Solutions
  236. Lab 7-1 Solutions
  237. Lab 7-2 Solutions
  238. Lab 7-3 Solutions
  239. Lab 9-1 Solutions
  240. Lab 9-2 Solutions
  241. Lab 9-3 Solutions
  242. Lab 10-1 Solutions
  243. Lab 10-2 Solutions
  244. Lab 10-3 Solutions
  245. Lab 11-1 Solutions
  246. Lab 11-2 Solutions
  247. Lab 11-3 Solutions
  248. Lab 12-1 Solutions
  249. Lab 12-2 Solutions
  250. Lab 12-3 Solutions
  251. Lab 12-4 Solutions
  252. Lab 13-1 Solutions
  253. Lab 13-2 Solutions
  254. Lab 13-3 Solutions
  255. Lab 14-1 Solutions
  256. Lab 14-2 Solutions
  257. Lab 14-3 Solutions
  258. Lab 15-1 Solutions
  259. Lab 15-2 Solutions
  260. Lab 15-3 Solutions
  261. Lab 16-1 Solutions
  262. Lab 16-2 Solutions
  263. Lab 16-3 Solutions
  264. Lab 17-1 Solutions
  265. Lab 17-2 Solutions
  266. Lab 17-3 Solutions
  267. Lab 18-1 Solutions
  268. Lab 18-2 Solutions
  269. Lab 18-3 Solutions
  270. Lab 18-4 Solutions
  271. Lab 18-5 Solutions
  272. Lab 19-1 Solutions
  273. Lab 19-2 Solutions
  274. Lab 19-3 Solutions
  275. Lab 20-1 Solutions
  276. Lab 20-2 Solutions
  277. Lab 20-3 Solutions
  278. Lab 21-1 Solutions
  279. Lab 21-2 Solutions
  280. Index
  281. Index
  282. Index
  283. Index
  284. Index
  285. Index
  286. Index
  287. Index
  288. Index
  289. Index
  290. Index
  291. Index
  292. Index
  293. Index
  294. Index
  295. Index
  296. Index
  297. Index
  298. Index
  299. Index
  300. Index
  301. Index
  302. Index
  303. Index
  304. Index
  305. Index
  306. Index
  307. Updates
  308. About the Authors
  309. Copyright

Differences in x64 Architecture

The following are the most important differences between Windows 64-bit and 32-bit architecture:

  • All addresses and pointers are 64 bits.

  • All general-purpose registers—including RAX, RBX, RCX, and so on—have increased in size, although the 32-bit versions can still be accessed. For example, the RAX register is the 64-bit version of the EAX register.

  • Some of the general-purpose registers (RDI, RSI, RBP, and RSP) have been extended to support byte accesses, by adding an L suffix to the 16-bit version. For example, BP normally accesses the lower 16 bits of RBP; now, BPL accesses the lowest 8 bits of RBP.

  • The special-purpose registers are 64-bits and have been renamed. For example, RIP is the 64-bit instruction pointer.

  • There are twice as many general-purpose registers. The new registers are labeled R8 though R15. The DWORD (32-bit) versions of these registers can be accessed as R8D, R9D, and so on. WORD (16-bit) versions are accessed with a W suffix (R8W, R9W, and so on), and byte versions are accessed with an L suffix (R8L, R9L, and so on).

x64 also supports instruction pointer–relative data addressing. This is an important difference between x64 and x86 in relation to PIC and shellcode. Specifically, in x86 assembly, anytime you want to access data at a location that is not an offset from a register, the instruction must store the entire address. This is called absolute addressing. But in x64 assembly, you can access data at a location that is an offset from the current instruction pointer. The x64 literature refers to this as RIP-relative addressing. Example 21-1 shows a simple C program that accesses a memory address.

Example 21-1. A simple C program with a data access

int x;
void foo() {
      int y = x;
      ...
}

The x86 assembly code for Example 21-1 references global data (the variable x). In order to access this data, the instruction encodes the 4 bytes representing the data’s address. This instruction is not position independent, because it will always access address 0x00403374, but if this file were to be loaded at a different location, the instruction would need to be modified so that the mov instruction accessed the correct address, as shown in Example 21-2.

Example 21-2. x86 assembly for the C program in Example 21-1

00401004 A1 74 33 40 00 mov     eax, dword_403374

You’ll notice that the bytes of the address are stored with the instruction at , , , and . Remember that the bytes are stored with the least significant byte first. The bytes 74, 33, 40, and 00 correspond to the address 0x00403374.

After recompiling for x64, Example 21-3 shows the same mov instruction that appears in Example 21-2.

Example 21-3. x64 assembly for Example 21-1

0000000140001058 8B 05 A2 D3 00 00 mov     eax, dword_14000E400

At the assembly level, there doesn’t appear to be any change. The instruction is still mov eax, dword_address, and IDA Pro automatically calculates the instruction’s address. However, the differences at the opcode level allow this code to be position-independent on x64, but not x86.

In the 64-bit version of the code, the instruction bytes do not contain the fixed address of the data. The address of the data is 14000E400, but the instruction bytes are A2 , D3 , 00 , and 00 , which correspond to the value 0x0000D3A2.

The 64-bit instruction stores the address of the data as an offset from the current instruction pointer, rather than as an absolute address, as stored in the 32-bit version. If this file were loaded at a different location, the instruction would still point to the correct address, unlike in the 32-bit version. In that case, if the file is loaded at a different address, the reference must be changed.

Instruction pointer–relative addressing is a powerful addition to the x64 instruction set that significantly decreases the number of addresses that must be relocated when a DLL is loaded. Instruction pointer–relative addressing also makes it much easier to write shellcode because it eliminates the need to obtain a pointer to EIP in order to access data. Unfortunately, this addition also makes it more difficult to detect shellcode, because it eliminates the need for a call/pop as discussed in Position-Independent Code. Many of those common shellcode techniques are unnecessary or irrelevant when working with malware written to run on the x64 architecture.

Differences in the x64 Calling Convention and Stack Usage

The calling convention used by 64-bit Windows is closest to the 32-bit fastcall calling convention discussed in Chapter 6. The first four parameters of the call are passed in the RCX, RDX, R8, and R9 registers; additional ones are stored on the stack.

Note

Most of the conventions and hints described in this section apply to compiler-generated code that runs on the Windows OS. There is no processor-enforced requirement to follow these conventions, but Microsoft’s guidelines for compilers specify certain rules in order to ensure consistency and stability. Beware, because hand-coded assembly and malicious code may disregard these rules and do the unexpected. As usual, investigate any code that doesn’t follow the rules.

In the case of 32-bit code, stack space can be allocated and unallocated in the middle of the function using push and pop instructions. However, in 64-bit code, functions cannot allocate any space in the middle of the function, regardless of whether they’re push or other stack-manipulation instructions.

Figure 21-1 compares the stack management of 32-bit and 64-bit code. Notice in the graph for a 32-bit function that the stack size grows as arguments are pushed on the stack, and then falls when the stack is cleaned up. Stack space is allocated at the beginning of the function, and moves up and down during the function call. When calling a function, the stack size grows; when the function returns, the stack size returns to normal. In contrast, the graph for a 64-bit function shows that the stack grows at the start of the function and remains at that level until the end of the function.

Stack size in the same function compiled for 32-bit and 64-bit architectures

Figure 21-1. Stack size in the same function compiled for 32-bit and 64-bit architectures

The 32-bit compiler will sometimes generate code that doesn’t change the stack size in the middle of the function, but 64-bit code never changes the stack size in the middle of the function. Although this stack restriction is not enforced by the processor, the Microsoft 64-bit exception-handling model depends on it in order to function properly. Functions that do not follow this convention may crash or cause other problems if an exception occurs.

The lack of push and pop instructions in the middle of a function can make it more difficult for an analyst to determine how many parameters a function has, because there is no easy way to tell whether a memory address is being used as a stack variable or as a parameter to a function. There’s also no way to tell whether a register is being used as a parameter. For example, if ECX is loaded with a value immediately before a function call, you can’t tell if the register is loaded as a parameter or for some other reason.

Example 21-4 shows an example of the disassembly for a function call compiled for a 32-bit processor.

Example 21-4. Call to printf compiled for a 32-bit processor

004113C0  mov     eax, [ebp+arg_0]
004113C3  push    eax
004113C4  mov     ecx, [ebp+arg_C]
004113C7  push    ecx
004113C8  mov     edx, [ebp+arg_8]
004113CB  push    edx
004113CC  mov     eax, [ebp+arg_4]
004113CF  push    eax
004113D0  push    offset aDDDD_
004113D5  call    printf
004113DB  add     esp, 14h

The 32-bit assembly has five push instructions before the call to printf, and immediately after the call to printf, 0x14 is added to the stack to clean it up. This clearly indicates that there are five parameters being passed to the printf function.

Example 21-5 shows the disassembly for the same function call compiled for a 64-bit processor:

Example 21-5. Call to printf compiled for a 64-bit processor

0000000140002C96  mov     ecx, [rsp+38h+arg_0]
0000000140002C9A  mov     eax, [rsp+38h+arg_0]
0000000140002C9E mov     [rsp+38h+var_18], eax
0000000140002CA2  mov     r9d, [rsp+38h+arg_18]
0000000140002CA7  mov     r8d, [rsp+38h+arg_10]
0000000140002CAC  mov     edx, [rsp+38h+arg_8]
0000000140002CB0  lea     rcx, aDDDD_
0000000140002CB7  call    cs:printf

In 64-bit disassembly, the number of parameters passed to printf is less evident. The pattern of load instructions in RCX, RDX, R8, and R9 appears to show parameters being moved into the registers for the printf function call, but the mov instruction at is not as clear. IDA Pro labels this as a move into a local variable, but there is no clear way to distinguish between a move into a local variable and a parameter for the function being called. In this case, we can just check the format string to see how many parameters are being passed, but in other cases, it will not be so easy.

Leaf and Nonleaf Functions

The 64-bit stack usage convention breaks functions into two categories: leaf and nonleaf functions. Any function that calls another function is called a nonleaf function, and all other functions are leaf functions.

Nonleaf functions are sometimes called frame functions because they require a stack frame. All nonleaf functions are required to allocate 0x20 bytes of stack space when they call a function. This allows the function being called to save the register parameters (RCX, RDX, R8, and R9) in that space, if necessary.

In both leaf and nonleaf functions, the stack will be modified only at the beginning or end of the function. These portions that can modify the stack frame are discussed next.

Prologue and Epilogue 64-Bit Code

Windows 64-bit assembly code has well-formed sections at the beginning and end of functions called the prologue and epilogue, which can provide useful information. Any mov instructions at the beginning of a prologue are always used to store the parameters that were passed into the function. (The compiler cannot insert mov instructions that do anything else within the prologue.) Example 21-6 shows an example of a prologue for a small function.

Example 21-6. Prologue code for a small function

00000001400010A0  mov     [rsp+arg_8], rdx
00000001400010A5  mov     [rsp+arg_0], ecx
00000001400010A9  push    rdi
00000001400010AA  sub     rsp, 20h

Here, we see that this function has two parameters: one 32-bit and one 64-bit. This function allocates 0x20 bytes from the stack, as required by all nonleaf functions as a place to provide storage for parameters. If a function has any local stack variables, it will allocate space for them in addition to the 0x20 bytes. In this case, we can tell that there are no local stack variables because only 0x20 bytes are allocated.

64-Bit Exception Handling

Unlike exception handling in 32-bit systems, structured exception handling in x64 does not use the stack. In 32-bit code, the fs:[0] is used as a pointer to the current exception handler frame, which is stored on the stack so that each function can define its own exception handler. As a result, you will often find instructions modifying fs:[0] at the beginning of a function. You will also find exploit code that overwrites the exception information on the stack in order to get control of the code executed during an exception.

Structured exception handling in x64 uses a static exception information table stored in the PE file and does not store any data on the stack. Also, there is an _IMAGE_RUNTIME_FUNCTION_ENTRY structure in the .pdata section for every function in the executable that stores the beginning and ending address of the function, as well as a pointer to exception-handling information for that function.