Table of Contents for
Practical Malware Analysis

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Practical Malware Analysis by Andrew Honig Published by No Starch Press, 2012
  1. Cover
  2. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software
  3. Praise for Practical Malware Analysis
  4. Warning
  5. About the Authors
  6. About the Technical Reviewer
  7. About the Contributing Authors
  8. Foreword
  9. Acknowledgments
  10. Individual Thanks
  11. Introduction
  12. What Is Malware Analysis?
  13. Prerequisites
  14. Practical, Hands-On Learning
  15. What’s in the Book?
  16. 0. Malware Analysis Primer
  17. The Goals of Malware Analysis
  18. Malware Analysis Techniques
  19. Types of Malware
  20. General Rules for Malware Analysis
  21. I. Basic Analysis
  22. 1. Basic Static Techniques
  23. Antivirus Scanning: A Useful First Step
  24. Hashing: A Fingerprint for Malware
  25. Finding Strings
  26. Packed and Obfuscated Malware
  27. Portable Executable File Format
  28. Linked Libraries and Functions
  29. Static Analysis in Practice
  30. The PE File Headers and Sections
  31. Conclusion
  32. Labs
  33. 2. Malware Analysis in Virtual Machines
  34. The Structure of a Virtual Machine
  35. Creating Your Malware Analysis Machine
  36. Using Your Malware Analysis Machine
  37. The Risks of Using VMware for Malware Analysis
  38. Record/Replay: Running Your Computer in Reverse
  39. Conclusion
  40. 3. Basic Dynamic Analysis
  41. Sandboxes: The Quick-and-Dirty Approach
  42. Running Malware
  43. Monitoring with Process Monitor
  44. Viewing Processes with Process Explorer
  45. Comparing Registry Snapshots with Regshot
  46. Faking a Network
  47. Packet Sniffing with Wireshark
  48. Using INetSim
  49. Basic Dynamic Tools in Practice
  50. Conclusion
  51. Labs
  52. II. Advanced Static Analysis
  53. 4. A Crash Course in x86 Disassembly
  54. Levels of Abstraction
  55. Reverse-Engineering
  56. The x86 Architecture
  57. Conclusion
  58. 5. IDA Pro
  59. Loading an Executable
  60. The IDA Pro Interface
  61. Using Cross-References
  62. Analyzing Functions
  63. Using Graphing Options
  64. Enhancing Disassembly
  65. Extending IDA with Plug-ins
  66. Conclusion
  67. Labs
  68. 6. Recognizing C Code Constructs in Assembly
  69. Global vs. Local Variables
  70. Disassembling Arithmetic Operations
  71. Recognizing if Statements
  72. Recognizing Loops
  73. Understanding Function Call Conventions
  74. Analyzing switch Statements
  75. Disassembling Arrays
  76. Identifying Structs
  77. Analyzing Linked List Traversal
  78. Conclusion
  79. Labs
  80. 7. Analyzing Malicious Windows Programs
  81. The Windows API
  82. The Windows Registry
  83. Networking APIs
  84. Following Running Malware
  85. Kernel vs. User Mode
  86. The Native API
  87. Conclusion
  88. Labs
  89. III. Advanced Dynamic Analysis
  90. 8. Debugging
  91. Source-Level vs. Assembly-Level Debuggers
  92. Kernel vs. User-Mode Debugging
  93. Using a Debugger
  94. Exceptions
  95. Modifying Execution with a Debugger
  96. Modifying Program Execution in Practice
  97. Conclusion
  98. 9. OllyDbg
  99. Loading Malware
  100. The OllyDbg Interface
  101. Memory Map
  102. Viewing Threads and Stacks
  103. Executing Code
  104. Breakpoints
  105. Loading DLLs
  106. Tracing
  107. Exception Handling
  108. Patching
  109. Analyzing Shellcode
  110. Assistance Features
  111. Plug-ins
  112. Scriptable Debugging
  113. Conclusion
  114. Labs
  115. 10. Kernel Debugging with WinDbg
  116. Drivers and Kernel Code
  117. Setting Up Kernel Debugging
  118. Using WinDbg
  119. Microsoft Symbols
  120. Kernel Debugging in Practice
  121. Rootkits
  122. Loading Drivers
  123. Kernel Issues for Windows Vista, Windows 7, and x64 Versions
  124. Conclusion
  125. Labs
  126. IV. Malware Functionality
  127. 11. Malware Behavior
  128. Downloaders and Launchers
  129. Backdoors
  130. Credential Stealers
  131. Persistence Mechanisms
  132. Privilege Escalation
  133. Covering Its Tracks—User-Mode Rootkits
  134. Conclusion
  135. Labs
  136. 12. Covert Malware Launching
  137. Launchers
  138. Process Injection
  139. Process Replacement
  140. Hook Injection
  141. Detours
  142. APC Injection
  143. Conclusion
  144. Labs
  145. 13. Data Encoding
  146. The Goal of Analyzing Encoding Algorithms
  147. Simple Ciphers
  148. Common Cryptographic Algorithms
  149. Custom Encoding
  150. Decoding
  151. Conclusion
  152. Labs
  153. 14. Malware-Focused Network Signatures
  154. Network Countermeasures
  155. Safely Investigate an Attacker Online
  156. Content-Based Network Countermeasures
  157. Combining Dynamic and Static Analysis Techniques
  158. Understanding the Attacker’s Perspective
  159. Conclusion
  160. Labs
  161. V. Anti-Reverse-Engineering
  162. 15. Anti-Disassembly
  163. Understanding Anti-Disassembly
  164. Defeating Disassembly Algorithms
  165. Anti-Disassembly Techniques
  166. Obscuring Flow Control
  167. Thwarting Stack-Frame Analysis
  168. Conclusion
  169. Labs
  170. 16. Anti-Debugging
  171. Windows Debugger Detection
  172. Identifying Debugger Behavior
  173. Interfering with Debugger Functionality
  174. Debugger Vulnerabilities
  175. Conclusion
  176. Labs
  177. 17. Anti-Virtual Machine Techniques
  178. VMware Artifacts
  179. Vulnerable Instructions
  180. Tweaking Settings
  181. Escaping the Virtual Machine
  182. Conclusion
  183. Labs
  184. 18. Packers and Unpacking
  185. Packer Anatomy
  186. Identifying Packed Programs
  187. Unpacking Options
  188. Automated Unpacking
  189. Manual Unpacking
  190. Tips and Tricks for Common Packers
  191. Analyzing Without Fully Unpacking
  192. Packed DLLs
  193. Conclusion
  194. Labs
  195. VI. Special Topics
  196. 19. Shellcode Analysis
  197. Loading Shellcode for Analysis
  198. Position-Independent Code
  199. Identifying Execution Location
  200. Manual Symbol Resolution
  201. A Full Hello World Example
  202. Shellcode Encodings
  203. NOP Sleds
  204. Finding Shellcode
  205. Conclusion
  206. Labs
  207. 20. C++ Analysis
  208. Object-Oriented Programming
  209. Virtual vs. Nonvirtual Functions
  210. Creating and Destroying Objects
  211. Conclusion
  212. Labs
  213. 21. 64-Bit Malware
  214. Why 64-Bit Malware?
  215. Differences in x64 Architecture
  216. Windows 32-Bit on Windows 64-Bit
  217. 64-Bit Hints at Malware Functionality
  218. Conclusion
  219. Labs
  220. A. Important Windows Functions
  221. B. Tools for Malware Analysis
  222. C. Solutions to Labs
  223. Lab 1-1 Solutions
  224. Lab 1-2 Solutions
  225. Lab 1-3 Solutions
  226. Lab 1-4 Solutions
  227. Lab 3-1 Solutions
  228. Lab 3-2 Solutions
  229. Lab 3-3 Solutions
  230. Lab 3-4 Solutions
  231. Lab 5-1 Solutions
  232. Lab 6-1 Solutions
  233. Lab 6-2 Solutions
  234. Lab 6-3 Solutions
  235. Lab 6-4 Solutions
  236. Lab 7-1 Solutions
  237. Lab 7-2 Solutions
  238. Lab 7-3 Solutions
  239. Lab 9-1 Solutions
  240. Lab 9-2 Solutions
  241. Lab 9-3 Solutions
  242. Lab 10-1 Solutions
  243. Lab 10-2 Solutions
  244. Lab 10-3 Solutions
  245. Lab 11-1 Solutions
  246. Lab 11-2 Solutions
  247. Lab 11-3 Solutions
  248. Lab 12-1 Solutions
  249. Lab 12-2 Solutions
  250. Lab 12-3 Solutions
  251. Lab 12-4 Solutions
  252. Lab 13-1 Solutions
  253. Lab 13-2 Solutions
  254. Lab 13-3 Solutions
  255. Lab 14-1 Solutions
  256. Lab 14-2 Solutions
  257. Lab 14-3 Solutions
  258. Lab 15-1 Solutions
  259. Lab 15-2 Solutions
  260. Lab 15-3 Solutions
  261. Lab 16-1 Solutions
  262. Lab 16-2 Solutions
  263. Lab 16-3 Solutions
  264. Lab 17-1 Solutions
  265. Lab 17-2 Solutions
  266. Lab 17-3 Solutions
  267. Lab 18-1 Solutions
  268. Lab 18-2 Solutions
  269. Lab 18-3 Solutions
  270. Lab 18-4 Solutions
  271. Lab 18-5 Solutions
  272. Lab 19-1 Solutions
  273. Lab 19-2 Solutions
  274. Lab 19-3 Solutions
  275. Lab 20-1 Solutions
  276. Lab 20-2 Solutions
  277. Lab 20-3 Solutions
  278. Lab 21-1 Solutions
  279. Lab 21-2 Solutions
  280. Index
  281. Index
  282. Index
  283. Index
  284. Index
  285. Index
  286. Index
  287. Index
  288. Index
  289. Index
  290. Index
  291. Index
  292. Index
  293. Index
  294. Index
  295. Index
  296. Index
  297. Index
  298. Index
  299. Index
  300. Index
  301. Index
  302. Index
  303. Index
  304. Index
  305. Index
  306. Index
  307. Updates
  308. About the Authors
  309. Copyright

Credential Stealers

Attackers often go to great lengths to steal credentials, primarily with three types of malware:

  • Programs that wait for a user to log in in order to steal their credentials

  • Programs that dump information stored in Windows, such as password hashes, to be used directly or cracked offline

  • Programs that log keystrokes

    In this section, we will discuss each of these types of malware.

GINA Interception

On Windows XP, Microsoft’s Graphical Identification and Authentication (GINA) interception is a technique that malware uses to steal user credentials. The GINA system was intended to allow legitimate third parties to customize the logon process by adding support for things like authentication with hardware radio-frequency identification (RFID) tokens or smart cards. Malware authors take advantage of this third-party support to load their credential stealers.

GINA is implemented in a DLL, msgina.dll, and is loaded by the Winlogon executable during the login process. Winlogon also works for third-party customizations implemented in DLLs by loading them in between Winlogon and the GINA DLL (like a man-in-the-middle attack). Windows conveniently provides the following registry location where third-party DLLs will be found and loaded by Winlogon:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\GinaDLL

In one instance, we found a malicious file fsgina.dll installed in this registry location as a GINA interceptor.

Figure 11-2 shows an example of the way that logon credentials flow through a system with a malicious file between Winlogon and msgina.dll. The malware (fsgina.dll) is able to capture all user credentials submitted to the system for authentication. It can log that information to disk or pass it over the network.

Malicious fsgina.dll sits in between the Windows system files to capture data.

Figure 11-2. Malicious fsgina.dll sits in between the Windows system files to capture data.

Because fsgina.dll intercepts the communication between Winlogon and msgina.dll, it must pass the credential information on to msgina.dll so that the system will continue to operate normally. In order to do so, the malware must contain all DLL exports required by GINA; specifically, it must export more than 15 functions, most of which are prepended with Wlx. Clearly, if you find that you are analyzing a DLL with many export functions that begin with the string Wlx, you have a good indicator that you are examining a GINA interceptor.

Most of these exports simply call through to the real functions in msgina.dll. In the case of fsgina.dll, all but the WlxLoggedOutSAS export call through to the real functions. Example 11-1 shows the WlxLoggedOutSAS export of fsgina.dll.

Example 11-1. GINA DLL WlxLoggedOutSAS export function for logging stolen credentials

100014A0 WlxLoggedOutSAS
100014A0         push    esi
100014A1         push    edi
100014A2         push    offset aWlxloggedout_0 ; "WlxLoggedOutSAS"
100014A7         call    Call_msgina_dll_function 
...
100014FB         push    eax ; Args
100014FC         push    offset aUSDSPSOpS ;"U: %s D: %s P: %s OP: %s"
10001501         push    offset aDRIVERS ; "drivers\tcpudp.sys"
10001503         call    Log_To_File 

As you can see at , the credential information is immediately passed to msgina.dll by the call we have labeled Call_msgina_dll_function. This function dynamically resolves and calls WlxLoggedOutSAS in msgina.dll, which is passed in as a parameter. The call at performs the logging. It takes parameters of the credential information, a format string that will be used to print the credentials, and the log filename. As a result, all successful user logons are logged to %SystemRoot%\system32\drivers\tcpudp.sys. The log includes the username, domain, password, and old password.

Hash Dumping

Dumping Windows hashes is a popular way for malware to access system credentials. Attackers try to grab these hashes in order to crack them offline or to use them in a pass-the-hash attack. A pass-the-hash attack uses LM and NTLM hashes to authenticate to a remote host (using NTLM authentication) without needing to decrypt or crack the hashes to obtain the plaintext password to log in.

Pwdump and the Pass-the-Hash (PSH) Toolkit are freely available packages that provide hash dumping. Since both of these tools are open source, a lot of malware is derived from their source code. Most antivirus programs have signatures for the default compiled versions of these tools, so attackers often try to compile their own versions in order to avoid detection. The examples in this section are derived versions of pwdump or PSH that we have encountered in the field.

Pwdump is a set of programs that outputs the LM and NTLM password hashes of local user accounts from the Security Account Manager (SAM). Pwdump works by performing DLL injection inside the Local Security Authority Subsystem Service (LSASS) process (better known as lsass.exe). We’ll discuss DLL injection in depth in Chapter 12. For now, just know that it is a way that malware can run a DLL inside another process, thereby providing that DLL with all of the privileges of that process. Hash dumping tools often target lsass.exe because it has the necessary privilege level as well as access to many useful API functions.

Standard pwdump uses the DLL lsaext.dll. Once it is running inside lsass.exe, pwdump calls GetHash, which is exported by lsaext.dll in order to perform the hash extraction. This extraction uses undocumented Windows function calls to enumerate the users on a system and get the password hashes in unencrypted form for each user.

When dealing with pwdump variants, you will need to analyze DLLs in order to determine how the hash dumping operates. Start by looking at the DLL’s exports. The default export name for pwdump is GetHash, but attackers can easily change the name to make it less obvious. Next, try to determine the API functions used by the exports. Many of these functions will be dynamically resolved, so the hash dumping exports often call GetProcAddress many times.

Example 11-2 shows the code in the exported function GrabHash from a pwdump variant DLL. Since this DLL was injected into lsass.exe, it must manually resolve numerous symbols before using them.

Example 11-2. Unique API calls used by a pwdump variant’s export function GrabHash

1000123F         push    offset LibFileName      ; "samsrv.dll" 
10001244         call    esi ; LoadLibraryA
...
10001248         push    offset aAdvapi32_dll_0  ; "advapi32.dll" 
...
10001251         call    esi ; LoadLibraryA
...
1000125B         push    offset ProcName         ; "SamIConnect"
10001260         push    ebx                     ; hModule
...
10001265         call    esi ; GetProcAddress
...
10001281         push    offset aSamrqu ; "SamrQueryInformationUser"
10001286         push    ebx                     ; hModule
...
1000128C         call    esi ; GetProcAddress
...
100012C2         push    offset aSamigetpriv ; "SamIGetPrivateData"
100012C7         push    ebx                     ; hModule
...
100012CD         call    esi ; GetProcAddress
100012CF         push    offset aSystemfuncti  ; "SystemFunction025" 
100012D4         push    edi                     ; hModule
...
100012DA         call    esi ; GetProcAddress
100012DC         push    offset aSystemfuni_0  ; "SystemFunction027" 
100012E1         push    edi                     ; hModule
...
100012E7         call    esi ; GetProcAddress

Example 11-2 shows the code obtaining handles to the libraries samsrv.dll and advapi32.dll via LoadLibrary at and . Samsrv.dll contains an API to easily access the SAM, and advapi32.dll is resolved to access functions not already imported into lsass.exe. The pwdump variant DLL uses the handles to these libraries to resolve many functions, with the most important five shown in the listing (look for the GetProcAddress calls and parameters).

The interesting imports resolved from samsrv.dll are SamIConnect, SamrQueryInformationUser, and SamIGetPrivateData. Later in the code, SamIConnect is used to connect to the SAM, followed by calling SamrQueryInformationUser for each user on the system.

The hashes will be extracted with SamIGetPrivateData and decrypted by SystemFunction025 and SystemFunction027, which are imported from advapi32.dll, as seen at and . None of the API functions in this listing are documented by Microsoft.

The PSH Toolkit contains programs that dump hashes, the most popular of which is known as whosthere-alt. whosthere-alt dumps the SAM by injecting a DLL into lsass.exe, but using a completely different set of API functions from pwdump. Example 11-3 shows code from a whosthere-alt variant that exports a function named TestDump.

Example 11-3. Unique API calls used by a whosthere-alt variant’s export function TestDump

10001119        push    offset LibFileName ; "secur32.dll"
1000111E        call    ds:LoadLibraryA
10001130        push    offset ProcName ; "LsaEnumerateLogonSessions"
10001135        push    esi             ; hModule
10001136        call    ds:GetProcAddress 
...
10001670        call    ds:GetSystemDirectoryA
10001676        mov     edi, offset aMsv1_0_dll ; \\msv1_0.dll
...
100016A6        push    eax             ; path to msv1_0.dll
100016A9        call    ds:GetModuleHandleA 

Since this DLL is injected into lsass.exe, its TestDump function performs the hash dumping. This export dynamically loads secur32.dll and resolves its LsaEnumerateLogonSessions function at to obtain a list of locally unique identifiers (known as LUIDs). This list contains the usernames and domains for each logon and is iterated through by the DLL, which gets access to the credentials by finding a nonexported function in the msv1_0.dll Windows DLL in the memory space of lsass.exe using the call to GetModuleHandle shown at . This function, NlpGetPrimaryCredential, is used to dump the NT and LM hashes.

Note

While it is important to recognize the dumping technique, it might be more critical to determine what the malware is doing with the hashes. Is it storing them on a disk, posting them to a website, or using them in a pass-the-hash attack? These details could be really important, so identifying the low-level hash dumping method should be avoided until the overall functionality is determined.

Keystroke Logging

Keylogging is a classic form of credential stealing. When keylogging, malware records keystrokes so that an attacker can observe typed data like usernames and passwords. Windows malware uses many forms of keylogging.

Kernel-Based Keyloggers

Kernel-based keyloggers are difficult to detect with user-mode applications. They are frequently part of a rootkit and they can act as keyboard drivers to capture keystrokes, bypassing user-space programs and protections.

User-Space Keyloggers

Windows user-space keyloggers typically use the Windows API and are usually implemented with either hooking or polling. Hooking uses the Windows API to notify the malware each time a key is pressed, typically with the SetWindowsHookEx function. Polling uses the Windows API to constantly poll the state of the keys, typically using the GetAsyncKeyState and GetForegroundWindow functions.

Hooking keyloggers leverage the Windows API function SetWindowsHookEx. This type of keylogger may come packaged as an executable that initiates the hook function, and may include a DLL file to handle logging that can be mapped into many processes on the system automatically. We discuss using SetWindowsHookEx in Chapter 12.

We’ll focus on polling keyloggers that use GetAsyncKeyState and GetForegroundWindow. The GetAsyncKeyState function identifies whether a key is pressed or depressed, and whether the key was pressed after the most recent call to GetAsyncKeyState. The GetForegroundWindow function identifies the foreground window—the one that has focus—which tells the keylogger which application is being used for keyboard entry (Notepad or Internet Explorer, for example).

Figure 11-3 illustrates a typical loop structure found in a polling keylogger. The program begins by calling GetForegroundWindow, which logs the active window. Next, the inner loop iterates through a list of keys on the keyboard. For each key, it calls GetAsyncKeyState to determine if a key has been pressed. If so, the program checks the SHIFT and CAPS LOCK keys to determine how to log the keystroke properly. Once the inner loop has iterated through the entire list of keys, the GetForegroundWindow function is called again to ensure the user is still in the same window. This process repeats quickly enough to keep up with a user’s typing. (The keylogger may call the Sleep function to keep the program from eating up system resources.)

Loop structure of GetAsyncKeyState and GetForegroundWindow keylogger

Figure 11-3. Loop structure of GetAsyncKeyState and GetForegroundWindow keylogger

Example 11-4 shows the loop structure in Figure 11-3 disassembled.

Example 11-4. Disassembly of GetAsyncKeyState and GetForegroundWindow keylogger

00401162         call    ds:GetForegroundWindow
...
00401272         push    10h                    ; nVirtKey Shift
00401274         call    ds:GetKeyState
0040127A         mov     esi, dword_403308[ebx] 
00401280         push    esi                     ; vKey
00401281         movsx   edi, ax
00401284         call    ds:GetAsyncKeyState
0040128A         test    ah, 80h
0040128D         jz      short loc_40130A
0040128F         push    14h                     ; nVirtKey Caps Lock
00401291         call    ds:GetKeyState
...
004013EF         add     ebx, 4 
004013F2         cmp     ebx, 368
004013F8         jl      loc_401272

The program calls GetForegroundWindow before entering the inner loop. The inner loop starts at and immediately checks the status of the SHIFT key using a call to GetKeyState. GetKeyState is a quick way to check a key status, but it does not remember whether or not the key was pressed since the last time it was called, as GetAsyncKeyState does. Next, at the keylogger indexes an array of the keys on the keyboard using EBX. If a new key is pressed, then the keystroke is logged after calling GetKeyState to see if CAPS LOCK is activated. Finally, EBX is incremented at so that the next key in the list can be checked. Once 92 keys (368/4) have been checked, the inner loop terminates, and GetForegroundWindow is called again to start the inner loop from the beginning.

Identifying Keyloggers in Strings Listings

You can recognize keylogger functionality in malware by looking at the imports for the API functions, or by examining the strings listing for indicators, which is particularly useful if the imports are obfuscated or the malware is using keylogging functionality that you have not encountered before. For example, the following listing of strings is from the keylogger described in the previous section:

[Up]
[Num Lock]
[Down]
[Right]
[UP]
[Left]
[PageDown]

If a keylogger wants to log all keystrokes, it must have a way to print keys like PAGE DOWN, and must have access to these strings. Working backward from the cross-references to these strings can be a way to recognize keylogging functionality in malware.