Learning Linux Binary Analysis
by Ryan "elfmaster" O'Neill, published by Packt Publishing, 2016

The ECFS reference guide

The ECFS file format is both simple and complicated. The ELF file format is complex in general, and ECFS inherits those complexities from a structural point of view. On the other hand, ECFS makes navigating a process image quite easy if you know which specific features it has and what to look for.

In previous sections, we gave some real-life examples of utilizing ECFS that demonstrated many of its primary features. However, it is also important to have a simple and direct reference to what those characteristics are, such as which custom sections exist and what exactly they mean. In this section, we will provide a reference for the ECFS snapshot files.

ECFS symbol table reconstruction

The ECFS handler draws on a detailed understanding of the ELF binary format and even the DWARF debugging format, specifically the dynamic segment and the GNU_EH_FRAME segment, to fully reconstruct the symbol tables of the program. Even if the original binary has been stripped and has no section headers, the ECFS handler is intelligent enough to rebuild the symbol tables.

I have personally never encountered a situation where symbol table reconstruction failed completely. It usually reconstructs all or most symbol table entries. The symbol tables can be accessed using a utility such as readelf or readecfs. The libecfs API also has several functions:

int get_dynamic_symbols(ecfs_elf_t *desc, ecfs_sym_t **syms)
int get_local_symbols(ecfs_elf_t *desc, ecfs_sym_t **syms)

The first function retrieves the dynamic symbol table (.dynsym), and the second retrieves the local symbol table (.symtab).

The following shows the symbol tables being read with readelf:

$ readelf -s host.6758

Symbol table '.dynsym' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 00007f3dfd48b000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00007f3dfd4f9730     0 FUNC    GLOBAL DEFAULT  UND fputs
     2: 00007f3dfd4acdd0     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main
     3: 00007f3dfd4f9220     0 FUNC    GLOBAL DEFAULT  UND fgets
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 00007f3dfd4f94e0     0 FUNC    GLOBAL DEFAULT  UND fopen
     6: 00007f3dfd54bd00     0 FUNC    GLOBAL DEFAULT  UND sleep
     7: 00007f3dfd84a870     8 OBJECT  GLOBAL DEFAULT   25 stdout

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 00000000004004f0   112 FUNC    GLOBAL DEFAULT   10 sub_4004f0
     1: 0000000000400560    42 FUNC    GLOBAL DEFAULT   10 sub_400560
     2: 000000000040064d   138 FUNC    GLOBAL DEFAULT   10 sub_40064d
     3: 00000000004006e0   101 FUNC    GLOBAL DEFAULT   10 sub_4006e0
     4: 0000000000400750     2 FUNC    GLOBAL DEFAULT   10 sub_400750
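
Because the reconstructed tables are ordinary ELF symbol tables, they can also be walked with the standard <elf.h> structures once the section contents and the linked string table have been located. The following is a minimal sketch under that assumption, for 64-bit files only; count_function_symbols() and find_symbol_value() are hypothetical helper names chosen for this example and are not part of libecfs.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Count the entries of type STT_FUNC in a raw ELF64 symbol table. */
size_t count_function_symbols(const Elf64_Sym *syms, size_t nsyms)
{
    size_t n = 0;
    for (size_t i = 0; i < nsyms; i++)
        if (ELF64_ST_TYPE(syms[i].st_info) == STT_FUNC)
            n++;
    return n;
}

/*
 * Look up a symbol by name. syms/nsyms is the content of .symtab or
 * .dynsym; strtab/strsz is the string table that the symbol table links to.
 * Returns 0 and stores st_value in *value on success, -1 if not found.
 */
int find_symbol_value(const Elf64_Sym *syms, size_t nsyms,
                      const char *strtab, size_t strsz,
                      const char *name, Elf64_Addr *value)
{
    assert(strtab != NULL && name != NULL && value != NULL);
    size_t want = strlen(name);
    for (size_t i = 0; i < nsyms; i++) {
        size_t off = syms[i].st_name;
        /* Skip names that would run past the end of the string table. */
        if (off >= strsz || want >= strsz - off)
            continue;
        /* Compare including the terminating NUL so prefixes do not match. */
        if (memcmp(strtab + off, name, want + 1) == 0) {
            *value = syms[i].st_value;
            return 0;
        }
    }
    return -1;
}
```

In a stripped binary the reconstructed local symbols carry generated names such as sub_4004f0, so a lookup by name is mostly useful for the dynamic symbols.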

ECFS section headers

The ECFS handler reconstructs most of the original section headers that a program may have had. It also adds quite a few new sections and section types that can be very useful for forensic analysis. Section headers are identified by both name and type and contain data or code.

Parsing section headers is very easy, and therefore they are very useful for creating a map of the process memory image. Navigating the entire process layout through section headers is a lot easier than having only program headers (such as with regular core files), which don't even have string names. The program headers are what describe the segments of memory, and the section headers are what give context to each part of a given segment. Section headers help give a much higher resolution to the reverse engineer.
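
To illustrate how little code a name-based lookup needs, here is a minimal sketch that searches a 64-bit section header table for a given name. It assumes the caller has already mapped the file and located the table (e_shoff, e_shnum) and the section name string table (e_shstrndx); find_section() is a hypothetical helper written for this example, not a libecfs function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the section header whose name equals `name`, or NULL if there is
 * none. shdrs/count is the section header table; shstrtab/strsz is the
 * section name string table.
 */
const Elf64_Shdr *find_section(const Elf64_Shdr *shdrs, size_t count,
                               const char *shstrtab, size_t strsz,
                               const char *name)
{
    assert(shdrs != NULL && shstrtab != NULL && name != NULL);
    size_t want = strlen(name);
    for (size_t i = 0; i < count; i++) {
        size_t off = shdrs[i].sh_name;
        /* Ignore name offsets that would read past the string table. */
        if (off >= strsz || want >= strsz - off)
            continue;
        /* Compare including the terminating NUL for an exact match. */
        if (memcmp(shstrtab + off, name, want + 1) == 0)
            return &shdrs[i];
    }
    return NULL;
}
```

The sh_addr and sh_size fields of the returned header give the address range that the section covers in the process image.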

Each entry below gives a section header name followed by its description.

._TEXT

This points to the text segment (not the .text section). This makes locating the text segment possible without having to parse the program headers.

._DATA

This points to the data segment (not the .data section). This makes locating the data segment possible without having to parse the program headers.

.stack

This points to one of several possible stack segments depending on the number of threads. Without a section named .stack, it would be far more difficult to know where the actual stack of the process is. You would have to look at the value of the %rsp register and then see which program header segments contain address ranges that match the stack pointer value.
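
For comparison, the manual approach that a regular core file forces on you can be sketched as a search over the program headers for the PT_LOAD segment that contains the stack pointer. This is only an illustration for 64-bit files; segment_containing() is a hypothetical helper name, and the caller is assumed to have read the stack pointer value from the register state already.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Return the index of the PT_LOAD segment whose address range
 * [p_vaddr, p_vaddr + p_memsz) contains addr, or -1 if none does.
 */
int segment_containing(const Elf64_Phdr *phdrs, size_t count, Elf64_Addr addr)
{
    assert(phdrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        /* Written as a subtraction so that p_vaddr + p_memsz cannot overflow. */
        if (addr >= phdrs[i].p_vaddr &&
            addr - phdrs[i].p_vaddr < phdrs[i].p_memsz)
            return (int)i;
    }
    return -1;
}
```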

.heap

Similar to the .stack section, this points to the heap segment, also making identification of the heap much easier, especially on systems where ASLR moves the heap to random locations. On older systems, it was always extended from the data segment.

.bss

This section is not new with ECFS. The only reason it is mentioned here is that in an executable or shared library, the .bss section occupies no space in the file, since uninitialized data does not need to be stored on disk. ECFS represents the memory, however, and the .bss section is not actually created until runtime. The ECFS files have a .bss section that actually reflects the uninitialized data variables being used by the process.

.vdso

This points to the [vdso] segment, which is mapped into every Linux process and contains code that certain glibc system call wrappers need in order to invoke the real system call.

.vsyscall

Similar to the .vdso code, the .vsyscall page contains code for invoking only a handful of virtual system calls. It has been kept around for backwards compatibility. It may prove useful to know this location during reverse engineering.

.procfs.tgz

This section contains the entire directory structure and files for the /proc/$pid of the process that was captured by the ECFS handler. If you are an avid forensic analyst or programmer, then you probably already know how useful the information contained in the proc filesystem is. There are well over 300 files within /proc/$pid for a single process.

.prstatus

This section contains an array of struct elf_prstatus structures. Very important information pertaining to the state of the process and its registers is stored in these structures:

struct elf_prstatus
  {
    struct elf_siginfo pr_info;         /* Info associated with signal.  */
    short int pr_cursig;                /* Current signal.  */
    unsigned long int pr_sigpend;       /* Set of pending signals.  */
    unsigned long int pr_sighold;       /* Set of held signals.  */
    __pid_t pr_pid;
    __pid_t pr_ppid;
    __pid_t pr_pgrp;
    __pid_t pr_sid;
    struct timeval pr_utime;            /* User time.  */
    struct timeval pr_stime;            /* System time.  */
    struct timeval pr_cutime;           /* Cumulative user time.  */
    struct timeval pr_cstime;           /* Cumulative system time.  */
    elf_gregset_t pr_reg;               /* GP registers.  */
    int pr_fpvalid;                     /* True if math copro being used.  */
  };
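
Since the section is a plain array of these structures, its contents can be indexed directly once the section bytes are in memory. The sketch below assumes that layout (presumably one entry per thread, as with the notes in a regular core file) and takes struct elf_prstatus from <sys/procfs.h>; prstatus_count() and prstatus_for_pid() are hypothetical helper names, not libecfs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/procfs.h>   /* struct elf_prstatus */
#include <sys/types.h>

/* Number of complete elf_prstatus entries in a section of the given size. */
size_t prstatus_count(size_t section_size)
{
    return section_size / sizeof(struct elf_prstatus);
}

/* Return the entry whose pr_pid matches pid, or NULL if there is none. */
const struct elf_prstatus *prstatus_for_pid(const struct elf_prstatus *entries,
                                            size_t count, pid_t pid)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (entries[i].pr_pid == pid)
            return &entries[i];
    return NULL;
}
```

The pr_cursig and pr_reg members of the returned entry then give the current signal and the general-purpose registers for that entry.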

.fdinfo

This section contains ECFS custom data that describes the file descriptors, sockets, and pipes being used for the process's open files, network connections, and inter-process communication. The header file, ecfs.h, defines the fd_info_t type:

typedef struct fdinfo {
        int fd;
        char path[MAX_PATH];
        loff_t pos;
        unsigned int perms;
        struct {
                struct in_addr src_addr;
                struct in_addr dst_addr;
                uint16_t src_port;
                uint16_t dst_port;
        } socket;
        char net;
} fd_info_t;

The readecfs utility parses and displays the file descriptor information nicely, as shown when looking at an ECFS snapshot for sshd:

        [fd: 0:0] perms: 8002 path: /dev/null
        [fd: 1:0] perms: 8002 path: /dev/null
        [fd: 2:0] perms: 8002 path: /dev/null
        [fd: 3:0] perms: 802 path: socket:[10161]
        PROTOCOL: TCP
        SRC: 0.0.0.0:22
        DST: 0.0.0.0:0

        [fd: 4:0] perms: 802 path: socket:[10163]
        PROTOCOL: TCP
        SRC: 0.0.0.0:22
        DST: 0.0.0.0:0
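
The SRC and DST lines above can be produced from the socket members of an fd_info_t entry with standard socket library calls. The following is a hedged sketch rather than the readecfs implementation: format_endpoint() is a hypothetical helper, and it assumes the port has already been converted to host byte order, which the listing above does not confirm.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Write "a.b.c.d:port" into out. Assumes `port` is in host byte order.
 * Returns 0 on success, or -1 if the result does not fit in outlen bytes.
 */
int format_endpoint(struct in_addr addr, uint16_t port,
                    char *out, size_t outlen)
{
    assert(out != NULL);
    /* inet_ntop() fails if the dotted-quad string does not fit. */
    if (inet_ntop(AF_INET, &addr, out, (socklen_t)outlen) == NULL)
        return -1;
    size_t used = strlen(out);
    int n = snprintf(out + used, outlen - used, ":%u", (unsigned)port);
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return 0;
}
```

A caller would pass entry.socket.src_addr and entry.socket.src_port for the SRC line, and the dst members for the DST line.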

.siginfo

This section contains signal-specific information, such as what signal killed the process, or what the last signal code was before the snapshot was taken. The siginfo_t struct is stored in this section. The format of this struct can be seen in /usr/include/bits/siginfo.h.

.auxvector

This contains the actual auxiliary vector from the bottom of the stack (the highest memory address). The kernel sets up the auxiliary vector when the program is executed, and it carries information that is passed to the dynamic linker. This information may prove valuable in a number of ways to the advanced forensic analyst.
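
An auxiliary vector is a sequence of type/value pairs terminated by an AT_NULL entry, so a lookup is a short loop. This sketch assumes a 64-bit snapshot whose section holds Elf64_auxv_t entries as defined in <elf.h>; auxv_lookup() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Search an auxiliary vector for an entry of the given type (for example
 * AT_ENTRY or AT_BASE). Stops at AT_NULL or after `count` entries.
 * Returns 0 and stores the value in *value on success, -1 otherwise.
 */
int auxv_lookup(const Elf64_auxv_t *auxv, size_t count,
                uint64_t type, uint64_t *value)
{
    assert(value != NULL);
    assert(auxv != NULL || count == 0);
    for (size_t i = 0; i < count && auxv[i].a_type != AT_NULL; i++) {
        if (auxv[i].a_type == type) {
            *value = auxv[i].a_un.a_val;
            return 0;
        }
    }
    return -1;
}
```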

.exepath

This holds the string of the original executable path that was invoked for this process, for example, /usr/sbin/sshd.

.personality

This contains ECFS-specific personality information. It is an 8-byte unsigned integer in which any combination of the following personality flags can be set:

#define ELF_STATIC (1 << 1) // if it's statically linked (instead of dynamically)
#define ELF_PIE (1 << 2)    // if it's a PIE executable
#define ELF_LOCSYM (1 << 3) // was a .symtab symbol table created by ecfs?
#define ELF_HEURISTICS (1 << 4) // were detection heuristics used by ecfs?
#define ELF_STRIPPED_SHDRS (1 << 8) // did the binary have section headers?
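
Testing these flags is a matter of reading the 8-byte value and masking it. The sketch below copies the flag values from the listing above and assumes the integer is stored in the byte order of the machine reading it; read_personality() and has_personality_flag() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values as given in the listing above. */
#define ELF_STATIC         (1 << 1)
#define ELF_PIE            (1 << 2)
#define ELF_LOCSYM         (1 << 3)
#define ELF_HEURISTICS     (1 << 4)
#define ELF_STRIPPED_SHDRS (1 << 8)

/*
 * Copy the 8-byte personality value out of the raw section bytes.
 * Returns 0 on success, or -1 if the section is too small.
 */
int read_personality(const void *data, size_t size, uint64_t *out)
{
    assert(data != NULL && out != NULL);
    if (size < sizeof(*out))
        return -1;
    memcpy(out, data, sizeof(*out)); /* avoids unaligned access */
    return 0;
}

/* Nonzero if any bit of `flag` is set in `personality`. */
int has_personality_flag(uint64_t personality, uint64_t flag)
{
    return (personality & flag) != 0;
}
```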

.arglist

This contains the original char **argv of the process, stored as an array.

Using an ECFS file as a regular core file

The ECFS core file format is essentially backward compatible with regular Linux core files, and can therefore be used as core files for debugging with GDB in the traditional way.

However, the ELF file header for ECFS files has its e_type (ELF type) set to ET_NONE instead of ET_CORE. Core files are not expected to have section headers, but ECFS files do have them, so ECFS files must be marked as something other than core files for utilities such as objdump and objcopy to acknowledge those section headers. The quickest way to toggle the ELF type in an ECFS file is with the et_flip utility that comes with the ECFS software suite.

Here's an example of using GDB with an ECFS core file:

$ gdb -q /usr/sbin/sshd sshd.1195
Reading symbols from /usr/sbin/sshd...(no debugging symbols found)...done.
"/opt/ecfs/cores/sshd.1195" is not a core dump: File format not recognized
(gdb) quit

The following is an example of changing the ELF file type to ET_CORE and trying again:

$ et_flip sshd.1195
$ gdb -q /usr/sbin/sshd sshd.1195
Reading symbols from /usr/sbin/sshd...(no debugging symbols found)...done.
[New LWP 1195]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/sshd -D'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff4066b8d83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)
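
The type toggle itself amounts to rewriting a single field of the ELF file header. The following is a minimal sketch of that idea for a 64-bit header that has already been read or mapped into memory. It is not the et_flip source, and flip_elf_type() is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/*
 * Toggle e_type between ET_NONE and ET_CORE in an ELF64 header.
 * Returns 0 on success, or -1 if the magic bytes are wrong or the file
 * is of some other type (in which case the header is left untouched).
 */
int flip_elf_type(Elf64_Ehdr *ehdr)
{
    assert(ehdr != NULL);
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (ehdr->e_type == ET_NONE)
        ehdr->e_type = ET_CORE;
    else if (ehdr->e_type == ET_CORE)
        ehdr->e_type = ET_NONE;
    else
        return -1;
    return 0;
}
```

Calling the function twice restores the original type, which matches the toggling behavior described for et_flip.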

The libecfs API and how to use it

The libecfs API is the key component for integrating ECFS support into your malware analysis and reverse engineering tools for Linux. There is too much to document on this library to put into a single chapter of this book. I recommend that you use the manual that is still growing right alongside the project itself:

https://github.com/elfmaster/ecfs/blob/master/Documentation/libecfs_manual.txt