Igor Zhirkov, Low-Level Programming, 10.1007/978-1-4842-2403-8_14

14. Translation Details

Igor Zhirkov¹

(1)Saint Petersburg, Russia

In this chapter we are going to revisit the notion of calling convention to deepen our understanding and work through translation details. This process requires both understanding program functioning on the assembly level and a certain degree of familiarity with C. We are also going to review some classic low-level security vulnerabilities that might be opened by a careless programmer. Understanding these low-level translation details is sometimes crucial for eradicating very subtle bugs that do not reveal themselves at every execution.

14.1 Function Calling Sequence

In Chapter 2 we studied how to call procedures, how they return values, and how they accept arguments. The full calling sequence is described in [24], and we highly recommend that you take a look at it. We are going to revisit this process and add valuable details.

14.1.1 XMM Registers

Besides the registers we have already talked about, modern processors have several sets of special registers that come from processor extensions. An extension provides additional circuitry, expands the instruction set, and sometimes adds usable registers. A notable extension is called SSE (Streaming SIMD Extensions) and describes a set of xmm registers: xmm0, xmm1, ..., xmm15. They are 128 bits wide and are usually used for two kinds of tasks:

Floating point arithmetic; and
SIMD instructions (such instructions perform one action on multiple pieces of data at once).

The usual mov instruction cannot work with xmm registers. The movq instruction is used instead to copy data between the least significant half of xmm registers (64 bits of 128) on one side and xmm registers, general purpose registers, or memory on the other side (also 64 bits).

To fill the whole xmm register, you have two options: movdqa and movdqu. The first one is deciphered as “move aligned double quad word,” the second is the unaligned version.

Most SSE instructions require the memory operands to be properly aligned. The unaligned versions of these instructions often exist under a different mnemonic and can incur a performance penalty due to an unaligned read. As SSE instructions are often used in performance sensitive places, it is usually wiser to stick to the instructions requiring operand alignment.
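
The difference between the aligned and unaligned forms can also be seen from C through compiler intrinsics. The following is a minimal sketch, not a definitive implementation: the function name add4 is hypothetical, the intrinsics come from the emmintrin.h header shipped with GCC and clang, and the mapping of each intrinsic to movdqa or movdqu is what compilers usually emit rather than a guarantee.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four 32-bit integers.
   `a` and `out` must be 16-byte aligned; `b` may have any alignment. */
void add4( const int32_t* a, const int32_t* b, int32_t* out ) {
#if defined(__SSE2__)
    /* aligned load, usually translated into movdqa */
    __m128i va = _mm_load_si128( (const __m128i*) a );
    /* unaligned load, usually translated into movdqu */
    __m128i vb = _mm_loadu_si128( (const __m128i*) b );
    /* four additions at once, then an aligned store */
    _mm_store_si128( (__m128i*) out, _mm_add_epi32( va, vb ) );
#else
    /* Fallback for targets without SSE2 */
    int i;
    for (i = 0; i < 4; i++) out[i] = a[i] + b[i];
#endif
}
```

Passing a pointer that is not 16-byte aligned as a or out will typically crash the program on the aligned load or store, which is exactly the requirement described above.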

We will use the SSE instructions to perform high-performance computations in section 16.4.1.

Question 263

Read about the movq, movdqa, and movdqu instructions in [15].

14.1.2 Calling Convention

A calling convention is a set of rules about the function calling sequence that a programmer willingly adheres to. If everyone follows the same rules, smooth interoperability is guaranteed. However, once someone breaks the rules, for example, changes rbp in a certain function and does not restore it, anything can happen: nothing, a delayed crash, or an immediate one. The reason is that other functions are written under the assumption that these rules are respected, and they count on rbp being left untouched.

A calling convention declares, among other things, the argument passing algorithm. In the case of the typical *nix x86-64 convention we are using (described fully in [24]), the description that follows is an accurate enough approximation of how the function is called.

First, the registers that need to be preserved are saved. All registers except for seven callee-saved registers (rbx, rbp, rsp, and r12-r15) can be changed by the called function, so if their value is of any importance, they should be stored (typically on the stack).
The registers and stack are populated with arguments.
The size of each argument gets rounded up to 8 bytes.
The arguments are split into three lists:
1. Integer or pointer arguments.
2. Floats and doubles.
3. Arguments passed in memory via stack (“memory”).
The first six arguments from the first list are passed in general purpose registers (rdi, rsi, rdx, rcx, r8, and r9). The first eight arguments from the second list are passed in registers xmm0 to xmm7. If there are more arguments from these lists to pass, they are passed on the stack in reverse order. It means that the last argument is pushed first, so the first of the stack-passed arguments will be on top of the stack before the call is performed.
While integers and floats are quite trivial to handle, structures are a bit trickier.
If a structure is bigger than 32 bytes, or has unaligned fields, it is passed in memory. In practice, the later classification steps in [24] also send a structure bigger than 16 bytes to memory unless it consists of a single vector value.
A smaller structure is decomposed into fields and each field is treated separately and, if in an inner structure, recursively. So, a structure of two elements can be passed the same way as two arguments. If one field of a structure is considered “memory,” it propagates to the structure itself.
The rbp register, as we will see, is used to address the arguments passed in memory and local variables.
What about return values? Integer and pointer values are returned in rax and rdx. Floating point values are returned in xmm0 and xmm1. Big structures are returned through a pointer, provided as an additional hidden argument, in the spirit of the following example:
```
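struct s {
    char vals[100];
};

/* What the programmer writes */
struct s f( int x ) {
    struct s mys;
    mys.vals[10] = 42;
    return mys;
}

/* Roughly what it is translated into. The name f_lowered is only
   illustrative; in [24] the hidden pointer is actually passed in rdi,
   as if it were the first argument. */
void f_lowered( int x, struct s* ret ) {
    ret->vals[10] = 42;
}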
```
Then the call instruction is executed. Its operand is the address of the first instruction of the called function. It pushes the return address onto the stack.
Each program can have multiple instances of the same function launched at the same time, not only in different threads but also due to recursion. Each such function instance is stored in the stack, because its main principle—“last in, first out”—corresponds to how functions are launched and terminated. If a function f is launched and then invokes a function g, g is terminated first (but was invoked last), and f is terminated last (while being invoked first).
A stack frame is a part of the stack dedicated to a single function instance. It stores the values of the local variables, temporary variables, and saved registers.
The function code is usually enclosed inside a pair of prologue and epilogue, which are similar for all functions. Prologue helps initialize the stack frame, and epilogue deinitializes it.
During the function execution, rbp stays unchanged and points to the beginning of its stack frame. It is possible to address local variables and stack arguments relative to rbp. This is reflected in the function prologue shown in Listing 14-1.
Listing 14-1. prologue.asm
```
func:
push rbp
mov rbp, rsp

sub rsp, 24      ; given 24 is total size of local variables
```
The old rbp value is saved to be restored later in epilogue. Then a new rbp is set up to the current top of the stack (which stores the old rbp value now by the way). Then the memory for the local variables is allocated in the stack by subtracting their total size from rsp. This is the automatic memory allocation in C and the technique we have used in the very first assignment to allocate buffers on stack.
The functions end with an epilogue shown in Listing 14-2.
Listing 14-2. epilogue.asm
```
mov rsp, rbp
pop rbp
ret
```
By moving the address of the beginning of the stack frame into rsp we can be sure that all memory allocated in the stack is deallocated. Then the old rbp value is restored, and now rbp points at the start of the previous stack frame. Finally, ret pops the return address from the stack into rip.
A fully equivalent alternative form is sometimes chosen by the compiler. It is shown in Listing 14-3.
Listing 14-3. epilogue_alt.asm
```
leave
ret
```
The leave instruction is made especially for stack frame destruction. Its counterpart, enter, is not always used by compilers because it does more than the instruction sequence shown in Listing 14-1: it is aimed at languages with inner functions support.
After leaving the function, our work is not always done. In case there were arguments that were passed in memory (stack), we have to get rid of them too.
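
The argument passing rules above can be made concrete with a small annotated sketch. The names mix, sum_pair, and struct pair are hypothetical, and the register assignments in the comments are what the rules predict for the convention of [24]; inspecting the output of gcc -S is the way to confirm them for a particular compiler.

```c
struct pair { long a; long b; };   /* 16 bytes, both fields are integers */

/* Expected placement under the convention described above:
   i1 in rdi, d1 in xmm0, i2 in rsi, i3 in rdx, i4 in rcx,
   i5 in r8, i6 in r9, i7 on the stack (it is the seventh
   integer argument), d2 in xmm1.
   The long result is returned in rax. */
long mix( long i1, double d1, long i2, long i3, long i4,
          long i5, long i6, long i7, double d2 ) {
    return i1 + i2 + i3 + i4 + i5 + i6 + i7 + (long)( d1 + d2 );
}

/* A small structure of two integer fields is expected to travel like
   two integer arguments: p.a in rdi and p.b in rsi. */
long sum_pair( struct pair p ) {
    return p.a + p.b;
}
```

Note that the integer and floating point lists are filled independently: d1 takes xmm0 even though it is the second argument in the source text.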

14.1.3 Example: Simple Function and Its Stack

Let’s take a look at a simple function that calculates a maximum of two values. We are going to compile it without optimizations and see the assembly listing.

Listing 14-4 shows an example.

Listing 14-4. maximum.c

int maximum( int a, int b ) {
    char buffer[4096];
    if (a < b) return b;
    return a;
}

int main(void) {
    int x = maximum( 42, 999 );
    return 0;
}

Listing 14-5 shows the disassembly produced by objdump.

Listing 14-5. maximum.asm

00000000004004b6 <maximum>:
4004b6:       55                      push   rbp
4004b7:       48 89 e5                mov    rbp,rsp
4004ba:       48 81 ec 90 0f 00 00    sub    rsp,0xf90
4004c1:       89 bd fc ef ff ff       mov    DWORD PTR [rbp-0x1004],edi
4004c7:       89 b5 f8 ef ff ff       mov    DWORD PTR [rbp-0x1008],esi
4004cd:       8b 85 fc ef ff ff       mov    eax,DWORD PTR [rbp-0x1004]
4004d3:       3b 85 f8 ef ff ff       cmp    eax,DWORD PTR [rbp-0x1008]
4004d9:       7d 08                   jge    4004e3 <maximum+0x2d>
4004db:       8b 85 f8 ef ff ff       mov    eax,DWORD PTR [rbp-0x1008]
4004e1:       eb 06                   jmp    4004e9 <maximum+0x33>
4004e3:       8b 85 fc ef ff ff       mov    eax,DWORD PTR [rbp-0x1004]
4004e9:       c9                      leave
4004ea:       c3                      ret

00000000004004eb <main>:
4004eb:       55                      push   rbp
4004ec:       48 89 e5                mov    rbp,rsp
4004ef:       48 83 ec 10             sub    rsp,0x10
4004f3:       be e7 03 00 00          mov    esi,0x3e7
4004f8:       bf 2a 00 00 00          mov    edi,0x2a
4004fd:       e8 b4 ff ff ff          call   4004b6 <maximum>
400502:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax

After a bit of cleaning, we get a pure and more readable assembly code, which is shown in Listing 14-6.

Listing 14-6. maximum_refined.asm

mov rsi, 999
mov rdi, 42
call maximum

...
maximum:
push rbp
mov rbp, rsp
sub rsp, 3984

mov [rbp-0x1004], edi
mov [rbp-0x1008], esi
mov eax, [rbp-0x1004]
...

leave
ret

Register assignment

Refer to section 3.4.2 for the explanation about why changing esi means a change in the whole rsi.

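We are going to trace the function call and its prologue (check Listing 14-6) and describe the stack contents immediately after each instruction is executed.

call maximum: the return address has been pushed and is on top of the stack.

push rbp: the old rbp value is on top of the stack, at the address just below the return address.

mov rbp, rsp: rbp points at the saved rbp value, which marks the beginning of the new stack frame.

sub rsp, 3984: rsp has moved 3984 bytes down, and the space between rsp and rbp is reserved for local variables.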

14.1.4 Red Zone

The red zone is an area of 128 bytes that spans from rsp to lower addresses. It relaxes the rule “no data below rsp”; it is safe to allocate data there and it will not be overwritten by system calls or interrupts. We are speaking about direct memory writes relative to rsp without changing rsp. The function calls will, however, still overwrite the red zone.

The red zone was created to allow a specific optimization. If a function never calls other functions, it can omit stack frame creation (rbp changes). Local variables and arguments will then be addressed relative to rsp, not rbp. This is possible when the following conditions hold:

The total size of local variables is less than 128 bytes.
A function is a leaf function (does not call other functions).
Function does not change rsp; otherwise it is impossible to address memory relative to it.

By decreasing rsp you can still get more than 128 bytes of free stack space for your data. See also section 16.1.3.
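
Listing 14-5 already hints at the red zone. The function maximum is a leaf function, and its prologue lowers rsp by 0xf90 = 3984 bytes only, while its lowest local slot sits at rbp-0x1008, that is, 4104 bytes below rbp. The remaining 4104 - 3984 = 120 bytes lie below rsp and fit into the 128-byte red zone. A smaller sketch of a function that is a candidate for this treatment follows; the name leaf_sum is hypothetical, and whether a given compiler really places tmp in the red zone (rather than in registers) is an assumption to be checked in the generated assembly, for example by comparing the output of gcc -S with and without the -mno-red-zone option.

```c
/* A leaf function: it calls nothing and needs only 16 bytes of locals,
   so the compiler is allowed to address `tmp` below rsp without ever
   adjusting rsp. */
int leaf_sum( int a, int b ) {
    int tmp[4];
    tmp[0] = a;
    tmp[1] = b;
    tmp[2] = tmp[0] + tmp[1];
    tmp[3] = tmp[2] * 2;
    return tmp[3] - tmp[2];   /* equals a + b */
}
```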

14.1.5 Variable Number of Arguments

The calling convention that we are using supports a variable number of arguments. It means that the function can accept an arbitrary number of arguments. It is possible because arguments passing (and cleaning the stack after the function termination) is the responsibility of the calling function.

The declaration of such functions contains a so-called ellipsis—three dots instead of the last argument. The typical function with variable number of arguments is our old friend printf.

int printf( char const* format, ... );

How does printf know the exact number of arguments? It knows for sure that at least one argument is passed (char const* format). By analyzing this string and counting the specifiers it will compute the total number of arguments as well as their types (in which registers they should be).

Note

In case of a variable number of arguments, al should contain the number of xmm registers used by the arguments (an upper bound is sufficient).

As you see, the called function has absolutely no way to know exactly how many arguments have been passed. It deduces the count from the arguments that are certainly present (format in this case). If there are more format specifiers than arguments, printf will not know about it and will naively try to get the contents of the respective registers and memory.

Apparently, this functionality cannot be encoded in C by a programmer directly, because the registers cannot be accessed directly. However, there is a portable mechanism of declaring functions with variable argument count that is a part of the standard library. Each platform has its own implementation of this mechanism. It can be used after stdarg.h file is included and consists of the following:

va_list–a structure that stores information about arguments.
va_start–a macro that initializes va_list.
va_end–a macro that deinitializes va_list.
va_arg–a macro that takes a next argument from the argument list when given an instance of va_list and an argument type.

Listing 14-7 shows an example. The function printer accepts the number of arguments that follow and then that many arguments themselves.

Listing 14-7. vararg.c

#include <stdarg.h>
#include <stdio.h>

void printer( unsigned long argcount, ... ) {
    va_list args;
    unsigned long i;
    va_start( args, argcount );
    for (i = 0; i < argcount; i++ )
        printf(" %d\n", va_arg(args, int )  );

    va_end( args );
}

int main(void) {
    printer(10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 );
    return 0;
}

First, va_list is initialized with the name of the last argument before dots by va_start. Then, each call to va_arg gets the next argument. The second parameter is the name of the fresh argument’s type. In the end, va_list is deinitialized using va_end.

Since a type name becomes an argument and va_list is used by name, but is mutated, this example can look confusing.
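
One detail of va_arg deserves a separate sketch: the arguments matching the ellipsis undergo the default argument promotions, so a float is received as a double and a char or short as an int. The function average is a hypothetical illustration of this, not part of any library.

```c
#include <stdarg.h>

/* Hypothetical helper: averages `count` floating point values.
   Every variable argument must be a float or a double; passing an
   integer such as 1 instead of 1.0 is undefined behavior, because
   va_arg would look for it in the wrong place. */
double average( unsigned count, ... ) {
    va_list args;
    double sum = 0.0;
    unsigned i;

    if (count == 0) return 0.0;

    va_start( args, count );
    for (i = 0; i < count; i++)
        sum += va_arg( args, double );   /* floats arrive promoted to double */
    va_end( args );

    return sum / count;
}
```

A call such as average( 3, 1.0, 2.0, 3.0 ) is fine, and so is average( 2, 1.5f, 2.5f ), because the float arguments are converted to double by the caller.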

Question 264

Can you imagine a situation in which a function, not a macro, accepts a variable by name (syntactically) and changes it? What should be the type of such variable?

14.1.6 vprintf and Friends

Functions such as printf, fprintf, etc., have special versions. Those accept va_list as their last arguments. Their names are prefixed with a letter v, for example,

int vprintf(const char *format, va_list ap);

They are used inside custom functions that, in their turn, accept an arbitrary number of arguments.

Listing 14-8 shows an example.

Listing 14-8. vsprintf.c

#include <stdarg.h>
#include <stdio.h>

void logmsg( int client_id, const char* const str, ... ) {
    va_list args;
    char buffer[1024];
    char* bufptr = buffer;

    va_start( args, str );

    bufptr += sprintf(bufptr, "from client %d :", client_id );
    vsprintf( bufptr, str, args );
    fprintf( stderr, "%s", buffer );

    va_end( args );
}
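
Note that sprintf and vsprintf in Listing 14-8 do not check the size of buffer, so a long enough message overruns it. The following sketch shows a bounded variant built on the standard snprintf and vsnprintf functions. The function name format_msg, its parameters, and the choice to return the message length instead of printing are assumptions made for this illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical bounded variant of logmsg: formats the message into
   `out`, which can hold `cap` bytes, and never writes past its end.
   Returns the length the complete message would have had,
   or a negative value on a formatting error. */
int format_msg( char* out, size_t cap, int client_id, const char* fmt, ... ) {
    va_list args;
    int head, tail;

    head = snprintf( out, cap, "from client %d: ", client_id );
    if (head < 0) return head;
    if ((size_t) head >= cap) return head;   /* the prefix alone was truncated */

    va_start( args, fmt );
    tail = vsnprintf( out + head, cap - (size_t) head, fmt, args );
    va_end( args );

    if (tail < 0) return tail;
    return head + tail;
}
```

A returned value that is greater than or equal to cap tells the caller that the message was truncated.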

14.2 volatile

The volatile keyword affects greatly the way the compiler optimizes the code.

The model of computation for C is a von Neumann one. It does not support parallel program execution and the compiler usually tries to do as many optimizations as it can without changing the observable program behavior. These might include reordering instructions and caching variables in registers. A read from memory whose result is never used is omitted.

However, reading and writing in volatile variables always happen. The order of operations is also preserved.

The main use cases are as follows:

Memory mapped IO, when the communication with external devices is performed by interacting with a certain dedicated memory region. Writing a character into video memory (which results in it being displayed on screen) must really happen.
Data sharing between threads. If memory is used to communicate with other threads, you do not want the writes or the reads to be optimized out.

Note that volatile alone is not enough to perform robust communication between threads.

Just like the const qualifier, in case of a pointer, volatile can be applied to the data it points to, as well as to the pointer itself. The rule is the same: volatile on the left of the asterisk relates to the data it points to, and on the right, to the pointer itself.
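
The placement rule can be illustrated by three declarations. This is a minimal sketch; the function and variable names are hypothetical.

```c
/* Increments *cell three times through differently qualified pointers
   and returns the final value. */
int bump_three_times( int* cell ) {
    /* volatile on the left of the asterisk: the data is volatile,
       so every access through this pointer is performed */
    volatile int* to_volatile_data = cell;

    /* volatile on the right of the asterisk: the pointer variable itself
       is volatile, the data it points to is ordinary */
    int* volatile volatile_pointer = cell;

    /* both the pointer and the data it points to are volatile */
    volatile int* volatile both = cell;

    *to_volatile_data += 1;
    *volatile_pointer += 1;
    *both += 1;
    return *cell;
}
```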

14.2.1 Lazy Memory Allocation

Many operating systems map pages lazily, at the time of the first usage rather than right after mmap call (or its equivalent).

If the programmer wants no delays on the first usage of each page, he might choose to touch each page individually so that the operating system really creates it, as shown in Listing 14-9.

Listing 14-9. lma_bad.c

char* ptr;
for( ptr = start; ptr < start + size; ptr += pagesize )
    *ptr;

However, this code has no observable effect from the point of view of the compiler, so it might be optimized away completely. When the pointer is declared as pointing to volatile data, this will not be the case. Listing 14-10 shows an example.

Listing 14-10. lma_good.c

volatile char* ptr;
for( ptr = start; ptr < start + size; ptr += pagesize )
    *ptr;

Volatile pointers in the language standard

If the volatile pointer is pointing at the non-volatile memory, according to the standard there are no guarantees! They exist only when both the pointer and the memory are volatile. So, according to the standard, the example above is incorrect. However, as programmers are using the volatile pointers with exactly this reasoning, the most used compilers (MSVC, GCC, clang) do not optimize away the dereferencing of volatile pointers. There is no standard-conforming way of doing this.

14.2.2 Generated Code

We are going to study the example shown in Listing 14-11.

Listing 14-11. volatile_ex.c

#include <stdio.h>

int main( int argc, char** argv ) {
    int ordinary = 0;
    volatile int vol = 4;
    ordinary++;
    vol++;
    printf( "%d\n", ordinary );
    printf( "%d\n", vol );
    return 0;
}

There are two variables: one is volatile, the other is not. Both are incremented and given to printf as arguments. GCC will generate the following code (with -O2 optimization level), shown in Listing 14-12:

Listing 14-12. volatile_ex.asm

; these are two arguments for `printf`
mov    esi,0x1
mov    edi,0x4005d4

; vol = 4
mov    DWORD PTR [rsp+0xc],0x4

; vol ++
mov    eax,DWORD PTR [rsp+0xc]
add    eax,0x1
mov    DWORD PTR [rsp+0xc],eax

xor    eax,eax

; printf( "%d\n", ordinary )
; the `ordinary` is not even created in stack frame
; its final precomputed value 1 was placed in `rsi` in the first line!
call   4003e0 <printf@plt>

; the second argument is taken from memory, it is volatile!
mov    esi,DWORD PTR [rsp+0xc]

; First argument is the address of "%d\n"
mov    edi,0x4005d4
xor    eax,eax

; printf( "%d\n", vol )
call   4003e0 <printf@plt>
xor    eax,eax

As we see, the contents of a volatile variable are really read and written each time it occurs in C. The ordinary variable will not even be created: the computations will be performed in compile time and the final result is stored in rsi, waiting to be used as the second argument of a call.

14.3 Non-Local jumps–setjmp

The standard C library contains machinery to perform a very tricky kind of hack. It allows storing a computation context and restoring it. The context describes the program execution state with the exception of the following:

Everything related to the external world (e.g., opened descriptors).
Floating point computations context.
Stack variables.

It allows saving context and jumping back to it in case we feel like we have to return. We are not limited by the same function scope.

Include the setjmp.h to gain access to the following machinery:

jmp_buf is a type of a variable which can store the context.
int setjmp(jmp_buf env) is a function that accepts a jmp_buf instance and stores the current context in it. When called directly, it returns 0.
void longjmp(jmp_buf env, int val) is used to return to a saved context, stored in a certain variable of type jmp_buf.

When returning via longjmp, setjmp does not return 0 but the value val fed to longjmp (or 1 if that value is 0). Listing 14-13 shows an example. The first setjmp call returns 0, and this is the value assigned to val. Then longjmp is called with 1 as its argument, and the program execution continues from the setjmp call (because they are linked through the usage of jb). This time setjmp returns 1, and this is the value that is assigned to val.

Listing 14-13. longjmp.c

#include <stdio.h>
#include <setjmp.h>

int main(void) {
    jmp_buf jb;
    int val;
    val = setjmp( jb );
    puts("Hello!");
    if (val == 0) longjmp( jb, 1 );
    else puts("End");
    return 0;
}

Local variables that are not marked volatile and were changed after the setjmp call will hold undefined values after longjmp. This is a source of bugs as well as of issues related to freeing memory: it is hard to analyze the control flow in the presence of longjmp and ensure that all dynamically allocated memory is freed.

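The standard allows a setjmp call to appear only in a few simple contexts, such as the whole controlling expression of an if or switch statement, or a comparison with an integer constant in that position. Calling it as a part of a more complex expression is, in most cases, undefined behavior. So, better not to do it.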

It is important to remember that all this machinery is based on stack frames usage. It means that you cannot perform longjmp into a function with a deinitialized stack frame. For example, the code shown in Listing 14-14 yields undefined behavior for this very reason.

Listing 14-14. longjmp_ub.c

#include <setjmp.h>

jmp_buf jb;
void f(void) {
    setjmp( jb );
}

void g(void) {
    f();
    longjmp( jb, 1 ); /* undefined behavior: f has already returned */
}

The function f has terminated already, but we are performing longjmp into it. The program behavior is undefined because we are trying to restore a context inside a destroyed stack frame.

In other words, you can only jump into the same function or into a function that has been called and has not returned yet.
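
For contrast with Listing 14-14, here is a sketch of a valid jump: the function that called setjmp is still running when a function several calls deeper performs longjmp. The names checkpoint, dive, and run_with_checkpoint are hypothetical. The setjmp call is kept as the whole controlling expression of a switch statement, which is one of the contexts the standard permits.

```c
#include <setjmp.h>

static jmp_buf checkpoint;

/* Recurses `depth` levels down, then jumps straight back into
   run_with_checkpoint, skipping all intermediate returns. */
static void dive( int depth ) {
    if (depth <= 0) longjmp( checkpoint, 42 );
    dive( depth - 1 );
}

int run_with_checkpoint( int depth ) {
    switch (setjmp( checkpoint )) {
        case 0:              /* direct call: the context has just been saved */
            dive( depth );
            return -1;       /* not reached: dive never returns normally */
        case 42:             /* we arrived here through longjmp */
            return 42;
        default:
            return -2;
    }
}
```

The stack frame of run_with_checkpoint exists during the whole execution of dive, so the saved context is still valid when longjmp is performed.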

14.3.1 Volatile and setjmp

The compiler thinks that setjmp is just a function. However, this is not really so, because this is the point from which the program might start to execute again. In normal conditions, some local variables might have been cached in registers (or never allocated) before the call to setjmp. When we return to this point due to a longjmp call, they will not be restored.

Turning off optimizations changes this behavior, so disabled optimizations hide bugs related to setjmp usage.

To write correct code, remember that only volatile local variables are guaranteed to hold defined values after longjmp. They are not restored to the values they had at the moment of the setjmp call: jmp_buf does not save stack variables, so they keep the values they had right before longjmp.

Listing 14-15 shows an example.

Listing 14-15. setjmp_volatile.c

#include <stdio.h>
#include <setjmp.h>

jmp_buf buf;

int main( int argc, char** argv ) {
    int var = 0;
    volatile int b = 0;
    setjmp( buf );
    if (b < 3) {
        b++;
        var ++;
        printf( "\n\n%d\n", var );
        longjmp( buf, 1 );
    }

    return 0;
}

We are going to compile it without optimizations (gcc -O0, Listing 14-16) and with optimizations (gcc -O2, Listing 14-17).

Without optimizations,

Listing 14-16. volatile_setjmp_o0.asm

main:
push     rbp
mov      rbp,rsp
sub      rsp,0x20

; `argc` and `argv` are saved in stack to make `rdi` and `rsi` available
mov    DWORD PTR [rbp-0x14],edi
mov    QWORD PTR [rbp-0x20],rsi

; var = 0
mov    DWORD PTR [rbp-0x4],0x0

; b = 0
mov    DWORD PTR [rbp-0x8],0x0

; 0x600a40 is the address of `buf` (a global variable of type `jmp_buf`)
mov    edi,0x600a40
call   400470 <_setjmp@plt>

; if (b < 3), the good branch is executed
; This is encoded by skipping several instructions to the `.endlabel` if b > 2
mov    eax,DWORD PTR [rbp-0x8]
cmp    eax,0x2
jg     .endlabel

; A fair increment
; b++
mov    eax,DWORD PTR [rbp-0x8]
add    eax,0x1
mov    DWORD PTR [rbp-0x8],eax

; var++
add    DWORD PTR [rbp-0x4],0x1

; `printf` call
mov    eax,DWORD PTR [rbp-0x4]
mov    esi,eax
mov    edi,0x400684
; There are no floating point arguments, thus rax = 0
mov    eax,0x0
call   400450 <printf@plt>

; calling `longjmp`
mov    esi,0x1
mov    edi,0x600a40
call   400490 <longjmp@plt>

.endlabel:
mov    eax,0x0
leave
ret

The program output will be

1
2
3

With optimizations,

Listing 14-17. volatile_setjmp_o2.asm

main:

; allocating memory in stack
sub    rsp,0x18

; a `setjmp` argument, the address of `buf`

mov    edi,0x600a40

; b = 0
mov    DWORD PTR [rsp+0xc],0x0
; instructions are placed in the order different
; from C statements to make better use of pipeline and other inner
; CPU mechanisms.
call   400470 <_setjmp@plt>

; `b` is read and checked in a fair way
mov    eax,DWORD PTR [rsp+0xc]
cmp    eax,0x2
jle    .branch

; return 0
xor    eax,eax
add    rsp,0x18
ret

.branch:

mov    eax,DWORD PTR [rsp+0xc]

; the second argument of `printf` is var + 1
; It was not even read from memory nor allocated.
; The computations were performed in compile time
mov    esi,0x1

; The first argument of `printf`
mov    edi,0x400674                                                

; b = b + 1
add    eax,0x1
mov    DWORD PTR [rsp+0xc],eax

xor    eax,eax
call   400450 <printf@plt>

; longjmp( buf, 1 )
mov    esi,0x1
mov    edi,0x600a40
call   400490 <longjmp@plt>

The program output will be

1
1
1

The volatile variable b, as you see, behaved as intended (otherwise, the loop would have never ended). The variable var was always equal to 1, despite being “incremented” according to the program text.

Question 265

How do you implement “try–catch”-alike constructions using setjmp and longjmp?

14.4 inline

inline is a function qualifier introduced in C99. It mimics the behavior of its C++ counterpart.

Before you read an explanation, please, do not assume that this keyword is used to force function inlining!

Before C99, there was a static qualifier, which was often used in the following scenario:

The header file includes not the function declaration but the full function definition, marked as static.
The header is then included in multiple translation units. Each of them receives a copy of the emitted code, but as the corresponding symbol is object-local, the linker does not see it as a multiple definition conflict.

In a big project, this gives the compiler the access to the function source code, which enables it to really inline the function if needed. Obviously, the compiler might also decide that the function is better left not inlined. In this case we start getting the clones of this function pretty much everywhere. Each file is calling its own copy, which is bad for locality and bloats the memory image as well as the executable itself.

The inline keyword addresses this issue. Its correct usage is as follows:

Describe an inline function in a relevant header, for example,
```
inline int inc( int x ) { return x+1; }
```
In exactly one translation unit (i.e., a .c file), add the external declaration
```
extern inline int inc( int x ) ;
```

This file will contain the function code, which will be referenced by every other file, where the function was not inlined.
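
Both parts of this scheme can be tried out in a single file. The sketch below assumes C99 or newer inline semantics (the default for current GCC and clang); the function twice_inc is a hypothetical user of inc.

```c
/* This part normally lives in a header included by many .c files */
inline int inc( int x ) { return x + 1; }

/* This part normally lives in exactly one .c file. It makes the compiler
   emit an externally visible definition of inc in this translation unit,
   which calls that were not inlined refer to. */
extern inline int inc( int x );

/* A user of inc: the call may be inlined or not, at the compiler's choice */
int twice_inc( int x ) {
    return inc( inc( x ) );
}
```

Without the extern declaration, a build without optimizations can fail at link time with an undefined reference to inc, because no translation unit provides the function code.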

Semantics change

In GCC prior to 4.2.1 the inline keyword had a slightly different meaning. See the post [14] for an in-depth analysis.

14.5 restrict

restrict is a keyword akin to volatile and const which first appeared in the C99 standard. It is used to mark pointers and is thus placed to the right of the asterisk, as follows:

int x;
int* restrict p_x = &x;

If we create a restricted pointer to an object, we make a promise that all accesses to this object will pass through the value of this pointer. A compiler can either ignore this or make use of it for certain optimizations, which is often possible.

In other words, a write through any other pointer will not affect the value accessed through a restricted pointer.

Breaking this promise leads to subtle bugs and is a clear case of undefined behavior.

Without restrict, every pointer is a source of possible memory aliasing, when you can access the same memory cells by using different names for them. Consider a very simple example, shown in Listing 14-18. Is the body of f equivalent to *x += 2 * (*add);?

Listing 14-18. restrict_motiv.c

void f(int* x, int* add) {
    *x += *add;
    *x += *add;
}

The answer is, surprisingly, no, they are not equal. What if add and x are pointing to the same address? In this case, changing *x changes *add as well. So, in case x == add, the function will add *x to *x making it two times the initial value, and then repeat it making it four times the initial value. However, when x != add, even if *x == *add the final *x will be three times the initial value.
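
Both cases are easy to check by running the code. In the sketch below, add_twice repeats the body of f from Listing 14-18, and the two wrapper functions (hypothetical names) call it with aliased and with distinct pointers.

```c
/* Same body as f in Listing 14-18 */
void add_twice( int* x, int* add ) {
    *x += *add;
    *x += *add;
}

/* x == add: the value is doubled twice, giving 4 * v */
int aliased_result( int v ) {
    int a = v;
    add_twice( &a, &a );
    return a;
}

/* x != add, but *x == *add initially: the result is v + 2 * v = 3 * v */
int distinct_result( int v ) {
    int a = v;
    int b = v;
    add_twice( &a, &b );
    return a;
}
```

For v equal to 5 the first wrapper yields 20 and the second one yields 15, so replacing the body with a single addition of 2 * (*add) would change the behavior in the aliased case.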

The compiler is well aware of it, and even with optimizations turned on it will not optimize away two reads, as shown in Listing 14-19.

Listing 14-19. restrict_motiv_dump.asm

0000000000000000 <f>:
0:   8b 06                     mov   eax,DWORD PTR [rsi]
2:   03 07                     add   eax,DWORD PTR [rdi]
4:   89 07                     mov   DWORD PTR [rdi],eax
6:   03 06                     add   eax,DWORD PTR [rsi]
8:   89 07                     mov   DWORD PTR [rdi],eax
a:   c3                        ret

However, add restrict, as shown in Listing 14-20, and the disassembly will demonstrate an improvement, as shown in Listing 14-21. The second argument is read exactly once, multiplied by 2, and added to the dereferenced first argument.

Listing 14-20. restrict_motiv1.c

void f(int* restrict x, int* restrict add) {
    *x += *add;
    *x += *add;
}

Listing 14-21. restrict_motiv_dump1.asm

0000000000000000 <f>:
   0:   8b 06                    mov   eax,DWORD PTR [rsi]
   2:   01 c0                    add   eax,eax
   4:   01 07                    add   DWORD PTR [rdi],eax
   6:   c3                       ret

Use restrict only if you are sure of what you are doing. Writing a slightly inefficient program is much better than writing an incorrect one.

It is important to use restrict also to document code. For example, the signature for memcpy, a function that copies n bytes from some starting address s2 to a block starting with s1, has changed in C99:

void*
memcpy(void*       restrict s1,
       const void* restrict s2,
       size_t               n );

This reflects the fact that these two areas should not overlap; otherwise the correctness is not guaranteed.
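
When the areas may overlap, the standard library offers memmove, whose parameters are not restrict-qualified and which handles overlapping blocks correctly. A minimal sketch follows; the helper name shift_right is hypothetical.

```c
#include <string.h>

/* Shifts the first n bytes of buf one position to the right.
   Source and destination overlap, so memmove is used instead of memcpy.
   The buffer must have room for at least n + 1 bytes. */
void shift_right( char* buf, size_t n ) {
    memmove( buf + 1, buf, n );
}
```

Calling memcpy( buf + 1, buf, n ) here would break the promise expressed by restrict in its signature.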

Restricted pointers can be copied from one to another to create a hierarchy of pointers. However, the standard allows this only when the copy does not reside in the same block as the original pointer. Listing 14-22 shows an example.

Listing 14-22. restrict_hierarchy.c

struct s {
    int* x;
} inst;

void f(void) {
    struct s* restrict p_s = &inst;
    int* restrict p_x = p_s->x; /* Bad */
    {
        int* restrict p_x2 = p_s->x; /* Fine, other block scope */
    }
}

14.6 Strict Aliasing

Before restrict was introduced, programmers sometimes achieved the same effect by using different structure names. The compiler thinks that different data types imply that the respective pointers cannot point to the same data (which is known as the strict aliasing rule).

The assumptions include the following:

Pointers to different built-in types do not alias.
Pointers to structures or unions with different tags do not alias (so struct foo and struct bar are never used one for another).
Type aliases, created using typedef, can refer to the same data.
The type char* is exceptional (signed or not). The compiler always assumes that char* can alias other types, but not vice versa. It means that we can create a char buffer, use it to get data, and then alias it as an instance of some struct packet.

Breaking these rules can lead to subtle optimization bugs, because it triggers undefined behavior.
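
When in doubt, two techniques stay clear of aliasing problems altogether: inspecting an object byte by byte through a pointer to unsigned char, and copying raw bytes into a properly typed object with memcpy. The sketch below is deliberately conservative; the fields of struct packet and the function names are assumptions made for the illustration.

```c
#include <stdint.h>
#include <string.h>

struct packet {
    uint16_t id;
    uint16_t length;
};

/* Builds a packet from raw bytes by copying them into a real
   struct packet object, so no pointer of a foreign type is dereferenced.
   `raw` must point to at least sizeof(struct packet) bytes. */
struct packet decode_packet( const unsigned char* raw ) {
    struct packet p;
    memcpy( &p, raw, sizeof p );
    return p;
}

/* Character pointers may alias anything: summing the bytes of an
   arbitrary object is always allowed. */
unsigned byte_sum( const void* obj, size_t size ) {
    const unsigned char* bytes = obj;
    unsigned sum = 0;
    size_t i;
    for (i = 0; i < size; i++) sum += bytes[i];
    return sum;
}
```

Compilers usually translate such a small memcpy into a couple of plain moves, so the approach does not have to cost anything at run time.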

The example shown in Listing 14-18, can be rewritten to achieve the same effect without the restrict keyword. The idea is to use the strict aliasing rules to our benefit, packing both parameters into the structures with different tags.

Listing 14-23 shows the modified source.

Listing 14-23. restrict-hack.c

struct a {
    int v;
};
struct b {
    int v;
};

void f(struct a* x, struct b* add) {
    x->v += add->v;
    x->v += add->v;
}

To our satisfaction, the compiler optimizes the reads away just as we wanted. Listing 14-24 shows the disassembly.

Listing 14-24. restrict-hack-dump

0000000000000000 <f>:
   0:   8b 06                     mov   eax,DWORD PTR [rsi]
   2:   01 c0                     add   eax,eax
   4:   01 07                     add   DWORD PTR [rdi],eax
   6:   c3                        ret

We discourage using aliasing rules for optimization purposes in code for C99 and newer standards because restrict makes the intention more obvious and does not introduce unnecessary type names.

14.7 Security Issues

C was not designed as a language for creating robust software. It allows working with memory directly and has no means of checking correctness, neither static, like Rust, nor dynamic, like Java. We are going to review some classical security holes, which we can now explain in full detail.

14.7.1 Stack Buffer Overrun

Suppose that the program uses a function f with a local buffer, as shown in Listing 14-25.

Listing 14-25. buffer_overrun.c

#include <stdio.h>

void f( void ) {
    char buffer[16];
    gets( buffer );
}

int main( int argc, char** argv ) {
    f();
    return 0;
}

After being initialized, the stack frame holds, from lower to higher addresses, the buffer, the saved rbp value, and the return address.

The gets function reads a line from stdin and places it in the buffer, whose address is accepted as an argument. Unfortunately, it does not check the buffer size at all and thus can write past its end.

If the line is too long, it will overwrite the buffer, then the saved rbp value, and then the return address. When the ret instruction is executed, the program will most probably crash. Even worse, an attacker who crafts the line carefully can overwrite the return address with specific bytes forming a valid address.

Should the attacker choose to redirect the return address directly into the buffer being overrun, he can transmit the executable code directly in this buffer. Such code is often called shellcode, because it is small and usually only opens a remote shell to work with.

Obviously, this is not only a flaw in gets but a feature of the language itself. The moral is never to use gets (it was removed from the standard library in C11) and always to provide a way to check the bounds of the target memory block.
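
One common way to follow this advice is to replace gets with fgets, which accepts the size of the target buffer. The sketch below assumes a hypothetical helper named read_line; it is one possible wrapper rather than the only correct one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line
   from `in` into `buffer` and strips the trailing newline, if any.
   Returns 1 on success and 0 on end of file, error, or zero size. */
int read_line(char* buffer, size_t size, FILE* in) {
    if (size == 0 || fgets(buffer, (int) size, in) == NULL) return 0;
    buffer[strcspn(buffer, "\n")] = '\0';
    return 1;
}
```

With char buffer[16], a call such as read_line(buffer, sizeof buffer, stdin) stores no more than 15 characters and the terminating zero, however long the input line is. The rest of the line stays in the stream for the next read.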

14.7.2 return-to-libc

As we have already elaborated, the malevolent user can rewrite the return address if the program allows him to overrun the stack buffer. The return-to-libc attack is performed when the return address is the address of a function in the standard C library. One function is of particular interest: int system(const char* command). This function allows you to execute an arbitrary shell command. What’s even worse, it will be executed with the same privileges as the attacked program.

When the current function terminates by executing the ret command, we will start executing the function from libc. The remaining question is how to form a valid argument for it.

In the presence of ASLR (address space layout randomization), performing this attack is nontrivial (but still possible).

14.7.3 Format Output Vulnerabilities

Format output functions can be a source of very nasty bugs. There are several such functions in the standard library; Table 14-1 shows them.

Table 14-1. String Format Functions

Function	Description
printf	Outputs a formatted string to stdout.
fprintf	Outputs a formatted string to a file.
sprintf	Prints into a string.
snprintf	Prints into a string, checking the length.
vfprintf	Prints the arguments from a va_list to a file.
vprintf	Prints the arguments from a va_list to stdout.
vsprintf	Prints the arguments from a va_list to a string.
vsnprintf	Prints the arguments from a va_list to a string, checking the length.

Listing 14-26 shows an example. Suppose that the user inputs fewer than 100 symbols. Can you crash this program or produce other interesting effects?

Listing 14-26. printf_vuln.c

#include <stdio.h>
int main(void) {
    char buffer[1024];
    gets(buffer);
    printf( buffer );
    return 0;
}

The vulnerability does not come from gets usage but from usage of the format string taken from the user. The user can provide a string that contains format specifiers, which will lead to an interesting behavior. We will mention several potentially unwanted types of behavior.

The "%x" specifier and its like can be used to view the stack contents. The first five "%x" specifiers take their arguments from registers (rdi is already occupied with the format string address), and the following ones show the stack contents. Let’s compile the example shown in Listing 14-26 and see its reaction to an input consisting of repeated "%x" specifiers.
```
> %x %x %x %x %x %x %x %x %x %x
b1b6701d b19467b0 fbad2088 b1b6701e 0 25207825 20782520 78252078 25207825
```

As we see, it actually gave us four numbers that share a certain informal similarity, a 0, and four more numbers. Our hypothesis is that the last four numbers are taken from the stack already.

Running the program under gdb and examining the memory near the top of the stack right after the printf call gives results that support this hypothesis. Listing 14-27 shows the output.

Listing 14-27. gdb_printf

```
(gdb) x/10 $rsp
0x7fffffffdfe0: 0x25207825   0x78252078   0x20782520   0x25207825
0x7fffffffdff0: 0x78252078   0x20782520   0x25207825   0x00000078
0x7fffffffe000: 0x00000000   0x00000000
```
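
The words in this dump are not arbitrary: 0x25, 0x78, and 0x20 are the ASCII codes of '%', 'x', and a space, so the top of the stack holds the buffer with the format string itself. The following sketch decodes a 32-bit word in the order its bytes lie in little-endian memory; the function name word_to_text is hypothetical.

```c
#include <stdint.h>

/* Writes the four bytes of `word` into `out`, least significant
   byte first (the little-endian memory order), followed by a
   terminating zero. `out` must have room for 5 characters. */
void word_to_text(uint32_t word, char out[5]) {
    for (int i = 0; i < 4; i++)
        out[i] = (char) ((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Applied to the first two words of the dump, 0x25207825 and 0x78252078, it yields "%x %" and "x %x", which is the beginning of the input line. Each "%x" prints only the lower 32 bits of a 64-bit stack slot, which is why printf showed every second word of this dump.
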

The "%s" format specifier is used to print strings. As a string is defined by the address of its start, this means addressing memory by a pointer. So, if no valid pointer is given, the invalid pointer will be dereferenced.

Question 266 What will be the result of launching the code shown in Listing 14-26 on input "%s %s %s %s %s"?

The "%n" format specifier is a bit exotic but still harmful. It allows one to write an integer into memory. The printf function accepts a pointer to an integer, which will be overwritten with the number of symbols written so far (before "%n" occurs). Listing 14-28 shows an example of its usage.

Listing 14-28. printf_n.c

```
#include <stdio.h>

int main(void) {
    int count;
    printf( "hello%n world\n", &count);
    printf( "%d\n", count );
    return 0;
}
```

The second printf call will output 5, because five symbols had been written before "%n" was encountered. This number is not simply the string length, because preceding format specifiers can produce output of variable length (e.g., printing an integer can emit seven or ten symbols). Listing 14-29 shows an example.

Listing 14-29. printf_n_ex.c

int x;
printf("%d %n", 10, &x);  /* x = 3 */
printf("%d %n", 200, &x); /* x = 4 */

To avoid these problems, do not use a string accepted from the user as a format string. You can always write printf("%s", buffer), which is safe as long as the buffer is not NULL and is a valid null-terminated string. Do not forget about such functions as puts or fputs, which are not only faster but also safer.
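
The rule can be sketched as follows; the two helper names are hypothetical. In both functions the untrusted text is passed as data, never as the format string, so any format specifiers it contains are output literally.

```c
#include <stdio.h>

/* Hypothetical helper: formats untrusted text into `out` using a
   constant format string, so every '%' in `user_text` is plain data.
   Returns the value returned by snprintf. */
int format_untrusted(char* out, size_t size, const char* user_text) {
    return snprintf(out, size, "%s", user_text);
}

/* Hypothetical helper: prints untrusted text to stdout without
   interpreting it as a format string. */
void print_untrusted(const char* user_text) {
    fputs(user_text, stdout);
}
```

For instance, passing "%x %s %n" to format_untrusted with a sufficiently large buffer copies these eight characters unchanged; no registers or stack slots are read and nothing is written through "%n".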

14.8 Protection Mechanisms

Rewriting a return address can lead to one of the following two consequences:

The program abnormally terminates.
Attacker executes arbitrary code.

In the first case, we can fall victim to a DoS (Denial of Service) attack, when the program, providing a specific service, becomes unavailable. However, the second option is much worse.

14.8.1 Security Cookie

The security cookie (stack guard, canary) is supposed to protect us from arbitrary code execution by forcing abnormal program termination once the return address is changed.

The security cookie is a random value residing in the stack frame near the saved rbp and return address.

Overrunning the buffer will rewrite the security cookie. Before the ret instruction, the compiler emits a special check that verifies the integrity of the security cookie, and if it is changed, it crashes the program. The ret instruction does not get to be executed.
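
The idea can be modeled in plain C. The sketch below is only a simulation under our own assumptions, not the code a compiler emits: the "frame" is an ordinary byte array holding a 16-byte buffer followed by an 8-byte cookie, and all names are hypothetical. A real check terminates the program instead of returning a flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUFFER_SIZE = 16, COOKIE_SIZE = 8 };

/* Simulates a call whose frame holds a buffer followed by a cookie.
   Copies `len` input bytes to the start of the frame without checking
   them against BUFFER_SIZE (the copy is clamped to the model frame so
   that the simulation itself stays well defined).
   Returns 1 if the cookie survived and 0 if an overrun was detected. */
int simulated_call(const unsigned char* input, size_t len, uint64_t cookie) {
    unsigned char frame[BUFFER_SIZE + COOKIE_SIZE];

    memset(frame, 0, BUFFER_SIZE);
    memcpy(frame + BUFFER_SIZE, &cookie, COOKIE_SIZE);  /* "prologue" */

    if (len > sizeof frame) len = sizeof frame;
    memcpy(frame, input, len);                          /* unchecked copy */

    /* "epilogue": compare the stored cookie with the reference value */
    return memcmp(frame + BUFFER_SIZE, &cookie, COOKIE_SIZE) == 0;
}
```

An input of 16 bytes fits the buffer and leaves the cookie intact, while a 17th byte already changes the cookie, so the overrun is detected before the saved rbp or the return address could be used.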

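MSVC has this mechanism turned on by default (the /GS option). In GCC it is controlled by the -fstack-protector family of options, which many Linux distributions enable by default.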

14.8.2 Address Space Layout Randomization

Loading each program section to a random place in an address space makes it nearly impossible to guess a correct return address to perform an intelligent jump. Most commonly used operating systems support it; however, that feature should be enabled during the compilation. In this case, the information about ASLR support will be stored in the executable file itself, which will force the loader to perform a correct relocation.

14.8.3 DEP

We have already discussed Data Execution Prevention in Chapter 4. This technology protects some pages from executing instructions stored on these pages. To enable it, programs should also be compiled with support turned on.

The sad fact is that it does not work well with programs that use just-in-time compilation, which forms executable code during the program execution itself. This is not as rare as it might seem; for example, virtually all browsers are using JavaScript engines which support just-in-time compilation.

14.9 Summary

In this chapter we have revisited the calling convention used in *nix on Intel 64. We have seen the example usages of the more advanced C features, namely, volatile and restrict type qualifiers and non-local jumps. Finally, we have given a brief overview of several classical vulnerabilities that are possible because of the way stack frames are organized, and the compiler features that were designed to automatically cope with them. The next chapter will explain more low-level details related to the creation and usage of dynamic libraries to strengthen our understanding of them.

Question 267

What are xmm registers? How many of them are there?

Question 268

What are SIMD instructions?

Question 269

Why do some SSE instructions require the memory operands to be aligned?

Question 270

What registers are used to pass arguments to functions?

Question 271

When passing arguments to the function, why is rax sometimes used?

Question 272

How is rbp register used?

Question 273

What is a stack frame?

Question 274

Why aren’t we addressing the local variables relative to rsp?

Question 275

What are prologue and epilogue?

Question 276

What is the purpose of enter and leave instructions?

Question 277

Describe in detail how the stack frame changes during function execution.

Question 278

What is the red zone?

Question 279

How do we declare and use a function with a variable number of arguments?

Question 280

Which kind of context is va_list holding?

Question 281

Why are functions such as vfprintf used?

Question 282

What is the purpose of volatile variables?

Question 283

Why do only volatile stack variables persist after longjmp?

Question 284

Are all local variables allocated on stack?

Question 285

What is setjmp used for?

Question 286

What is the return value of setjmp?

Question 287

What is the use of restrict?

Question 288

Can restrict be ignored by the compiler?

Question 289

How can we achieve the same result without using the restrict keyword?

Question 290

Explain the mechanism of exploiting stack buffer overrun.

Question 291

When is the printf usage unsafe?

Question 292

What is a security cookie? Does it solve program crashes on buffer overflow?
