Igor Zhirkov, Low-Level Programming, 10.1007/978-1-4842-2403-8_8

8. Basics

Igor Zhirkov¹

(1)Saint Petersburg, Russia

In this chapter we are going to start exploring another language called C. It is a low-level language with quite minimal abstractions over assembly. At the same time it is expressive enough that we can use it to illustrate some very general concepts and ideas applicable to all programming languages (such as type systems or polymorphism).

C provides almost no abstraction over memory, so the memory management task is the programmer’s responsibility. Unlike in higher-level languages, such as C# or Java, the programmer must allocate and free the reserved memory himself, instead of relying on an automated system of garbage collection.

C is a portable language, so if you write your code correctly, it can often be executed on other architectures after a simple recompilation. The reason is that the model of computation in C is practically the same old von Neumann model, which makes it close to the programming models of most processors.

When learning C remember that despite the illusion of being a higher-level language, it does not tolerate errors, nor will the system be kind enough to always notify you about things in your program that were broken. An error can show itself much later, on another input, in a completely irrelevant part of the program.

Language standard described

A very important document about the language is the C language standard. You can acquire a PDF file of the standard draft online for free [7]. This document is just as important for us as the Intel Software Developer’s Manual [15].

8.1 Introduction

Before we start, we need to state several important points.

C is always case sensitive.
C does not care about spacing as long as the parser can separate lexemes from one another. The programs shown in Listing 8-1 and Listing 8-2 are equivalent.
Listing 8-1. spacing_1.c
```
int main      (int argc ,   char * * argv)
{
    return 0;
}
```
Listing 8-2. spacing_2.c
```
int main(int argc, char** argv)
{
    return 0;
}
```
There are different C language standards. We do not study GNU C (a version possessing various extensions), which is supported mostly by GCC. Instead, we concentrate on C89 (also known as ANSI C or C90) and C99, which are supported by many different compilers. We will also mention several new features of C11, some of which are not mandatory to implement in compilers.
Unfortunately C89 still remains the most pervasive standard, so there are compilers that support C89 for virtually every existing platform. This is why we will focus on this specific revision first and then extend it with the newer features.
To force the compiler to use only those features supported by a certain standard we use the following set of flags:
- -std=c89 or -std=c99 to select either the C89 or C99 standard.
- -pedantic-errors to disable non-standard language extensions.
- -Wall to enable a broad set of warnings about questionable constructs (despite its name, it does not enable every available warning).
- -Werror to transform warnings into errors so you would not be able to compile code with warnings.

Warnings are errors

It is a very bad practice to ship code that does not compile without warnings. Warnings are emitted for a reason.

Sometimes there are very specific cases in which people are forced to do non-standard things, such as calling a function with more arguments than it accepts, but such cases are extremely rare. In these cases it is much better to turn off one specific warning type for one specific file via a corresponding compiler key. Sometimes compiler directives can make the compiler omit a certain warning for a selected code region, which is even better.

For example, to compile an executable file main from source files file1.c and file2.c you could use the following command (in GCC, -ansi selects the same standard as -std=c89):

> gcc -o main -ansi -pedantic-errors -Wall -Werror file1.c file2.c

This command will make a full compilation pass including object file generation and linking.

8.2 Program Structure

Any program in C consists of

Data type definitions (structures, new types, etc.) which are based on other existing types. For example, we can create a new name new_int_type_name_t for an integer type int.
```
typedef int new_int_type_name_t;
```
Global variables (declared outside functions). For example, we can create a global variable i_am_global of type int initialized to 42 outside all function scopes. Note that global variables can only be initialized with constant values.
```
int i_am_global = 42;
```
Functions. For example, a function named square, which accepts an argument x of type int and returns its square.
```
int square( int x ) { return x * x; }
```

Comments between /* and */.

/* this is a rather complex comment
which spans multiple lines */

Comments starting at // until the end of the line (in C99 and more recent).
```
int x; // this is a one line comment, which ends at the end of the line
```
Preprocessor and compiler directives. They often start with #.
```
#define CATS_COUNT 42
#define ADD(x, y) ((x) + (y))
```

Inside functions, we can define variables or data types local to this function, or perform actions. Each action is a statement; statements are usually terminated by a semicolon. The actions are performed sequentially.

You cannot define functions inside other functions.

Statements will declare variables, perform computations and assignments, and execute different branches of code depending on conditions. A special case is a block between curly braces {}, which is used to group statements.

Listing 8-3 shows an exemplary C program. It outputs Hello, world! y=42 x=43. It defines a function main, which declares two variables x and y, the first is equal to 43, and the second is computed as the value of x minus one. Then a call to function printf is performed.

The function printf is used to output strings into stdout. The string has some parts (so-called format specifiers) replaced by the following arguments. The format specifier, as its name suggests, provides information about the nature of the argument, which usually includes its size and whether it is signed. For now, we will use very few format specifiers.

%d for int arguments, as in the example.
%f for double arguments; float arguments are converted to double when passed to printf.
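To see how format specifiers pair up with arguments, here is a minimal sketch. It uses sprintf, the standard sibling of printf that writes into a character buffer instead of stdout. The helper name format_pair is hypothetical and chosen only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: formats an int and a double into `buf`.
 * The caller must provide a buffer that is large enough. */
int format_pair( char* buf, int i, double d ) {
    /* %d consumes `i`, %f consumes `d`, in this order.
     * The return value is the number of characters written. */
    return sprintf( buf, "i=%d d=%f", i, d );
}
```

Calling format_pair( buf, 42, 1.5 ) fills buf with the text i=42 d=1.500000, because %f prints six digits after the decimal point by default.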

Variable declarations, assignments, and function calls, each ended by a semicolon, are statements.

Save printf for formatted output

Whenever possible, use puts instead of printf. This function can only output a single string (and ends it with a newline); no format specifiers are taken into account. Not only is it faster but it works uniformly with all strings and lacks security flaws described in section 14.7.3.

For now, we will always start our programs with the line #include <stdio.h>. It allows us to access a part of the standard C library. However, we state firmly that this is not a library import of any sort and should never be treated as one.

Listing 8-3. hello.c

/* This is a comment. The next line has a preprocessor directive */
#include <stdio.h>

/* `main` is the entry point for the program, like _start in assembly
 * Actually, the hidden function _start is calling `main`.
 * `main` returns the `return code` which is then given to the `exit` system
 * call.
 * The `void` keyword instead of argument list means that `main` accepts no
 * arguments */
int main(void) {
   /* A variable local to `main`. It ceases to exist as soon as `main` ends */
   int x = 43;
   int y;
   y = x - 1;
   /* Calling a standard function `printf` with three arguments.
    * It will print 'Hello, world! y=42 x=43'
    * All %d will be replaced by the consecutive arguments */
   printf( "Hello, world! y=%d x=%d\n", y, x);

   return 0;
}

A literal is a sequence of characters in the source code which represents an immediate value. In C, literals exist for

Integers, for example, 42.
Floating point numbers, for example, 42.0.
ASCII codes of characters, written in single quotes, for example, 'a'.
Pointers to null-terminated strings, for example, "abcde".

The execution of any C program is essentially a data manipulation.

The C abstract machine has a von Neumann architecture. It is done on purpose, because C is a language that should be as close to the hardware as possible. The variables are stored in the linear memory and each of them has a starting address.

You can think of variables like labels in assembly.

8.2.1 Data Types

As pretty much everything that happens is a manipulation of data, the nature of the said data is of particular interest to us. Every piece of data in C has a type, which means that it falls into one of (usually) distinct categories. The typing in C is weak and static.

Static typing means that all types are known at compile time. There can be absolutely no uncertainty about data types. Whether you are using a variable, a literal, or a more complex expression, which evaluates to some data, its type will be known.

Weak typing means that sometimes a data element can be implicitly converted to another type when appropriate.

For example, when evaluating 1 + 3.0 it is apparent that these two numbers have different types. One of them is an integer; the other is a real number. You cannot directly add one to the other, because their binary representations differ. You need to convert them both to the same type (probably, a floating point number). Only then will you be able to perform an addition. In strongly typed languages, such as OCaml, this operation is not permitted; instead, there are two separate operations to add numbers: one acts on integers (and is written +), the other on real numbers (and is written +. in OCaml).
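The implicit conversions described above can be sketched as follows. The function names add_mixed and to_int are hypothetical and serve only this illustration.

```c
/* The int operand 1 is implicitly converted to double before the addition,
 * so the result is the double value 4.0 */
double add_mixed( void ) {
    return 1 + 3.0;
}

/* Assigning a double to an int is also an implicit conversion:
 * the fractional part is discarded */
int to_int( double d ) {
    int n = d;
    return n;
}
```

No cast is written in either function; the compiler inserts the conversions on its own, which is exactly what weak typing permits.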

Weak typing is in C for a reason: in assembly, it is absolutely possible to take virtually any data and interpret it as data of another type (pointer as an integer, part of the string as an integer, etc.).

Let’s see what happens when we try to output a floating point value as an integer (see Listing 8-4). The result will be the floating point value reinterpreted as an integer, which does not make much sense.

Listing 8-4. float_reinterpret.c

#include <stdio.h>

int main(void) {
    printf("42.0 as an integer %d  \n", 42.0);
    return 0;
}

Formally, passing a double where %d expects an int is undefined behavior, so this program’s output depends on the target architecture and compiler. In our case, the output was

42.0 as an integer -266654968
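For contrast, here is a sketch of two well-defined operations: converting a floating point value to an integer, and reinterpreting its bytes as an integer. The function names are hypothetical. The sketch assumes that float and unsigned int have the same size, which holds on common platforms but is not guaranteed by the standard.

```c
#include <string.h>

/* Conversion: the numeric value is preserved (42.0f becomes 42) */
int convert_to_int( float f ) {
    return (int) f;
}

/* Reinterpretation: the bytes of `f` are copied unchanged into an
 * unsigned integer. Assumes float and unsigned int have the same size;
 * returns 0 if they do not. */
unsigned int reinterpret_bits( float f ) {
    unsigned int bits = 0;
    if ( sizeof( bits ) != sizeof( f ) ) return 0;
    memcpy( &bits, &f, sizeof( bits ) );
    return bits;
}
```

On a platform where float is a 32-bit IEEE 754 number, reinterpret_bits( 42.0f ) yields 0x42280000, which has little to do with the number 42.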

For this brief introductory section, we will consider that all types in C fall into one of these categories:

Integer numbers (int, char, …).
Floating point numbers (double and float).
Pointer types.
Composite types: structures and unions.
Enumerations.

In Chapter 9 we are going to explore the type system in more detail. If you come with a background in a higher-level language, you might find some commonly known items missing from this list. Unfortunately, there are no string and Boolean types in C89. An integer value equal to zero is considered false; any non-zero value is considered true.
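As a small sketch of how integers play the role of truth values, consider the two hypothetical helpers below.

```c
/* Maps any int to the canonical truth values 0 and 1 */
int as_truth_value( int v ) {
    if ( v ) return 1;   /* any non-zero value takes this branch */
    return 0;
}

/* Relational operators themselves produce an int: 1 for true, 0 for false */
int is_greater( int a, int b ) {
    return a > b;
}
```

Both 42 and -7 count as true here, while only 0 counts as false.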

8.3 Control Flow

According to von Neumann principles, the program execution is sequential. Statements are executed one after another. There are several statements to change control flow.

8.3.1 if

Listing 8-5 shows an if statement with an optional else part. If the condition is satisfied, the first block is executed. If the condition is not satisfied, the second block is executed, but the second block is not mandatory.

Listing 8-5. if_example.c

int x = 100;
if (42) {
    puts("42 is not equal to zero and thus considered true");
}

if (x > 3) {
    puts("X is greater than 3");
}
else
{
    puts("X is not greater than 3");
}

The braces are optional. Without braces, only one statement will be considered part of each branch, as shown in Listing 8-6.

Listing 8-6. if_no_braces.c

if (x == 0)
    puts("X is zero");
else
    puts("X is not zero");

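Notice that there is a syntactic ambiguity called the dangling else. Check Listing 8-7 and see whether you can attribute the else branch with certainty to the first or the second if. The language resolves it by attaching else to the nearest preceding if that does not yet have one; to make your intent explicit in case of nested ifs, use braces.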

Listing 8-7. dangling_else.c

if (x == 0)   if (y == 0) { puts("A"); }  else { puts("B"); }

/* You might have considered one of the following interpretations.
 * The compiler picks the first one: `else` belongs to the nearest `if`.
 * It can issue a warning to draw your attention to the ambiguity */

if (x == 0) {
    if (y == 0) { puts("A"); }
    else { puts("B"); }
}

if (x == 0) {
    if (y == 0) { puts("A"); }
} else { puts("B"); }
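The following sketch lets you observe which interpretation the compiler chooses. The function dangling is a hypothetical helper written without braces on purpose; GCC typically warns about this layout when -Wall is enabled, so it is meant for experiments only.

```c
/* The `else` belongs to the nearest `if`, that is, to `if ( y == 0 )` */
const char* dangling( int x, int y ) {
    if ( x == 0 )
        if ( y == 0 ) return "A";
        else return "B";
    return "none";
}
```

When x is not zero, neither branch runs and the function returns "none", regardless of y. This confirms that the else is not attached to the outer if.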

8.3.2 while

A while statement is used to build loops.

Listing 8-8. while_example.c

int x = 10;
while ( x != 0 ) {
    puts("Hello");
    x = x - 1;
}

If the condition is satisfied, then the body is executed. Then the condition is checked once again, and if it is satisfied, then the body is executed again, and so on.

An alternative form do ... while ( condition ); allows you to check conditions after executing the loop body, thus guaranteeing at least one iteration. Listing 8-9 shows an example.

Notice that a body can be empty, as follows: while (x == 0);. The semicolon after the parentheses ends this statement.

Listing 8-9. do_while_example.c

int x = 10;
do {
    printf("Hello\n");
    x = x - 1;
} while ( x != 0 );
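The difference between the two loop forms shows up when the condition is false from the start. The hypothetical helpers below count how many times the loop body runs.

```c
int count_while( int x ) {
    int iterations = 0;
    while ( x > 0 ) {          /* condition is checked before the body */
        iterations = iterations + 1;
        x = x - 1;
    }
    return iterations;
}

int count_do_while( int x ) {
    int iterations = 0;
    do {                       /* body runs once before the first check */
        iterations = iterations + 1;
        x = x - 1;
    } while ( x > 0 );
    return iterations;
}
```

For x equal to 3 both functions return 3, but for x equal to 0 the first returns 0 and the second returns 1.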

8.3.3 for

A for statement is ideal to iterate over finite collections, such as linked lists or arrays. It has the following form: for ( initializer ; condition; step ) body. Listing 8-10 shows an example.

Listing 8-10. for_example.c

int a[] = {1, 2, 3, 4}; /* an array of 4 elements */
int i = 0;
for ( i = 0; i < 4; i++ ) {
    printf( "%d", a[i] );
}

First, the initializer is executed. Then there is a condition check, and if it holds, the loop body is executed, and then the step statement.

In this case, the step statement is an increment operator ++, which modifies a variable by increasing its value by one. After that, the loop begins again by checking the condition, and so on. Listing 8-11 shows two equivalent loops.

Listing 8-11. while_for_equiv.c

int i;

/* as a `while` loop */
i = 0;
while ( i < 10 ) {
    puts("Hello!");
    i = i + 1;
}

/* as a `for` loop */
for( i = 0; i < 10; i = i + 1 ) {
    puts("Hello!");
}

The break statement is used to end the cycle prematurely and fall to the next statement in the code. continue ends the current iteration and starts the next iteration right away. Listing 8-12 shows an example.

Listing 8-12. loop_cont.c

int n = 0;
for( n = 0; n < 20; n++ ) {
    if (n % 2) continue; /* odd numbers are skipped */
    printf("%d is even\n", n );
}
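Here is a sketch that combines both statements in one loop. The helper sum_even_below is hypothetical; the upper bound of 20 is borrowed from the listing above.

```c
/* Sums the even numbers 0, 2, 4, ... that are smaller than `limit`,
 * but never looks at numbers from 20 on */
int sum_even_below( int limit ) {
    int n;
    int sum = 0;
    for ( n = 0; n < 20; n++ ) {
        if ( n >= limit ) break;     /* leave the loop entirely */
        if ( n % 2 ) continue;       /* odd: skip to the next iteration */
        sum = sum + n;
    }
    return sum;
}
```

For example, sum_even_below( 7 ) adds 0, 2, 4, and 6 and returns 12.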

Note also that in the for loop, the initializer, step, or condition expressions can be left empty. Listing 8-13 shows an example.

Listing 8-13. infinite_for.c

for( ; ; ) {
    /* this cycle will loop forever, unless `break` is issued in its body */
    break; /* `break` is here, so we stop iterating */
}

8.3.4 goto

A goto statement allows you to make jumps to a label inside the same function. As in assembly, labels can mark any statement, and the syntax is the same: label: statement. This is often described as bad code style; however, it might be quite handy when encoding finite state machines. What you should not do is abandon well-thought-out conditionals and loops for goto-spaghetti.

The goto statement is sometimes used as a way to break from several nested cycles. However, this is often a symptom of a bad design, because the inner loops can be abstracted away inside a function (thanks to the compiler optimizations, probably for no runtime cost at all). Listing 8-14 shows how to use goto to break out of all inner loops.

Listing 8-14. goto.c

int i;
int j;
for (i = 0; i < 100; i++ )
    for( j = 0; j < 100; j++ ) {
        if (i * j == 432)
            goto end;
        else
            printf("%d * %d != 432\n", i, j );
    }
end: ; /* a label must mark a statement; here it is an empty one */

The goto statement mixed with the imperative style makes analyzing the program behavior harder for both humans and machines (compilers), so the clever optimizations that modern compilers are capable of become less likely, and the code becomes harder to maintain. We advocate restricting goto usage to pieces of code that perform no assignments, like the implementations of finite state machines. This way you won’t have to trace all the possible program execution routes and how the values of certain variables change when the program executes one way or another.
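As a sketch of the alternative mentioned earlier, the nested loops can be moved into a function so that a plain return leaves both of them at once. The helper find_product and its output parameters are hypothetical; they use pointers, which are covered in detail in later chapters.

```c
/* Searches for the first pair (i, j) with i * j == target, both below 100.
 * Returns 1 and stores the pair on success, 0 otherwise. */
int find_product( int target, int* out_i, int* out_j ) {
    int i;
    int j;
    for ( i = 0; i < 100; i++ )
        for ( j = 0; j < 100; j++ )
            if ( i * j == target ) {
                *out_i = i;
                *out_j = j;
                return 1;   /* leaves both loops at once */
            }
    return 0;
}
```

For the target 432 the search stops at the pair 6 and 72, without any label or goto.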

8.3.5 switch

A switch statement is used like multiple nested if’s when the condition is some integer variable being equal to one or another value. Listing 8-15 shows an example.

Listing 8-15. case_example.c

int i = 10;
switch ( i ) {
    case 1: /* if i is equal to 1...*/
        puts( "It is one" );
        break; /* Without `break`, execution falls through to the next case */

    case 2: /* if i is equal to 2...*/
        puts( "It is two" );
        break;

    default: /* otherwise... */
        puts( "It is neither one nor two" );
        break;
}

Every case is, in fact, a label. The cases are not limited by anything but an optional break statement to leave the switch block. It allows for some interesting hacks.¹ However, a forgotten break is usually a source of bugs. Listing 8-16 shows these two behaviors: first, several labels are attached to the same code, meaning that no matter whether x is 0, 1, or 10, the code executed will be the same. Then, as no break ends this case, after executing the first puts the control falls through to the next statement, labeled case 15, which is another puts.

Listing 8-16. case_magic.c

switch ( x ) {
    case 0:
    case 1:
    case 10:
        puts( "First case: x = 0, 1 or 10" );
        /* Notice the absence of `break`! */
    case 15:
        puts( "Second case: x = 0, 1, 10 or 15" );
        break;
}
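To make the fall-through measurable, the hypothetical helper below mirrors the shape of the switch above, but counts how many branches run instead of printing.

```c
int branches_run( int x ) {
    int count = 0;
    switch ( x ) {
        case 0:
        case 1:
        case 10:
            count = count + 1;
            /* no `break`: execution falls through into `case 15` */
        case 15:
            count = count + 1;
            break;
        default:
            break;
    }
    return count;
}
```

The result is 2 for x equal to 0, 1, or 10, it is 1 for x equal to 15, and 0 for any other value.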

8.3.6 Example: Divisor

Listing 8-17 showcases a program that searches for the first divisor of a number and prints it to stdout. The function first_divisor accepts an argument n and searches for the smallest integer r from 1 exclusive to n inclusive, such that n is a multiple of r. If r = n, then n is a prime number.

Notice how the statement after for was not put between curly braces because it is the only statement inside the loop. The same happened with the if body, which consists of a sole return i. You can of course put it inside braces, and some programmers actually encourage it.

Listing 8-17. divisor.c

#include <stdio.h>

int first_divisor( int n ) {
    int i;
    if ( n == 1 ) return 1;
    for( i = 2; i <= n; i++ )
        if ( n % i == 0 ) return i;
    return 0;
}

int main(void) {
    int i;
    for( i = 1; i < 11; i++ )
        printf( "%d \n", first_divisor( i ) );

    return 0;
}
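The remark about prime numbers can be turned into a small primality test. In the sketch below, first_divisor is repeated from the listing so that the fragment is self-contained; is_prime is a hypothetical addition.

```c
/* Same function as in the listing above */
int first_divisor( int n ) {
    int i;
    if ( n == 1 ) return 1;
    for ( i = 2; i <= n; i++ )
        if ( n % i == 0 ) return i;
    return 0;
}

/* A number n > 1 is prime when its first divisor is n itself */
int is_prime( int n ) {
    if ( n < 2 ) return 0;
    return first_divisor( n ) == n;
}
```

The explicit check for n below 2 is needed because first_divisor( 1 ) returns 1, although 1 is not considered a prime number.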

8.3.7 Example: Is It a Fibonacci Number?

Listing 8-18 shows a program that checks whether a number is a Fibonacci number or not. The Fibonacci series is defined recursively as follows:

$f_1 = 1$

$f_2 = 1$

$f_n = f_{n-1} + f_{n-2}$

This series has a large number of applications, notably in combinatorics. Fibonacci sequences appear even in biological settings, such as branching in trees, arrangement of the leaves on a stem, etc.

The first Fibonacci numbers are 1, 1, 2, 3, 5, 8, etc. As you see, each number is the sum of two previous numbers.

In order to check whether a given number n is contained in the Fibonacci sequence, we adopt a straightforward (not necessarily optimal) approach of calculating all sequence members up to n. The Fibonacci sequence is ascending, so if we have found a member greater than n and still have not encountered n, we conclude that n is not in the sequence. The function is_fib accepts an integer n and calculates all elements less than or equal to n. If one of these elements is n, then n is a Fibonacci number and the function returns 1; otherwise, it returns 0.

Listing 8-18. is_fib.c

#include <stdio.h>

int is_fib( int n ) {

    int a = 1;
    int b = 1;
    if ( n == 1 ) return 1;

    while ( a <= n && b <= n ) {
        int t = b;

        if (n == a || n == b) return 1;
        b = a;
        a = t + a;
    }
    return 0;

}

void check(int n) { printf( "%d -> %d\n", n, is_fib( n ) ); }

int main(void) {
    int i;
    for( i = 1; i < 11; i = i + 1 ) {
        check( i );
    }
    return 0;
}

8.4 Statements and Expressions

The C language is based on notions of statements and expressions. Expressions correspond to data entities.

All literals and variable names are expressions. Additionally, complex expressions can be constructed using operations (+, -, and other logical, arithmetic, and bit operations) and function calls (with the exception of routines returning void). Listing 8-19 shows some exemplary expressions.

Listing 8-19. expr_example.c

1
13 + 37
17 + 89 * square( 1 )
x

Expressions are data, so they can be used at the right side of the assignment operator =. Some of the expressions can be also used at the left side of the assignment. They should correspond to data entities having an address in memory.²

Such expressions are called lvalues; all other expressions, which have no address, are called rvalues. This difference is actually very intuitive as long as you think in terms of the abstract machine. Statements such as those shown in Listing 8-20 bear no meaning, because an assignment means a memory change.

Listing 8-20. rvalue_example.c

4 = 2;
"abc"="bcd";
square(3)  =  9;
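By contrast, the following sketch contains only valid assignments: each left side designates an object with an address. The function lvalue_demo is hypothetical; it uses an array and a pointer, which are explained in later chapters.

```c
int lvalue_demo( void ) {
    int x;
    int a[3];
    int* p;

    x = 4;          /* a variable is an lvalue */
    a[1] = x + 1;   /* so is an array element; `x + 1` is an rvalue */
    p = &x;         /* `p` now holds the address of `x` */
    *p = 9;         /* a dereferenced pointer is an lvalue too: changes `x` */

    return x + a[1];
}
```

The function returns 14: x was overwritten with 9 through the pointer, and a[1] holds 5.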

8.4.1 Statement Types

Statements are commands to the C abstract machine. Each command is an imperative: do something! Thus the name “imperative programming”: it is a sequence of commands.

There are three types of statements:

Expressions terminated by a semicolon.
```
1 + 3;
42;
square(3);
```
The purpose of these statements is the computation of the given expressions. If these invoke no assignments (directly as a part of the expression itself or inside one of invoked functions) or input/output operations, their impact on the program state is not observable.
A block delimited by { and }. It contains an arbitrary number of statements. A block should not be ended by a semicolon itself (but the statements inside it likely should). Listing 8-21 shows a typical block.
Listing 8-21. block_example.c
```
int y = 1 + 3;
{
    int x;
    x = square( 2 ) + y;
    printf( "%d\n", x );
}
```
Control flow statements: if, while, for, switch. They do not require a semicolon.

We have already talked about assignments; the evil truth is that assignments are expressions themselves, which means that they can be chained. For example, a = b = c means

Assign c to b;
Assign the new b value to a.

A typical assignment is thus a statement from the first category: expression ended by a semicolon.

Assignment is a right-associative operation. It means that when being parsed by a compiler (or your eye) the parentheses are implicitly put from right to left, the rightmost part becoming the most deeply nested. Listing 8-22 provides an example of two equivalent ways to write a complex assignment.

Listing 8-22. assignment_assoc.c

x = y = z;
(x = (y = z));

On the other hand, the left-associative operations use the opposite nesting order, as shown in Listing 8-23.

Listing 8-23. div_assoc.c

40 / 2 / 4
((40 / 2) / 4)
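Associativity can be checked with a few one-line functions. The names below are hypothetical, and the numbers differ slightly from the listing so that the right-nested variant does not divide by zero.

```c
/* Assignment groups from the right: y receives z, then x receives
 * the new value of y */
int chain_assign( int z ) {
    int x = 0;
    int y = 0;
    x = y = z;
    return x + y;
}

/* Division groups from the left: (40 / 4) / 2 */
int div_left( void )  { return 40 / 4 / 2; }

/* Explicit parentheses force the other order: 40 / (4 / 2) */
int div_right( void ) { return 40 / ( 4 / 2 ); }
```

Here div_left returns 5 while div_right returns 20, so the nesting order matters for division.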

8.4.2 Building Expressions

An expression is built using other expressions connected with operators and function calls. The operators can be classified as follows:

Based on arity (operand count)
- Unary (like unary minus: - expr)
- Binary (like binary multiplication: expr1 * expr2)
- Ternary. There is only one ternary operator: cond ? expr1 : expr2. If the condition holds, the value is equal to expr1, otherwise expr2
Based on meaning
- Arithmetic Operators: * / + - % ++ --
- Relational Operators: == != > < >= <=
- Logical Operators: ! && ||
- Bitwise Operators: ~ ^ & | << >>
- Assignment Operators: = += -= *= /= %= <<= >>= &= ^= |=
- Misc Operators:
  1. sizeof(var) as “replace this with the size of var in bytes”
  2. & as “take address of an operand”
  3. * as “dereference this pointer”
  4. ?: which is the ternary operator we have spoken about before.
  5. ->, which is used to refer to a field of a structure or union through a pointer to it.
Most operators have an evident meaning. We will mention some of the less used and more obscure ones.
- The increment and decrement operators can be used in either prefix or postfix form: for a variable i, you can write either i++ or ++i. Both expressions have an immediate effect on i, meaning it is incremented by 1. However, the value of i++ is the “old” i, while the value of ++i is the “new,” incremented i.
- There is a difference between logical and bit-wise operators. For logical operators, any non-zero number is essentially the same in its meaning, while the bit-wise operations are applied to each bit separately. For example, 2 & 4 is equal to zero, because there is no bit set in both 2 and 4. However, 2 && 4 will return 1, because both 2 and 4 are non-zero numbers (truth values).
- Logical operators are evaluated in a lazy way. Consider the logical and operator &&. When applied to two expressions, the first expression will be computed. If its value is zero, the computation ends immediately, because of the nature of AND operation. If any of its operands is zero, the result of the big conjunction will be zero as well, so there is no need to evaluate it further. It is important for us because this behavior is noticeable. Listing 8-24 shows an example where the program will output F and will never execute the function g.
  Listing 8-24. logic_lazy.c
```
#include <stdio.h>

int f(void) { puts( "F" ); return 0; }
int g(void) { puts( "G" ); return 1; }

int main(void) {
    f() && g();
    return  0;
}
```
- Tilde (~) is a bit-wise unary negation, hat (^) is a bit-wise binary xor.
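The less obvious operators from the list above can be tried out with a few tiny functions. All names in this sketch are hypothetical.

```c
/* Postfix: the expression yields the old value, then i is incremented */
int postfix_value( int i ) { return i++; }

/* Prefix: i is incremented first, the expression yields the new value */
int prefix_value( int i )  { return ++i; }

/* Bit-wise AND works bit by bit, logical AND only on truth values */
int bitwise_and( int a, int b ) { return a & b; }
int logical_and( int a, int b ) { return a && b; }

/* The ternary operator picks one of two values */
int max_of( int a, int b ) { return a > b ? a : b; }
```

With the argument 5, postfix_value returns 5 and prefix_value returns 6. For the arguments 2 and 4, bitwise_and returns 0 while logical_and returns 1.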

In the following chapters we will revisit some of these, such as the address manipulation operators and sizeof.

8.5 Functions

We can draw a line between procedures (which do not return a value) and functions (which return a value of a certain type). The procedure call cannot be embedded into a more complex expression, unlike the function call.

Listing 8-25 shows an exemplary procedure. Its name is myproc; it returns void, so it does not return anything. It accepts two integer parameters named a and b.

Listing 8-25. proc_example.c

void myproc ( int a, int b )
{
    printf("%d",  a+b);
}

Listing 8-26 shows an exemplary function. It accepts two arguments and returns a value of type int. A call to this function is used as a part of a more complex expression later.

Listing 8-26. function_example.c

int myfunc ( int a, int b )
{
    return a + b;
}

int other( int x ) {
    return 1 + myfunc( 4, 5 );
}

The execution of every function should end with a return statement; if control reaches the end of a function without one, the value it returns is undefined. Procedures can omit the return keyword; it can still be used without an operand to return from the procedure immediately.

When there are no arguments, a keyword void should be used in function declaration, as shown in Listing 8-27.

Listing 8-27. no_arguments_ex.c

int always_return_0( void ) { return 0; }

The body of a function is a block statement, so it is enclosed in braces and is not ended with a semicolon. Each block defines a lexical scope for variables.

All variables should be declared at the start of the block, before any statements. That restriction is present in C89 but not in C99. We will adhere to it to make the code more portable.

Additionally, it forces a certain self-discipline. If you have a large number of local variables declared at the scope start, it will look cluttered. At the same time it is usually a sign of bad program decomposition and/or poor choice of data structures.

Listing 8-28 shows examples of good and bad variable declarations.

Listing 8-28. block_variables.c

/* Good */
void f(void) {
    int x;
    ...
}

/* Bad: `x` is declared after `printf` call */

void f(void) {
    int y = 12;
    printf( "%d", y);
    int x = 10;
    ...
}

/* Bad in C89: `i` cannot be declared in the `for` initializer */
for( int i = 0; i < 10; i++ ) {
    ...
}

/* Good: `i` is declared before `for` */
int f(void) {
    int i;
    for( i = 0; i < 10; i++ ) {
        ...
    }
}

/* Good: any block can have additional variables declared in its beginning */
/* `x` is local to one `for` iteration and is always reinitialized to 10 */
for( i = 0; i < 10; i++ ) {
    int x = 10;
}

If a variable in a certain scope has the same name as a variable already declared in an enclosing scope, the more recent variable hides the older one. There is no way to address the hidden variable by name (the only workaround is to store its address somewhere beforehand and use that address).

The local variables in different functions can of course have the same names.

Note

The variables are visible until the end of their respective blocks. So the commonly used notion of ‘local’ variables is in fact block-local, not function-local. The rule of thumb is: make variables as local as you can (including variables local to loop bodies, for example). It greatly reduces program complexity, especially in large projects.
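Name hiding and block-local visibility can be observed in a short sketch. The function shadow_demo is hypothetical.

```c
int shadow_demo( void ) {
    int x = 1;
    int seen_inside;
    {
        int x = 2;          /* a new variable, visible until the closing brace */
        seen_inside = x;    /* refers to the inner x, which is 2 */
    }
    /* here the outer x is visible again and still equals 1 */
    return seen_inside * 10 + x;
}
```

The function returns 21: inside the block the name x denoted the inner variable, and after the block the outer variable turned out to be untouched.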

8.6 Preprocessor

The C preprocessor acts similarly to the NASM preprocessor. Its power, though, is much more limited. The most important preprocessor directives you are going to see are

#define
#include
#ifndef
#endif

The #define directive is very similar to its NASM %define counterpart. It has three main usages.

Defining global constants (see Listing 8-29 for an example).
Listing 8-29. define_example1.c
```
#define MY_CONST_VALUE 42
```
Defining parameterized macro substitutions (as shown in Listing 8-30).
Listing 8-30. define_example2.c
```
#define MACRO_SQUARE( x ) ((x) * (x))
```
Defining flags; depending on which, some additional code can be included or excluded from sources.
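The third usage can be sketched as follows. The symbols BUFFER_SIZE and VERBOSE are hypothetical names chosen for this illustration. Besides #ifndef and #endif, the sketch uses the related directives #ifdef and #else.

```c
/* Provide a default only if the symbol was not defined earlier,
 * for example on the GCC command line with -DBUFFER_SIZE=128 */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 64
#endif

int buffer_size( void ) { return BUFFER_SIZE; }

/* A flag: only its presence matters, not its value */
int is_verbose( void ) {
#ifdef VERBOSE
    return 1;
#else
    return 0;
#endif
}
```

When neither symbol is defined beforehand, buffer_size returns 64 and is_verbose returns 0. The code in the branch that is not selected never reaches the compiler.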

It is important to enclose in parentheses all argument occurrences inside macro definitions. The reason behind it is that C macros are not syntactic, which means that the preprocessor is not aware of the code structure. Sometimes this results in an unexpected behavior, as shown in Listing 8-31. Listing 8-32 shows the preprocessed code.

Listing 8-31. define_parentheses.c

#define SQUARE( x ) (x * x)

int x = SQUARE( 4+1 );

As you see, the value of x will not be 25 but 4+(1*4)+1 = 9, because multiplication has a higher priority than addition.

Listing 8-32. define_parentheses_preprocessed.c

int x = (4+1 * 4+1);
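The two ways of writing the macro can be compared side by side. The names SQUARE_BAD and SQUARE_GOOD are hypothetical and stand for the unparenthesized and the parenthesized variant, respectively.

```c
#define SQUARE_BAD( x )  (x * x)
#define SQUARE_GOOD( x ) ((x) * (x))

/* Expands to (4+1 * 4+1), which is 4 + 4 + 1 */
int square_bad_result( void )  { return SQUARE_BAD( 4+1 ); }

/* Expands to ((4+1) * (4+1)) */
int square_good_result( void ) { return SQUARE_GOOD( 4+1 ); }
```

The first function returns 9 and the second returns 25. Note that even the parenthesized macro evaluates its argument twice, so arguments with side effects are still best avoided.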

The #include directive pastes the given file contents in place of itself. The file name is enclosed in either quotes (#include "file.h") or angle brackets (#include <stdio.h>).

In case of angle brackets, the file is searched in a set of predefined directories. For GCC it is usually:
- /usr/local/include
- <libdir>/gcc/target/version/include
  Here <libdir> stands for the directory that holds libraries (a GCC setting) and is usually /usr/lib or /usr/local/lib by default.
- /usr/target/include
- /usr/include
Using the -I key one can add directories to this list. You can make a special include/ directory in your project root and add it to the GCC include search list.
In case of quotes, the directory of the file containing the #include directive is searched first, followed by the same directories as for angle brackets.

You can get the preprocessor output for a file filename.c in the same way as when working with NASM: gcc -E filename.c. This will execute all preprocessor directives and print the result to stdout without compiling anything.

8.7 Summary

In this chapter we have elaborated the C basics. All variables are labels in memory of the C language abstract machine, whose architecture greatly resembles the von Neumann architecture. After describing a universal program structure (functions, data types, global variables, ...), we have defined two syntactical categories: statements and expressions. We have seen that expressions are either lvalues or rvalues and learned to control the program execution using function calls and control statements such as if and while. We are already able to write simple programs which perform computations on integers. In the next chapter we are going to discuss the type system in C and the types in general to get a bigger picture of how types are used in different programming languages. Thanks to the notion of arrays our possible input and output data will become much more diverse.

Question 148

What is a literal?

Question 149

What are lvalue and rvalue?

Question 150

What is the difference between the statements and expressions?

Question 151

What is a block of statements?

Question 152

How do you define a preprocessor symbol?

Question 153

Why is break necessary at the end of each switch case?

Question 154

How are truth and false values encoded in C89?

Question 155

What is the first argument of printf function?

Question 156

Is printf checking the types of its arguments?

Question 157

Where can you declare variables in C89?

Footnotes

1 One of the best-known hacks is called Duff’s device; it incorporates a loop that is defined inside a switch and contains several cases.

2 We are talking about abstract C machine memory here. Of course, the compiler has the right to optimize variables and never allocate real memory for them on the assembly level. The programmer, however, is not constrained by it and can think that every variable is an address of a memory cell.
