Igor Zhirkov, Low-Level Programming, 10.1007/978-1-4842-2403-8_9

9. Type System

Igor Zhirkov¹

(1)Saint Petersburg, Russia

The notion of a type is a key one in programming. A type is essentially a tag assigned to a data entity. Every data transformation is defined for specific data types, which helps ensure its correctness (you would not want to add the number of active Reddit users to the average temperature at noon in the Sahara, because it makes no sense).

This chapter will study the C type system in depth.

9.1 Basic Type System of C

All types in C fall into one of these categories:

Predefined numeric types (int, char, float, etc.).
Arrays, multiple elements of the same type occupying consecutive memory cells.
Pointers, which are essentially the cells storing other cells’ addresses. The pointer type encodes the type of cell it is pointing to. A particular case of pointers are function pointers.
Structures, which are packs of data of different types. For example, a structure can store an integer and a floating point number. Each of the data elements has its own name.
Enumerations, which are essentially integers that take one of several explicitly defined values. Each of these values has a symbolic name to refer to it.
Functional types.
Constant types, built on top of some other type and making the data immutable.
Type aliases for other types.

9.1.1 Numeric Types

The most basic C types are the numeric ones. They have different sizes and are either signed or unsigned. Because of a long and loosely controlled language evolution, their description may seem sometimes arcane and quite often very ad hoc. Following is a list of the basic types:

char
- Can be signed or unsigned. By default it is usually a signed number, but this is not required by the language standard.
- Its size is always 1 byte.
- Despite the name making a direct reference to the word “character,” this is an integer type and should be treated as such. It is often used to store the ASCII code of a character, but it can be used to store any 1-byte number.
- A literal such as 'x' corresponds to the ASCII code of the character “x.” Its type is int, but it is safe to assign it to a variable of type char.¹
Listing 9-1 shows an example.
Listing 9-1. char_example.c
```
char number = 5;
char symbol_code = 'x';
char null_terminator = '\0';
```
int
- An integer number.
- Can be signed or unsigned. It is signed by default.
- It can be aliased simply as: signed, signed int (similar for unsigned).
- Can be short (2 bytes), long (4 bytes on 32-bit architectures, 8 bytes in Intel 64). Most compilers also support long long, but before C99 it was not part of the standard.
- Aliases for short: short, short int, signed short, signed short int.
- The size of int without modifiers varies depending on architecture. It was designed to be equal to the machine word size. In the 16-bit era the int size was obviously 2 bytes; in 32-bit machines it is 4 bytes. Unfortunately, this did not prevent programmers from relying on an int of size 4 in the era of 32-bit computing. Because of the large pool of software that would break if the size of int changed, its size was left untouched on 64-bit systems and remains 4 bytes.
- It is important to note that integer literals have the type int by default (as long as the value fits). If we add the suffix L or UL, we explicitly state that the number is of type long int or unsigned long int, respectively. Sometimes it is of utter importance not to forget these suffixes.
  Consider an expression 1 << 48. Its value is not 2⁴⁸ as you might have thought. Why? The reason is that 1 is a literal of the type int, which occupies 4 bytes and thus can vary from −2³¹ to 2³¹ − 1. By shifting 1 to the left 48 times, we are moving the only set bit outside of the int format. Formally, the standard leaves the behavior of such a shift undefined; in practice you may see 0 or some other unexpected value. However, if we do add a correct suffix, the answer will be more evident. An expression 1L << 48 is evaluated to 2⁴⁸ wherever long is 8 bytes long.
long long
- In x64 architecture it is the same as a long (except for Windows, where long is 4 bytes).
- Its size is 8 bytes.
- Its range is −2⁶³ … 2⁶³ − 1 for signed and 0 … 2⁶⁴ − 1 for unsigned.
float
- Floating point number.
- Its size is 4 bytes.
- Its range is ±1.17549 × 10⁻³⁸ … ±3.40282 × 10³⁸ (approximately six digits of precision).
double
- Floating point number.
- Its size is 8 bytes.
- Its range is ±2.22507 × 10⁻³⁰⁸ … ±1.79769 × 10³⁰⁸ (approximately 15 digits of precision).
long double
- Floating point number.
- Its size is usually 80 bits.
- It has been part of the standard since C89, but its exact size and precision are left to the implementation.

Note

On floating point arithmetic

First of all, remember that floating point types are a very rough approximation of the real numbers. For example, they are more precise near 0 and less precise for big values. This is exactly the reason their range is so great compared even to longs.

As a consequence, doing floating point arithmetic with values closer to zero yields more precise results.

Finally, in certain contexts (e.g., kernel programming) the floating point arithmetic is not available. As a rule of thumb, avoid it when you do not need it. For example, if your computations can be performed by manipulating a quotient and a remainder, calculated by using / and % operators, you should stick with them.

9.1.2 Type Casting

The language allows you to relatively freely convert data between types. To do it you have to write the new type name in parentheses before the expression you want to convert.

Listing 9-2 shows an example.

Listing 9-2. type_cast.c

int a = 4;

double b = 10.5 * (double)a; /* the value of a is converted to double; a itself stays an int */

int c = 129;
char k = (char)c; //???

Surely, this wonderful open world of possibilities is better kept under your own strict control, because such conversions often lead to subtle bugs when an expression does not evaluate to what it “should.”

For example, as char is a (usually) signed number in range -128 . . . 127, the number 129 is too big to fit into this range. The result of the conversion shown in Listing 9-2 is not fixed by the language standard (it is implementation-defined), but given how typical processors and compilers function, the result will probably be a negative number consisting of the same bits as the unsigned representation of 129.

Question 158

What will be the value of k? Compile the code and check on your own computer.

9.1.3 Boolean Type

We have already stated that C89 lacks Booleans. However, C99 introduced Booleans as a type _Bool. If you include stdbool.h, you will have access to the values true / false and the type bool, which is an alias of _Bool. The reasoning behind this is simple. Many existing projects already have a Boolean type defined for themselves, usually as bool. To prevent naming conflicts, the C99 type name for Booleans is _Bool. Including the file stdbool.h signifies that your code is free from any custom bool definition, and you are picking the one conforming to the standard, but with a more humane name. We encourage you to use the aliased type bool whenever possible. In the future, the _Bool type name will probably be declared deprecated, and after several standard versions it will not be used anymore.

9.1.4 Implicit Conversions

As a weakly typed language, C sometimes allows one to omit casts even when using data of a different type than intended.

When the required numeric type differs from the actual type, an implicit conversion is performed. Its first step is called integer promotion: if the type is lesser than int, it gets promoted to signed int or unsigned int, depending on its initial signed or unsigned nature.² Then, if the operand types are still different, we climb up the ladder shown in Figure 9-1.

Figure 9-1. Integer conversions

Note

Remember that long long appeared only in C99 (long double, by contrast, was already part of C89). It is, however, supported as a language extension by many compilers that do not support C99 yet.

The “convert to int first” rule means that the overflows in lesser types can be handled differently than in int type itself. The example shown in Listing 9-3 assumes that sizeof(int) == 4.

Listing 9-3. int_promotion_pitfall.c

/* The lesser types */
unsigned char  x = 100, y = 100, z = 100;
unsigned char r = x + y + z; /* will give you 300 % 256 = 44 */

unsigned int r_int = x + y + z; /* equals to 300, because the promotion to
                                   integers is performed first */

/* Now with the greater types */

unsigned int x = 1e9, y = 2e9, z = 3e9;

unsigned int r_int = x + y + z;   /* 1705032704 equals 6000000000 % (2^32) */

unsigned long r_long = x + y + z;   /*  the same result: 1705032704 */

In the last line, neither x, y, nor z is promoted to long, because it is not required by standard. The arithmetic will be performed within the int type and then the result will be converted to long.

Be understood

As a rule of thumb, when uncertain, always provide the types explicitly! For example, you can write long x = (long)a + (long)b + (long)c.

While the code might seem more verbose after that, it will at least work as intended.

Let’s look at an example shown in Listing 9-4. The expression in the third line will be computed as follows:

The value of i will be converted to float (of course, the variable itself will not change);
This value is added to the value of f, the resulting type is float again; and
This result is converted to double to be stored in d.

Listing 9-4. int_float_conv.c

int i;
float f;
double d = f + i;

None of these operations is free: each of them is encoded as assembly instructions. It means that whenever you are acting on numbers of different formats, it probably has runtime costs. Try to avoid it, especially in loops.

9.1.5 Pointers

Given a type T, one can always construct a type T*. This new type corresponds to data units which hold the address of another entity of type T.

As all addresses have the same size, all pointer types have the same size as well. It is specific for architecture and, in our case, is 8 bytes wide.

Using the operators & and * one can take the address of a variable or dereference a pointer (look into the memory at the address this pointer stores). Listing 9-5 shows an example.

In section 2.5.4 we discussed a subtle problem: if a pointer is just an address, how do we know the size of the data entity we are trying to read starting from this address? In assembly, it was straightforward: either the size could have been deduced based on the fact that two mov operands should have the same size or the size should have been explicitly given, for example, mov qword [rax], 0xABCDE. Here the type system takes care of it: if a pointer is of a type int*, we surely know that dereferencing it produces a value of size sizeof(int).

Listing 9-5. ptr_deref.c

int x = 10;
int* px = &x; /* Took address of `x` and assigned it to `px` */

*px = 42; /* We modified `x` here! */
printf( "*px = %d\n", *px ); /* outputs: '*px = 42' */
printf( "x = %d\n", x ); /* outputs: 'x = 42' */

When you program in C, pointers are your bread and butter. As long as you do not introduce a pointer to non-existing data, the pointers will serve you right.

A special pointer value is 0. When used in pointer context (specifically, comparison with 0), 0 signifies “a special value for a pointer to nowhere.” In place of 0 you can also write NULL, and you are advised to do so. It is a common practice to assign NULL to the pointers which are not yet initialized with a valid object address, or return NULL from functions returning an address of something to make the caller aware of an error.

Is zero a zero?

There are two contexts in which you might use the 0 expression in C. The first context expects just a normal integer number. The second one is a pointer context, when you assign 0 to a pointer or compare a pointer with 0. In the second context 0 does not always mean an integer value with all bits cleared, but will always be equal to this “invalid pointer” value. In some architectures it can be, for example, a value with all bits set. But this code will work no matter the architecture because of this rule:

int* px = ... ;

if ( px ) /* if `px` is not NULL */

if ( px == 0 ) /* same thing as the following: */
if (!px ) /* if `px` is NULL */

There is a special kind of pointer type: void*. This is the pointer to any kind of data. C allows us to assign any type of pointer to a variable of type void*; however, this variable cannot be dereferenced. Before we do it, we need to take its value and convert to a legit pointer type (e.g., int*). A simple cast is used to do it (see section 9.1.2). Listing 9-6 shows an example.

Listing 9-6. void_deref.c

int a = 10;
void* pa = &a;

printf("%d\n", *( (int*) pa) );

You can also pass a pointer of type void* to any function that accepts a pointer to some other type. Pointers have many purposes, and we are going to list a couple of them.

Changing a variable created outside a function.
Creating and navigating complex data structures (e.g., linked lists).
Calling functions by pointers: by changing the pointer we switch between the different functions being called. This allows for pretty elegant architectural solutions.

Pointers are closely tied with arrays, which are discussed in the next section.

9.1.6 Arrays

In C, an array is a data structure that holds a fixed amount of data of the same type. So, to work with an array we need to know its start, size of a single element and the amount of elements that it can store. Refer to Listing 9-7 to see several variations of array declaration.

Listing 9-7. array_decl.c

/* This array's size is computed by compiler */
int arr[] = {1,2,3,4,5};

/* This array is initialized with zeros, its size is 256 bytes */
long array[32] = {0};

As the number of elements should be fixed, it cannot be read from a variable.³ To allocate memory for arrays whose dimensions we do not know in advance, memory allocators are used (which are not always at your disposal, for example, when programming kernels). We will learn to use the standard C memory allocator (malloc / free) and will even write our own.

You can address elements by index. Indices start from 0. The origin of this convention lies in the nature of address space: the zeroth element is located at the array’s starting address plus 0 times the element size.

Listing 9-8 shows an array declaration, two reads, and one write.

Listing 9-8. array_example_rw.c

int myarray[1024];
int y = myarray[64];

int first = myarray[0];

myarray[10] = 42;

If we think for a bit about the C abstract machine, arrays are just contiguous memory regions holding data of the same type. No information about the type itself or about the array length is stored alongside. It is fully the programmer’s responsibility never to address an element outside an allocated array.

Whenever you write the allocated array’s name, you are actually referring to its address. You can think about it as a constant pointer value. Here is the place where the analogy between assembly labels and variables is the strongest. So, in Listing 9-8, the expression myarray is, in almost every context, converted to a value of type int*, because it is a pointer to the first array element! (The operand of sizeof is a notable exception; see section 9.1.11.)

It also means that an expression *myarray will be evaluated to its first element, just as myarray[0].

9.1.7 Arrays as Function Arguments

Let’s talk about functions accepting arrays as arguments. Listing 9-9 shows a function returning the first array element (or -1 if the array is empty).

Listing 9-9. fun_array1.c

int first (int array[], size_t sz ) {
    if ( sz == 0 ) return -1;
    return array[0];
}

Unsurprisingly, the same function can be rewritten keeping the same behavior, as shown in Listing 9-10.

Listing 9-10. fun_array2.c

int first (int* array, size_t sz ) {
    if ( sz == 0 ) return -1;
    return *array;
}

But that’s not all. You can actually mix these and use the indexing notation with pointers, as shown in Listing 9-11.

Listing 9-11. fun_array3.c

int first (int* array, size_t sz ) {
    if ( sz == 0 ) return -1;
    return array[0];
}

The compiler immediately demotes constructions such as int array[] in the arguments list to a pointer int* array, and then works with it as such. Syntactically, however, you can still specify the array length, as shown in Listing 9-12. This number indicates that the given array should have at least that many elements. However, the compiler treats it as a comment and performs no runtime or compile-time checks.

Listing 9-12. array_param_size.c

int first( int array[10], size_t sz ) { ... }

C99 introduced a special syntax, which corresponds essentially to your promise given to a compiler, that the corresponding array will have at least that many elements. It allows the compiler to perform some specific optimizations based on this assumption. Listing 9-13 shows an example.

Listing 9-13. array_param_size_static.c

int fun(int array[static 10] ) {...}

9.1.8 Designated Initializers in Arrays

C99 introduces an interesting way to initialize the arrays. It is possible to implicitly initialize an array to default values except for those on several designated positions, for which other values are provided. For example, to initialize an array of eight int elements to all zeros, except for the indices 1 and 5 which will hold values 15 and 29, respectively, the following code might be used:

int a[8] = { [5] = 29, [1] = 15 };

The initialization order is irrelevant. It is often useful to use enum values or character values as indices. Listing 9-14 shows an example.

Listing 9-14. designated_initializers_arrays.c

int whitespace[256] = {
    [' ' ] = 1,
    ['\t'] = 1,
    ['\f'] = 1,
    ['\n'] = 1,
    ['\r'] = 1 };

enum colors {
    RED,
    GREEN,
    BLUE,
    MAGENTA,
    YELLOW
};

int good[5] = { [ RED ] = 1, [ MAGENTA ] = 1 };

9.1.9 Type Aliases

You can define your own types using existing types via the typedef keyword.

The code shown in Listing 9-15 is creating a new type mytype_t. It is absolutely equivalent to unsigned short int except for its name. These two types become fully interchangeable (unless later someone changes the typedef).

Listing 9-15. typedef_example.c

typedef unsigned short int mytype_t;

You can see the suffix _t in type names quite often. All names ending with _t are reserved by POSIX standard.⁴

This way newer standards will be able to introduce new types without the fear of colliding with types in existing projects. So, using these type names is discouraged. We will speak about practical naming conventions later.

What are these new types for?

Sometimes they improve the ease of reading code.
They may enhance portability, because to change the format of all variables of your custom type you should only change the typedef.
Types are essentially another way of documenting a program.
Type aliases are extremely useful when dealing with function pointer types because of their cumbersome syntax.

A very important example of a type alias is size_t. This is a type defined in the language standard (it requires including one of the standard library headers, for example, #include <stddef.h>). Its purpose is to hold array lengths and array indices. It is usually an alias for unsigned long; thus, in Intel 64 it typically is an unsigned 8-byte integer.

Never use int for array indices

Unless you are dealing with a poorly designed library which forces you to use int as an index, always favor size_t.

Always use types appropriately. Most standard library functions that deal with sizes return a value of type size_t (even the sizeof() operator returns size_t!). Let’s take a look at the example shown in Listing 9-16. An expression s of type size_t could have been obtained from one of library calls such as strlen. There are several problems that arise because of int usage:

int is 4 bytes long and signed, so its maximal value is 2³¹ − 1. What if i is used as an array index? It is more than possible to create a bigger array on modern systems, so all elements may not be indexed. The standard says that arrays are limited in size by an amount of elements encodable using a size_t variable (unsigned 64-bit integer).
Every iteration is only performed if the current i value is less than s. Thus a comparison is needed, but these two variables have different formats! Because of it, special number conversion code will be executed on each iteration, and its cost can be quite significant for small loops with a lot of iterations.
When dealing with bit arrays (not so uncommon) a programmer is likely to compute i/8 for a byte offset in a byte array and i%8 to see which specific bit we are referring to. These operations can be optimized into shifts instead of actual division, but only for unsigned integers. The performance difference between shifts and “fair” division is radical.

Listing 9-16. size_int_difference.c

size_t s;
int i;
...
for( i = 0; i < s; i++ ) {
    ...
}

9.1.10 The Main Function Revisited

We are already used to writing the main function, which serves as an entry point, as a parameterless function. However, it can in fact accept two parameters: the command-line argument count and an array of the arguments themselves. What are command-line arguments? Well, every time you launch a program (like ls) you might specify additional arguments, for example, ls -l -a. The ls application will be launched and it will have access to these arguments in its main function. In this case:

argv will contain three pointers to char sequences:
```
INDEX STRING

  0   "ls"
  1   "-l"
  2   "-a"
```
The shell will split the whole calling string into pieces by spaces, tabs, and newline symbols and the loader and C standard library will ensure that main gets this information.
argc will be equal to 3, as it is the number of elements in argv.

Listing 9-17 shows an example. This program prints all given arguments, each in a separate line.

Listing 9-17. main_revisited.c

#include <stdio.h>

int main( int argc, char* argv[] ) {
    int i;
    for( i = 0; i < argc; i++ )
        puts( argv[i] );
    return 0;
}

9.1.11 Operator sizeof

We already mentioned the operator sizeof in section 8.4.2. It returns a value of type size_t which holds the operand size in bytes. For example, sizeof(long) will return 8 on x64 computers.

sizeof is not a function because it has to be computed at compile time.

sizeof has an interesting usage: you can compute the total size of an array, but only if the operand is the array itself (and not a pointer to it). Listing 9-18 shows an example.

Listing 9-18. sizeof_array.c

#include <stdio.h>

long array[] = { 1, 2, 3 };

int main(void) {
    printf( "%zu \n", sizeof( array    ) ); /* output: 24 */
    printf( "%zu \n", sizeof( array[0] ) ); /* output: 8 */
    return 0;
}

Notice that you cannot use sizeof to get the size of an array accepted by a function as an argument. Listing 9-19 shows an example. This program will output 8 on our architecture, which is the size of a pointer.

Listing 9-19. sizeof_array_fun.c

#include <stdio.h>
const int arr[] = {1, 2, 3, 4};
void f(int const arr[]) {
    printf("%zu\n", sizeof( arr ) );
}
int main( void ) {
    f(arr);
    return 0;
}

Which format specifier?

Starting with C99 you can use the format specifier %zu for size_t. In earlier versions you should use %lu, which stands for unsigned long.

Question 159

Create sample programs to study the values of these expressions:

sizeof(void)
sizeof(0)
sizeof('x')
sizeof("hello")

Question 160

What will be the value of x?

int x = 10;                    
size_t t = sizeof(x=90);

Question 161

How do you compute how many elements an array stores using sizeof?

9.1.12 Const Types

For every type T we can also use a type T const (or, equivalently, const T). Variables of such type cannot be changed directly, so they are immutable. It means that such data should be initialized simultaneously with a declaration. Listing 9-20 shows an example of initializing and working with constant variables.

Listing 9-20. const_def.c

int a;
a = 42 ;      /* ok */

...

const int a; /* no initializer: accepted by C compilers, but a can never receive a useful value */

...

const int a = 42; /* ok */
a = 99;  /* compilation error, should not change constant value */

int const a = 42;  /* ok */
const int b = 99;  /* ok, const int === int const */

It is interesting to note how the const modifier interacts with the asterisk * modifier. The type is read from right to left and so the const modifiers as well as the asterisk are applied in this order. Following are the options:

int const* x means “a mutable pointer to an immutable int.” Thus, *x = 10 is not allowed, but modifying x itself is allowed.

An alternate syntax is const int* x.

int* const x = &y; means “an immutable pointer to a mutable int y.” In other words, x will never be pointing at anything but y.
A superposition of the two cases: int const* const x = &y; is “an immutable pointer to an immutable int y.”

Simple rule

The const modifier on the left of the asterisk protects the data we point at; the const modifier on the right protects the pointer itself.

Making a variable constant is not foolproof. There is still a way to modify it. Let’s demonstrate it for a variable const int x (see Listing 9-21).

Take a pointer to it. It will have type const int*.
Cast this pointer to int*.
Dereference this new pointer. Now you can assign a new value to x.

Listing 9-21. const_cast.c

#include <stdio.h>

int main(void) {
    const int x = 10;
    *( (int*)&x ) = 30;

    printf( "%d\n", x );
    return 0;
}

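This technique is strongly discouraged, but you might need it when dealing with poorly designed legacy code. Note also that the standard leaves the result of modifying an object defined as const undefined, so the program in Listing 9-21 may well print 10 with your compiler. const modifiers are made for a reason, and the fact that your code does not compile is by no means a justification for such hacks.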

Note that you cannot assign an int const* pointer to int* (this is true for all types). The first pointer guarantees that its contents will never be changed, while the second one does not. Listing 9-22 shows an example.

Listing 9-22. const_discard.c

int x;
int y;

int const* px = &x;
int * py = &y;

py = px; /* Error, const qualifier is discarded */
px = py; /* OK  */

Should I use const at all? It is cumbersome.

Absolutely. In large projects it can save you a lifetime of debugging. I myself recall several very subtle bugs that were caught by the compiler and resulted in compilation error. Without the variables being protected by const, the compiler would have accepted the program which would have resulted in the wrong behavior.

Additionally, the compiler may use this information to perform useful optimizations.

9.1.13 Strings

In C, strings are null-terminated. A single character is represented by its ASCII code of type char. A string is defined by a pointer to its start, which means that the equivalent of a string type would be char*. Strings can also be thought of as character arrays, whose last element is always equal to zero.

String literals are arrays of char that, like any other array, are converted to char* in most contexts. Modifying them, however, while being syntactically possible (e.g., "hello"[1] = 32), yields an undefined result. It is one of the cases of undefined behavior in C. This usually results in a runtime error, which we will explain in the next chapter.

When two string literals are written one after another, they are concatenated (even if they are separated with line breaks). Listing 9-23 shows an example.

Listing 9-23. string_literal_breaks.c

char const* hello = "Hel" "lo"
"world!";

Note

The C++ language (unlike C) forces the string literal type to char const*, so if you want your code to be portable, consider it. Additionally, it forces the immutability of the strings (which is what you will often want) on the syntax level. So whenever you can, assign string literals to const char* variables.

9.1.14 Functional Types

A rather obscure part of C are the functional types. Unlike most types, they cannot be instantiated as variables, but in a way functions themselves are literals of these types. However, you can declare function arguments of functional types, which will be automatically converted to function pointers.

Listing 9-24 shows an example of a function argument f of a functional type.

Listing 9-24. fun_type_example.c

#include <stdio.h>

double g( int number ) { return 0.5 + number; }

double apply( double (f)(int), int x ) {
    return f( x ) ;
}

int main( void ) {
    printf( "%f\n",  apply( g, 10 ) );
    return 0;
}

The syntax, as you see, is quite particular. The type declaration is mixed with the argument name itself, so the general pattern is:

return_type (pointer_name) ( arg1, arg2, ... )

You can see an equivalent program in Listing 9-25.

Listing 9-25. fun_type_example_alt.c

#include <stdio.h>

double g( int number ) { return 0.5 + number; }

double apply( double (*f)(int), int x ) {
    return f( x ) ;
}

int main( void ) {
    printf( "%f\n",  apply( g, 10 ) );
    return 0;
}

What are these types useful for? As function pointer types are rather difficult to write and read, they are often hidden in a typedef. The bad (but very common) practice is to add an asterisk inside the type alias declaration. Listing 9-26 shows an example where an alias for a pointer to a procedure returning nothing is created.

Listing 9-26. typedef_bad_fun_ptr.c

typedef void(*proc)(void);

In this case you can directly write proc my_pointer = &some_proc. However, this hides the information that proc is a pointer: you can deduce it, but you do not see it right away, which is bad. The nature of the C language is, of course, to abstract things as much as you can, but pointers are such a fundamental concept and so pervasive in C that you should not abstract them away, especially in the presence of weak typing.

So, a better solution would be to write down what is shown in Listing 9-27.

Listing 9-27. typedef_good_fun_ptr.c

typedef void(proc)(void);

...

proc*  my_ptr  =  &some_proc;

Additionally, these types can be used to write function declarations. Listing 9-28 shows an example.

Listing 9-28. fun_types_decl.c

typedef double (proc)(int);

/* declaration */
proc myproc;

/* ... */

/* definition */
double myproc( int x ) { return 42.0 + x; }

9.1.15 Coding Well

9.1.15.1 General Considerations

In this book we are going to provide several assignments to be written in C. But first we want to state several rules that you should follow, not only here and now but virtually every time you are writing a program.

Always separate program logic from input and output operations. This will allow for a better code reuse. If a function performs actions on data and outputs messages at the same time, you won’t be able to reuse its logic in another situation (e.g., it can output messages to an application with a graphical user interface, and in another case you might want to use it on a remote server).
Always comment your code in plain English.
Name your variables based on their meaning for the program. It is very hard to deduce what variables with meaningless names like aaa mean.
Remember to put const wherever you can.
Use appropriate types for indexing.

9.1.15.2 Example: Array Summation

This section is an absolute must read if you are a beginner with C and even more so if you are a self-taught programmer.

We are going to write a simple program in “beginner style,” see what’s wrong with it, and modify it appropriately to make it better.

Here is the task: implement an array summation functionality. As simple as it is, there is a huge difference between a solution written by a beginner and one written by a more experienced programmer.

The beginner will come up with a program similar to the one shown in Listing 9-29.

Listing 9-29. beg1.c

#include <stdio.h>
int array[] = {1,2,3,4,5};

int main( int argc, char** argv ) {
    int i;
    int sum;
    for( i = 0; i < 5; i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );
    return 0;
}

Before we start polishing the code, we can immediately spot a bug: the starting value of sum is not defined and can be arbitrary. Local variables in C are not initialized by default, so you have to do it by hand. Check Listing 9-30.

Listing 9-30. beg2.c

#include <stdio.h>
int array[] = {1,2,3,4,5};

int main( int argc, char** argv ) {
    int i;
    int sum = 0;
    for( i = 0; i < 5; i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );
    return 0;
}

First of all, this code is totally not reusable. Let’s extract a piece of logic into an array_sum procedure, shown in Listing 9-31.

Listing 9-31. beg3.c

#include <stdio.h>
int array[] = {1,2,3,4,5};

void array_sum( void ) {
    int i;
    int sum = 0;
    for( i = 0; i < 5; i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );

}

int main( int argc, char** argv ) {
    array_sum();
    return 0;
}

What is this magic number 5? Every time we change the array we have to change this number as well, so we probably want to compute it from the array itself, as shown in Listing 9-32.

Listing 9-32. beg4.c

#include <stdio.h>
int array[] = {1,2,3,4,5};

void array_sum( void ) {
    int i;
    int sum = 0;
    for( i = 0; i < sizeof(array) / 4; i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );

}

int main( int argc, char** argv ) {
    array_sum();
    return 0;
}

But why are we dividing the array size by 4? The size of int varies depending on the architecture, so we have to calculate it too (in compile time) as shown in Listing 9-33.

Listing 9-33. beg5.c

#include <stdio.h>
int array[] = {1,2,3,4,5};

void array_sum( void ) {
    int i;
    int sum = 0;
    for( i = 0; i < sizeof(array) / sizeof(int); i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );
}

int main( int argc, char** argv ) {
    array_sum();
    return 0;
}

We immediately face a problem: sizeof returns a number of type size_t, not int. So, we have to change the type of i and are doing it for a good reason (see section 9.1.9). Listing 9-34 shows the result.

Listing 9-34. beg6.c

#include <stdio.h>

int array[] = {1,2,3,4,5};

void array_sum( void ) {
    size_t i;
    int sum = 0;
    for( i = 0; i < sizeof(array) / sizeof(int); i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );
}

int main( int argc, char** argv ) {
    array_sum();
    return 0;
}

Right now, array_sum works only on statically defined arrays, because they are the only ones whose size can be calculated by sizeof. Next we want to add enough parameters to array_sum so that it is able to sum any array. You cannot add only a pointer to an array, because the array size cannot be recovered from the pointer, so you give it two parameters: the array itself and the number of elements in the array, as shown in Listing 9-35.

Listing 9-35. beg7.c

#include <stdio.h>

int array[] = {1,2,3,4,5};

void array_sum( int* array, size_t count ) {
    size_t i;
    int sum = 0;
    for( i = 0; i < count; i++ )
        sum = sum + array[i];
    printf("The sum is: %d\n", sum );
}

int main( int argc, char** argv ) {
    array_sum(array, sizeof(array) / sizeof(int));
    return 0;
}

This code is much better, but it still breaks the rule of not mixing input/output and logic. You cannot use array_sum anywhere in graphical programs, and you cannot do anything with its result either. We are going to get rid of the output in the summation function and make it return its result. Check Listing 9-36.

Listing 9-36. beg8.c

#include <stdio.h>

int g_array[] = {1,2,3,4,5};

int array_sum( int* array, size_t count ) {
    size_t i;
    int sum = 0;
    for( i = 0; i < count; i++ )
        sum = sum + array[i];
    return sum;
}

int main( int argc, char** argv ) {
    printf(
            "The sum is: %d\n",
            array_sum(g_array, sizeof(g_array) / sizeof(int))
         );
    return 0;
}

For convenience, we renamed the global array variable g_array, but it is not necessary.

Finally, we have to think about adding const qualifiers. The most important place is function arguments of pointer types. We really want to declare that array_sum will never change the array that its argument is pointing at. We may also want to protect the global array itself from being changed by adding a const qualifier.

Remember that if we make g_array itself constant but will not mark array in the argument list as such, we would not be able to pass g_array to array_sum, because there are no guarantees that array_sum will not change data that its argument is pointing at. Listing 9-37 shows the final result.

Listing 9-37. beg9.c

#include <stdio.h>

const int g_array[] = {1,2,3,4,5};

int array_sum( const int* array, size_t count ) {
    size_t i;
    int sum = 0;
    for( i = 0; i < count; i++ )
        sum = sum + array[i];
    return sum;
}

int main( int argc, char** argv ) {
    printf(
            "The sum is: %d\n",
            array_sum(g_array, sizeof(g_array) / sizeof(int))
         );
    return 0;
}

When you write a solution for an assignment in this book, remember all the points stated previously and check whether your program conforms to them, and if not, how it can be improved.

Can this program be improved further? Of course, and we are going to give you some hints about how.

Can the pointer array be NULL? If so, how do we signal it without dereferencing a NULL pointer, which will probably result in a crash?
Can sum overflow?

9.1.16 Assignment: Scalar Product

A scalar product of two vectors (a₁, a₂, …, aₙ) and (b₁, b₂, …, bₙ) is the sum
$\sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$

For example, the scalar product of vectors (1, 2, 3) and (4, 5, 6) is

1 · 4 + 2 · 5 + 3 · 6 = 4 + 10 + 18 = 32

The solution should consist of

Two global arrays of int of the same size.
A function to compute the scalar product of two given arrays.
A main function which calls the product computations and outputs its results.

9.1.17 Assignment: Prime Number Checker

You have to write a function to test a number for primality. The interesting thing is that the number will be of the type unsigned long and that it will be read from stdin.

You have to write a function int is_prime( unsigned long n ), which checks whether n is a prime number or not. If it is the case, the function will return 1; otherwise 0.
The main function will read an unsigned long number and call is_prime function on it. Then, depending on its result, it will output either yes or no.

Read man scanf and use scanf function with the format specifier %lu.

Remember, is_prime accepts unsigned long, which is not the same thing as unsigned int!

9.2 Tagged Types

There are three “tagged” kinds of types in C: structures, unions, and enumerations. We call them that because their names consist of a keyword struct, union, or enum followed by a mnemonic tag, like struct pair or union pixel.

9.2.1 Structures

Abstraction is absolutely key to all programming. It replaces the lower-level, more verbose concepts with those closer to our thinking: higher-level, less verbose. When you are thinking about visiting your favorite pizzeria and plan an optimal route, you do not think about “moving your right foot X centimeters forward,” but rather about “crossing the road” or “turning to the right.” While for program logic the abstraction mechanism is implemented using functions, the data abstraction is implemented using complex data types.

A structure is a data type which packs several fields. Each field is a variable of its own type. Mathematics would probably be happy calling structures “tuples with named fields.”

To create a variable of a structural type we can refer to the example shown in Listing 9-38. There we define a variable d which has two fields: a and b of types int and char, respectively. Then d.a and d.b become valid expressions that you can use just as you are using variable names.

Listing 9-38. struct_anon.c

struct { int a; char b; } d;
d.a = 0;
d.b = 'k';

This way, however, you only create a one-time structure. In fact, you are describing a type of d but you are not creating a new named structural type. The latter can be done using a syntax shown in Listing 9-39.

Listing 9-39. struct_named.c

struct pair {
    int a;
    int b;
};

...

struct pair d;
d.a = 0;
d.b = 1;

Be very aware that the type name is not pair but struct pair, and you cannot omit the struct keyword without confusing the compiler. The C language has a concept of namespaces quite different from the namespaces in other languages (including C++). There is a global type namespace, and then there is a tag namespace, shared between struct, union, and enum datatypes. The name following the struct keyword is a tag. You can define a structural type whose tag is the same as the name of another type, and the compiler will distinguish them by the presence of the struct keyword.

An example shown in Listing 9-40 demonstrates two variables of types struct type and type, which are perfectly accepted by the compiler.

Listing 9-40. struct_namespace.c

typedef unsigned int type;
struct type {
    char c;
};

int main( int argc, char** argv ) {
    struct type st;
    type t;
    return 0;
}

It does not mean, though, that you really should make types with similar names.

However, as struct type is a perfectly fine type name, it can be aliased as type using the typedef keyword, as shown in Listing 9-41. Then the type and struct type names will be completely interchangeable.

Listing 9-41. typedef_struct_simple.c

typedef struct type type;

Please, do not do it

It is not a good practice to alias structural types using typedef, because it hides information about the type nature.

Structures can be initialized similarly to arrays (see Listing 9-42).

Listing 9-42. struct_init.c

struct S {char const* name; int value; };
...
struct S new_s = { "myname", 4 };

You can also assign 0 to all fields of a structure, as shown in Listing 9-43.

Listing 9-43. struct_zero.c

struct pair { int a; int b; };

...
struct pair p = { 0 };

In C99, there is a better syntax for structure initialization, which allows you to name the fields to initialize. The unmentioned fields will be initialized to zeros. Listing 9-44 shows an example.

Listing 9-44. struct_c99_init.c

struct pair {
    char a;
    char b;
};

struct pair st = { .a = 'a', .b = 'b' };

The fields of a structure are guaranteed not to overlap; however, unlike arrays, structures are not necessarily contiguous, in the sense that there can be free space between their fields. Thus, sizeof of a structural type can be greater than the sum of the field sizes because of these gaps. We will talk about it in Chapter 12.

9.2.2 Unions

Unions are very much like structures, but their fields are always overlapping. In other words, all union fields start at the same address. The unions share their namespace with structures and enumerations.

Listing 9-45 shows an example.

Listing 9-45. union_example.c

union dword {
    int integer;
    short shorts[2];
};

...
union dword test;
test.integer = 0xAABBCCDD;

We have just defined a union which stores a number of size 4 bytes (on x86 or x64 architectures). At the same time it stores an array of two numbers, each of which is 2 bytes wide. These two fields (a 4-byte number and a pair of 2-byte numbers) overlap. By changing the .integer field we are also modifying .shorts array. If we assign .integer = 0xAABBCCDD and then try to output shorts[0] and shorts[1], we will see ccdd aabb.

Question 162

Why do these shorts seem reversed? Will it always be the case, or is it architecture dependent?

By mixing structures and unions we can achieve interesting results. The example shown in Listing 9-46 demonstrates how one can address parts of a 3-byte structure using indices.⁵

Listing 9-46. pixel.c

union pixel {
    struct {
        char a,b,c;
    };
    char at[3];
};

Remember that if you assign a value to a union field, the standard does not guarantee you anything about the values of other fields. An exception is made for structures that share a common initial sequence of fields: those fields can be read through either structure.

Listing 9-47 shows an example.

Listing 9-47. union_guarantee.c

struct sa {
    int x;
    char y;
    char z;
};

struct sb {
    int x;
    char y;
    int notz;
};

union test {
    struct sa as_sa;
    struct sb as_sb;
};

9.2.3 Anonymous Structures and Unions

Starting from C11, the unions and structures can be anonymous when inside other structures or unions. It allows for a less verbose syntax when accessing inner fields.

In the example shown in Listing 9-48, to access the x field of vec, you need to write vec.named.x. You cannot omit named.

Listing 9-48. anon_no.c

union vec3d {
    struct {
        double x;
        double y;
        double z;
    } named ;
    double raw[3];
};

union vec3d vec;

Now, in the next example, shown in Listing 9-49, we got rid of the name of the first field (named). This is an anonymous structure, and now we can access its fields as if they were the fields of vec itself: vec.x.

Listing 9-49. anon_struct.c

union vec3d {
    struct {
        double x;
        double y;
        double z;
    };
    double raw[3];
};

union vec3d vec;

9.2.4 Enumerations

Enumerations are a simple data type based on the int type. An enumeration fixes certain values and gives them names, similar to how #define works.

For example, the traffic light can be in one of the following states (based on which lights are turned on):

Red.
Red and yellow.
Yellow.
Green.
No lights.

This can be encoded in C as shown in Listing 9-50.

Listing 9-50. enum_example.c

enum light {
    RED,
    RED_AND_YELLOW,
    YELLOW,
    GREEN,
    NOTHING
};

...
enum light l = NOTHING;
...

When is it useful? It is often used to encode a state of an entity, for example, as a part of a finite automaton; it can serve as a bag of error codes or code mnemonics.

The constant value 0 was named RED, RED_AND_YELLOW stands for 1, etc.

9.3 Data Types in Programming Languages

We have given an overview of data types in C; now let’s take a step back from C and look at the bigger picture: type systems in programming languages.

In many areas of computer science and programming the evolution went from untyped universe to typing. For example, the following entities are untyped:

Lambda terms in untyped lambda calculus;
Sets in many set theories, for example, ZF;
S expressions in LISP language; and
Bit strings.

We are mostly interested in bit strings right now. For the computer, everything is a bit string of some fixed size. Those can be interpreted as numbers (integer or real), sequences of character codes, or something else. We can say that the assembly is an untyped language.

However, when we start working in an untyped environment we are trying to divide objects into several categories. We are working with objects from one category in a similar way. So, we establish a convention: these bit strings are integer numbers, those are floating point numbers, etc.

Is this it, the typing? Not quite yet. With conventions alone we are still not limited in our capabilities and can add a floating point number to a string pointer, because the programming language does not enforce any type control. Typing proper begins when the language checks types for us; this type checking can be performed at compile time (static typing) or at runtime (dynamic typing).

So, not only are we dividing all kinds of possible objects into categories, we are also declaring which operations can be performed on each type. The data of different types is also often encoded in a different way.

9.3.1 Kinds of Typing

Besides static and dynamic typing, there are also other, orthogonal classifications.

Strong typing means that all operations require exactly the argument they need. No implicit conversions from other types into the needed ones are allowed.

Weak typing means that there are implicit conversions between types which make possible the operations on data which is not of exactly the required type (but a conversion to a required type exists).

This division is not strictly binary; in the real world the languages tend to be closer to one of these two poles. We have quite extreme cases, such as Ada for strong typing and JavaScript for the weak one.

Sometimes we also divide languages based on verbosity.

With explicit typing we always annotate data with types.

With implicit typing we allow the compiler to infer the type whenever it is possible.

Now we are going to give real-world examples of all combinations of static/dynamic and strong/weak typing.

9.3.1.1 Static Strong Typing

Types are checked in compile time and the compiler is pedantic about them.

In OCaml language there are two different addition operators: + for integer numbers and +. for reals. So, this code will raise an error at compile time:

4 +. 1.0

We used data of type int where the compiler expected a float. Unlike a C compiler, which would have inserted a conversion, the OCaml compiler reports an error. This is the essence of very strong typing.

9.3.1.2 Static Weak Typing

The C language has exactly this kind of typing. All types are known in compile time, but the implicit conversions occur quite often.

The almost identical line double x = 4 + 3.0; causes no compiler errors in C, because 4 gets automatically converted to double and then added to 3.0. The weakness expresses itself in the fact that the programmer does not specify conversion operations explicitly.

9.3.1.3 Strong Dynamic Typing

This is the kind of typing used in Python. Python does not allow implicit conversions between types as much as JavaScript does. However, the type errors will not be reported until you launch the program and actually try to execute the erroneous statement.

Python has an interpreter where you can type expressions and statements and immediately execute them. If you try to evaluate an expression "3" + 2 and see its result in an interactive Python interpreter, you will get an error because the first object is a string, and the second is a number. Even though this string contains a number (so a conversion could have been written), the addition is not allowed. Listing 9-51 shows the dump.

Listing 9-51. Python Typing Error

>>> "3" + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects

Now let’s try to evaluate an expression 1 if True else "3" + 2. This expression is evaluated to 1 if True is true (which obviously holds); otherwise its value is a result of the same invalid operation "3" + 2. However, as we are never reaching into the else branch, there will be no error raised even in runtime. Listing 9-52 shows the terminal dump. When applied to two strings, the plus acts as a concatenation operator.

Listing 9-52. Python Typing: No Error Because the Statement Is Not Executed

>>> 1 if True else "3" + 2
1
>>> "1" + "2"
'12'

9.3.1.4 Weak Dynamic Typing

Probably the most used language with such typing is JavaScript.

In the example we provided for Python we tried to add a number to a string. Despite the fact that the string contained a valid decimal number, an error was reported, because a string is a string, whatever it might hold. Its type won’t be automatically changed.

However, JavaScript is much less strict about what you are allowed to do. We are going to use the interactive JavaScript console (which you can access in virtually any modern web browser) and type some expressions. Listing 9-53 shows the result.

Listing 9-53. JavaScript Implicit Conversions

>>> 3 == '3'
true
>>> 3 == '4'
false
>>> "7.0" == 7
true

By studying this example only we can deduce that when a number and a string are compared, both sides are apparently converted to a number and then compared. It is not clear whether the numbers are integers or reals, but the amount of implicit operations in action here is quite astonishing.

9.3.2 Polymorphism

Now that we have a general understanding of typing, let’s go after one of the most important concepts related to the type systems, namely, polymorphism.

Polymorphism (from Greek: polys, “many, much” and morph, “form, shape”) is the possibility of calling different actions for different types in a uniform way. You can also think about it in another way: the data entities can take different types.

There are four different kinds of polymorphism [8], which we can also divide into two categories:

Universal polymorphism, when a function accepts an argument of an infinite number of types (including maybe even those that are not defined yet) and behaves in a similar way for each of them.
- Parametric polymorphism, where a function accepts an additional argument, defining the type of another argument.
  In languages such as Java or C#, the generic functions are an example of parametric compile-time polymorphism.
- Inclusion, where some types are subtypes of other types. So, when given an argument of a child type, the function will behave in the same way as when the parent type is provided.
Ad hoc, where functions accept a parameter from a fixed set of types and these functions may operate differently on each type.
- Overloading, several functions exist with the same name and one of them is called based on an argument type.
- Coercion, where a conversion exists from type X to type Y and a function accepting an argument of type Y is called with an argument of type X.

The popular object-oriented programming paradigm has popularized the notion of polymorphism, but in a very particular way. The object-oriented programming usually refers to only one kind of polymorphism, namely, subtyping, which is essentially the same as inclusion, because the objects of the child type form a subset of objects of the parent type.

Sometimes it is hard to say which type of polymorphism is used in a certain place. Consider the following four lines:

3 + 4
3 + 4.0
3.0 + 4
3.0 + 4.0

The “plus” operation here is obviously polymorphic, because it is used in the same way with all kinds of int and double operands. But how is it really implemented? We can think of different options, for example,

This operator has four overloads for all combinations.
This operator has two overloads for int + int and double + double cases. Additionally, a coercion from int to double is defined.
This operator can only add up two reals, and all ints are coerced to double.

9.4 Polymorphism in C

The C language allows for different types of polymorphisms, and some can be emulated through little tricks.

9.4.1 Parametric Polymorphism

Can we make a function which will behave differently for different types of arguments based on an explicitly given type? We can do it to some extent, even in C89. However, we will need some rather heavy macro machinery in order to achieve a smooth result.

First, we have to know what this fancy # symbol does in a macro context. When used inside a function-like macro, the # operator turns the macro argument that follows it into a string literal. Listing 9-54 shows an example.

Listing 9-54. macro_str.c

#define quote(x) #x

puts( quote(hello) );  /* will be replaced with `puts("hello")` */

The ## operator is even more interesting. It allows us to form symbol names dynamically. Listing 9-55 shows an example.

Listing 9-55. macro_concat.c

#define x1 "Hello"
#define x2 " World"

#define str(i) x##i

puts( str(1) );  /* str(1) -> x1 -> "Hello" */
puts( str(2) );  /* str(2) -> x2 -> " World" */

Some higher-level language features can be boiled down to compiler logic performing a program analysis and making a call to one or another function, using one or another data structure, etc. In C we can imitate it by relying on a preprocessor.

Listing 9-56 shows an example.

Listing 9-56. c_parametric_polymorphism.c

#include <stdio.h>
#include <stdbool.h>

#define pair(T) pair_##T
#define DEFINE_PAIR(T) struct pair(T) {\
    T fst;\
    T snd;\
};\
bool pair_##T##_any(struct pair(T) pair, bool (*predicate)(T)) {\
    return predicate(pair.fst) || predicate(pair.snd); \
}

#define any(T) pair_##T##_any

DEFINE_PAIR(int)

bool is_positive( int x ) { return x > 0; }
int main( int argc, char** argv ) {
    struct pair(int) obj;
    obj.fst = 1;
    obj.snd = -1;
    printf("%d\n", any(int)(obj, is_positive) );
    return 0;
}

First, we included stdbool.h file to get access to the bool type, as we said in section 9.1.3.

pair(T), when invoked as pair(int), is replaced by the token pair_int.
DEFINE_PAIR is a macro which, when invoked as DEFINE_PAIR(int), is replaced by the code shown in Listing 9-57.
Notice the backslashes at the end of each line: they are used to escape the newline character, thus making this macro span across multiple lines. The last line of the macro is not ended by the backslash.
This code defines a new structural type called struct pair_int, which essentially contains two integers as fields. If we instantiated this macro with a parameter other than int, we would have had a pair of elements of a different type.
Then a function is defined, which will have a specific name for each macro instantiation, since the parameter name T is encoded into its name. In our case it is pair_int_any, whose purpose is to check whether any of two elements in the pair satisfies the condition. It accepts the pair itself as the first argument and the condition as the second. The condition is essentially a pointer to a function accepting T and returning bool, a predicate, as its name suggests.
pair_int_any launches the condition function on the first element and then on the second element.
When used, DEFINE_PAIR defines the structure that holds two elements of a given type, and functions to work with it. We can have only one copy of these functions and structure definition for each type, but we need them, so we want to instantiate DEFINE_PAIR once for every type we want to work with.
Listing 9-57. macro_define_pair.c
```
struct pair_int {
    int fst;
    int snd;
};
bool pair_int_any(struct pair_int pair, bool (*predicate)(int)) {
    return predicate(pair.fst) || predicate(pair.snd);
}
```
Then a macro #define any(T) pair_##T##_any is defined. Notice that its sole purpose is apparently just to form a valid function name depending on type. It allows us to call pair_##T##_any in a rather elegant way: any(int), as if it was a function returning a pointer to a function.

So, syntactically we got very close to a concept of parametric polymorphism: we are providing an additional argument (int) which serves to determine the type of other argument (struct pair_int). Of course, it is not as good as the type arguments in functional languages or even generic type parameters in C# or Scala, but it is something.

9.4.2 Inclusion

The inclusion is fairly easy to achieve in C for pointer types. The idea is that every struct’s address is the same as the address of its first member.

Take a look at the example shown in Listing 9-58.

Listing 9-58. c_inclusion.c

#include <stdio.h>

struct parent {
    const char* field_parent;
};

struct child {
    struct parent base;
    const char* field_child;
};

void parent_print( struct parent* this ) {
    printf( "%s\n", this->field_parent );
}

int main( int argc, char** argv ) {
    struct child c;
    c.base.field_parent = "parent";
    c.field_child = "child";
    parent_print( (struct parent*) &c );

    return 0;
}

The function parent_print accepts an argument of a type parent*. As the definition of child suggests, its first field has a type parent. So, every time we have a valid pointer child*, there exists a pointer to an instance of parent which is equal to the former. Thus it is safe to pass a pointer to a child when a pointer to the parent is expected.

The type system, however, is not aware of this; thus you have to convert the pointer child* to parent*, as seen in the call parent_print( (struct parent*) &c );. We could replace the type struct parent* with void* in this case, because any pointer type can be converted to void* (see section 9.1.5).

9.4.3 Overloading

Automated overloading was not possible in C until C11. Before that, people included the argument type names in the function names to provide different “overloadings” of some base name. The C11 standard added a special construct that is resolved at compile time based on the type of its argument: the generic selection, written with the _Generic keyword. It has a wide range of uses.

_Generic accepts a controlling expression E followed by a comma-separated list of association clauses. Each clause has the form type name: expression. The compiler checks the type of E (E itself is not evaluated) against the types in the association list, and the whole _Generic construct takes the value of the expression to the right of the matching colon.

In the example shown in Listing 9-59, we are going to define a macro print_fmt, which can choose an appropriate printf format specifier based on argument type, and a macro print, which forms a valid call to printf and then outputs newline.

print_fmt matches the type of the expression x with one of two types: int and double. In case the type of x is not in this list, the default case is executed, providing a fairly generic %x specifier. However, in absence of the default case, the program would not compile should you provide print_fmt with an expression of the type, say, long double. So in this case it would be probably wise to just omit default case, forcing the compilation to abort when we don’t really know what to do.

Listing 9-59. c_overload_11.c

#include <stdio.h>

#define print_fmt(x) (_Generic( (x), \
            int: "%d",\
            double: "%f",\
            default: "%x"))

#define print(x) printf( print_fmt(x), x ); puts("");

int main(void) {
    int x = 101;
    double y = 42.42;
    print(x);
    print(y);
    return 0;
}

We can use _Generic to write a macro that will wrap a function call and select one of differently named functions based on an argument type.

9.4.4 Coercions

C has several coercions embedded into the language itself. We are speaking essentially about pointer conversions to void* and back and integer conversions, described in section 9.1.4. To our knowledge, there is no way to add user-defined coercions or anything that looks at least remotely similar, akin to Scala’s implicit functions or C++ implicit conversions.

As you see, in some form, C allows for all four types of polymorphism.

9.5 Summary

In this chapter we have made an extensive study of the C type system: arrays, pointers, constant types. We learned to make simple function pointers, saw the caveats of sizeof, revised strings, and started to get used to better code practices. Then we learned about structures, unions, and enumerations. At the end we talked briefly about type systems in mainstream programming languages and polymorphism and provided some advanced code samples to demonstrate how to achieve similar results using plain C. In the next chapter we are going to take a closer look at the ways of organizing your code into a project and the language properties that are important in this context.

Question 163

What is the purpose of & and * operators?

Question 164

How do we read an integer from an address 0x12345?

Question 165

What type does the literal 42 have?

Question 166

How do we create a literal of types unsigned long, long, and long long?

Question 167

Why do we need size_t type?

Question 168

How do we convert values from one type to another?

Question 169

Is there a Boolean type in C89?

Question 170

What is a pointer type?

Question 171

What is NULL?

Question 172

What is the purpose of the void* type?

Question 173

What is an array?

Question 174

Can any consecutive memory cells be interpreted as an array?

Question 175

What happens when trying to access an element outside the array’s bounds?

Question 176

What is the connection between arrays and pointers?

Question 177

Is it possible to declare a pointer to a function?

Question 178

How do we create an alias for a certain type?

Question 179

How are the arguments passed to the main function?

Question 180

What is the purpose of the sizeof operator?

Question 181

Is sizeof evaluated during the program execution?

Question 182

Why is the const keyword important?

Question 183

What are structure types and why do we need them?

Question 184

What are union types? How do they differ from the structure types?

Question 185

What are enumeration types? How do they differ from the structure types?

Question 186

What kinds of typing exist?

Question 187

What kinds of polymorphism exist and what is the difference between them?

Footnotes

1 This language design flaw is corrected in C++, where 'x' has type char.

2 The term to search for is “usual arithmetic conversions.”

3 Until C99. Even nowadays, variable length arrays are discouraged by many: if the array is big enough, the stack will not be able to hold it and the program will be terminated.

4 POSIX is a family of standards specified by the IEEE Computer Society. It includes the description of utilities, application programming interface (API), etc. Its purpose is to ease the portability of software, mostly between different branches of UNIX-derived systems.

5 Note that this might not work out of the box for wider types due to possible gaps between struct fields.
