© Igor Zhirkov 2017

Igor Zhirkov, Low-Level Programming, 10.1007/978-1-4842-2403-8_13

13. Good Code Practices

Igor Zhirkov

(1)Saint Petersburg, Russia

In this chapter we want to concentrate on the coding style. When writing code a developer is constantly faced with a decision-making procedure. What kinds of data structures should he use? How should they be named? Where and when should they be allocated? Experienced programmers make these decisions in a different way compared to beginners, and we find it extremely important to speak about this decision making process.

13.1 Making Choices

Decisions often require balancing between two poles that are mutually exclusive. The classical example is that you cannot ship a quality product cheaply and quickly. Fine performance tuning of the code often makes it harder to read and to debug. So, some code characteristics should be prioritized over others based on common sense and the task itself. Because of this, such code guidelines are a good start, but following them blindly is not the way to go.

Our code writing advices are based on the following premises:

  1. We want the code to be as reusable as possible. This often requires careful planning and coordination between developers, which does not let you write code really fast but pays off very soon because it spares time for debugging and actually allows you to write complex software. Debugging programs is generally considered harder than writing them. So, less code often means less time spent debugging and more robust functions. It is especially important for such languages as C, which are

    • Unsafe in a large sense (allows for pointer arithmetic, does not perform bound checks, etc.)

    • Lack an expressive type system, seen in such languages as Scala, Haskell, or OCaml. Such types impose a number of restrictions on the program that should be satisfied, otherwise the compiler will reject it.

    This rule has a notable exception. If reusing functions results in a drastic performance decrease, the algorithm has an unnecessary large O-complexity. For example, we have done an assignment with linked lists in Chapter 10. There was a function to calculate sum of all integers in a certain list. One way of creating it is roughly shown in Listing 13-1.

    Listing 13-1. list_sum_bad.c
    int list_sum( const struct list* l ) {
        size_t i;
        int sum = 0;
        /* We do not want to launch the full computation
         * of size at each cycle iteration */
        size_t sz = list_size( l ) ;
        for( i = 0; i < sz; l = l-> next )
            sum = sum + l->value;
        return sum;
    }

    In this example, for each i in the range from 0 inclusive to the list length exclusive we actually start walking through the list from its very first element. This results in a drastic decrease in performance in comparison with the single summing pass through the list. In the latter case, appending another element to the list results in an additional list access, while in the program shown in Listing 13-1 this leads to list_length(l) additional list accesses!

  2. The program should be easy to modify. This point is interdependent with the previous one. Smaller functions are often more reusable, and thus the modifications become easier, because more code can be left untouched from the previous version.

  3. The code should be as easy to read as possible. The key factors here are

    • Sane naming. Even if you are not a native English speaker, you should not write variable names, function names, or commentary in your native language.

    • Consistency . Use the same naming conventions and uniform ways of performing similar operations.

    • Short and concise functions. If the logic description is overly verbose, it is often a sign of a lack of sane decomposition or you need an abstraction layer. It has also a good effect on maintainability.

  4. The code should be easy to test. Testing assures us that at least in some elaborated cases the code behaves as intended.

Sometimes the task demands the opposite. For example, if we are writing the code for a controller in absence of a good optimizing compiler and with very restricted resources, we can be forced to abandon beautiful code structure because the compiler cannot inline functions properly; thus each call will impact the performance, often in an unacceptable way.

13.2 Code Elements

13.2.1 General Naming

The specific naming convention is often imposed by the language itself. In cases in which the project is based on an existing codebase, it might be reasonable to not deviate from it for the sake of consistency. In this book we are using the following naming conventions:

  • All names are written in lowercase letters.

  • The name parts are separated with an underscore, as follows: list_count.

The rest of this section concentrates on different language features and associated naming and usage conventions.

13.2.2 File Structure

Include files should have an include guard.

They should be self-contained , which means that for each header file thisfile.h a .c file with only the line #include "thisfile.h" should compile. The order of includes is often chosen as follows:

  • Related header.

  • C library.

  • Other libraries’ .h.

  • Your project’s .h.

Then adhere to a consistent order of declaration of macros, types, functions, variables, etc. It greatly simplifies navigating the project. A typical order is

  • for headers:

    1. Include files.

    2. Macros.

    3. Types.

    4. Variables (globals).

    5. Functions.

  • for .c files

    1. Include files .

    2. Macros.

    3. Types.

    4. Variables (globals).

    5. Static variables.

    6. Functions.

    7. Static functions.

13.2.3 Types

  • When possible (C99 or newer), prefer the types defined in stdint.h, such as uint64_t or uint8_t.

  • If you want to be POSIX-compliant, do not define your own types with _t suffix. It is reserved for standard types, so the new types that might be introduced in future revisions of standard will not clash with the custom types defined in some programs.

  • Types are often named with a prefix common to the project. For example, you want to write a calculator, then the type tags will be prefixed with calc_.

  • When you are defining structures and if you can choose the order of fields, define them in the following order:

    • First try to minimize the memory losses from data structure padding .

    • Then order fields by size.

    • Finally, sort them alphabetically.

    • Sometimes structures have fields that should not be modified by user directly. For example, a library defines the structure shown in Listing 13-2.

      Listing 13-2. struct_private_ex.c
      struct mypair {
          int x;
          int y;
          int _refcount;
      };

      The fields of such structure can be modified directly using dot or arrow syntax. Our convention, however, implies that only specific library functions should modify the _refcount field, and the library user should never do it by hand.

      C lacks a concept of structure private fields, so it is as close as we can get without using more or less dirty hacks.

    • Enumeration members should be written in uppercase, like constants. The ­common prefix is suggested for the members of one enumeration. An example is shown in Listing 13-3.

      Listing 13-3. enum_ex.c
      Enum exit_code   {
          EX_SUCCESS,
          EX_FAILURE,
          EX_INVALID_ARGUMENTS
      };

13.2.4 Variables

Choosing the right names for variables and functions is crucial.

  • Use nouns for names.

  • Boolean variables should have meaningful names too. Prefixing them with is_ is advisable. Then append the exact property that is being checked. is_good is probably too broad to be a good name in most cases, unlike is_prime or is_before_last.

    Prefer positive names to negative ones, as the human brain parses them easily—for example, is_even over is_not_odd.

  • It is not advisable to use names that bear no meaning, like a, b, or x4. The notable exception is the code that illustrates an article or a paper, which describes an algorithm in pseudo code using such names. In this case, any naming change is more likely to confuse readers than to bring more clarity. The indices are traditionally named i and j and you will be understood if you stick to them.

  • Including the measuring units might be a good idea—for example, uint32_t delay_msecs.

  • Other suffixes are useful too, such as cnt, max, etc.

    For example, attempts_max (maximum attempts allowed), attempts_cnt (attempts made).

  • Global constants are named in all capital letters. Global mutable variables are prefixed with g_.

  • The tradition says that the global constants should be defined using #define directive. However, the modern approach is to use const static or just const global variables. Contrary to #defines, they are typed and also better seen when debugging. If you have an access to a quality compiler, it will inline them anyway (if it decides that it will be faster).

  • Use const modifier whenever appropriate. C99 allows you to create variables in arbitrary places inside functions, not just at the block start. Use it to store intermediate results in named constants.

  • Do not define global variables in header files! Define them in .c files and declare them in .h file as extern.

13.2.5 On Global Variables

Do not use global mutable variables if you can. We cannot stress this enough. Here are the most important problems they bring:

  • In medium scale and more in large projects with a whopping number of lines, all information about the function effects is better localized in its signature. A function f might call another function g, and so on, and somewhere in this chain a global variable will be changed. We cannot see that this change might occur by looking at f; we have to study all functions it calls, and the functions they call, and so on.

  • They make functions that are not reenterable . It means that a function f cannot be called if is already being executed. The latter can happen in two cases:

    • Function f is calling other functions, which after some inner calls might call f again, when the first instance of f has not yet been terminated.

      Listing 13-4 shows an example of a function f that is not reenterable.

      Listing 13-4. reenterability.c
      bool flag = true;
      int var = 0;
      void g(void) {
          f();
          flag = false;
      }
      void f(void) {
          if (flag) g();
      }
    • The program is parallelized and the function is being used in multiple threads (which is often the case on modern computers).

In case of a complex call hierarchy, knowing whether the function is reenterable or not requires an additional analysis.

  • They introduce security risks, because usually their values have to be checked before being modified or used. Programmers tend to forget these checks. If something can go wrong, it will go wrong.

  • They make testing function harder because of the data dependency they are introducing. Writing code without tests, however, is always a practice to avoid.

Global static mutable variables are evil too, but at least they do not pollute the global namespace in other files.

Global static immutable variables (const static) are, however, perfectly fine and can be often inlined by compiler.

13.2.6 Functions

  • Use verbs to name functions —for example, packet_checksum_calc.

  • The prefix is_ is also quite common for functions checking conditions—for example, int is_prime( long num ).

  • The functions that operate on a struct with a certain tag are often prefixed with the respective tag name—for example, bool list_is_empty(struct list* lst );.

As C does not allow for fine namespace control, this seems to be the simplest form of controlling the chaos that emerges when most functions are accessible from anywhere.

  • Use the static modifier for all functions except for those you want to be available for everyone.

  • Probably the most important place to use const is for function arguments of type “pointer to immutable data.” It ensures that function does not occasionally change them due to a programmer’s mistake.

13.3 Files and Documentation

As the project grows, the number of files increases and it becomes more difficult to navigate them. To be able to cope with voluminous projects, you have to structure them from the very beginning.

Following is a common template for the project root directory.

src/

Source files

doc/

Documentation

res/

Resource files (such as images).

lib/

Static libraries that will be linked to the executable file.

build/

The artifacts: an executable file and other generated files.

include/

Include files. This directory is added to the compiler include search path by -I flag.

obj/

Generated object files. They are assembled in the executable files and libraries by the linker and are not needed after the compilation end.

configure

The initial configuration script that should be launched prior to building. It can set up different target architectures or turn on and off features.

Makefile

Contains instructions for the automated build system.

The file name and format varies depending on system used.

There are many build systems; some of the most popular ones for C are make, cmake, and automake. Different languages have different ecosystems and often have dedicated build tools (e.g., Gradle or OCamlBuild).

Doxygen is a de facto standard for creating documentation for C and C++ programs. It allows us to generate a fully structured set of HTML or LATEXpages from the program source code. The descriptions of functions and variables are taken from specifically formatted comments. Listing 13-5 shows an example of a source file which is accepted by Doxygen.

Listing 13-5. doxygen_example.h
#pragma once
#include <common.h>
#include <vm.h>


/** @defgroup const_pool Constant pool */

/** Free allocated memory for the pool contents
*/
void const_pool_deinit( struct vm_const_pool* pool );


/** Non-destructive constant pool combination
 * @param a First pool.
 * @param b Second pool.
 * @returns An initialized constant pool combining contents of both arguments
 * */
struct vm_const_pool const_combine(
        struct vm_const_pool const* a,
        struct vm_const_pool const* b );


/** Change the constant pool by adding the other pool's contents in its end.
 * @param[out] src The source pool which will be modified.
 * @param fresh The pool to merge with the `src` pool.
 */
void const_merge(
        struct vm_const_pool* src,
        struct vm_const_pool const* fresh );


/**@} */

The specially formatted comments (starting with /** and containing commands such as @defgroup) are processed by Doxygen to generate documentation for the respective code entities. For more information, refer to Doxygen documentation.

13.4 Encapsulation

One of the thinking fundamentals is abstraction. In software engineering, it is a process of hiding implementation details and data.

If we want to implement a certain behavior like an image rotation, we would like to think only about the image rotation. The input file format, the format of its headers, is of little importance to us. What is really important is to be able to work with dots which form the image and know its dimensions. However, you cannot write a program without considering all this information that is actually independent of the rotation algorithm itself.

We are going to split the program into parts; each part will do its purpose and only it. This logic can be used by calling a set of exposed functions and/or a set of exposed global variables. Together they form an interface for this program part. To implement them, however, we usually have to write more functions, which are better hidden from the end user.

Working with version control systems

When working in a team where many people perform changes simultaneously, making smaller functions is very important. If a function performs many actions, and its code is huge, multiple independent changes will be harder to merge automatically.

In programming languages supporting packages or classes, these are used to hide pieces of code and create interfaces for them. Unfortunately, C has none of them; furthermore, there is no concept of “private fields” in structures: all fields are seen by everyone.

The support for separate code files, called translation units, is the only real language feature to help us isolate parts of program code. We use a notion of module as a synonym for a translation unit, a .c file.

The C standard does not define a notion of module. In this book we are using them interchangeably because for the C language they are roughly equivalent.

As we know, functions and global variables become public symbols by default and thus accessible to other files. What is reasonable is to mark all “private” functions and global variables as static in the .c file and declare all “public” functions in the .h file.

As an example, we are going to write a module that implements a stack.

The header file will describe the structure and the functions that can operate its instances. It resembles object-oriented programming without subtyping (no inheritance).

The interface will consist of the following functions:

  • Create or destroy a stack;

  • Push and pop elements from a stack.

  • Check if the stack is empty.

  • Launch a function for each element in the stack.

The code file will define all functions and probably some more, which won’t be accessible outside of it and are only created for the sake of decomposition and code reusability.

Listings 13-6 and 13-7 show the resulting code. stack.h describes an interface. It has an include guard, enumerates all other headers (first standard headers, then custom ones), and defines custom types.

Listing 13-6. stack.h
#ifndef _STACK_H_
#define _STACK_H_


#include <stddef.h>
#include  <stdint.h>
#include  <stdbool.h>


struct list;

struct stack  {
    struct list* first;
    struct list* last;
    size_t count;
};


struct stack stack_init    ( void );
void stack_deinit( struct stack* st );


void stack_push( struct stack* s, int value );
int  stack_pop ( struct stack* s );
bool stack_is_empty( struct stack const* s );


void stack_foreach( struct stack* s, void (f)(int) );

#endif  /* _STACK_H_  */

There are two types defined: list and stack. The first one is only used internally inside the stack, and so we declared it an incomplete type . Only pointers to instances of such type are allowed unless its definition is specified later.

For everyone who includes stack.h, the type struct list will remain incomplete. The implementation file stack.c, however, will define the structure, completing the type and allowing to access its fields (but only in stack.c).

Then the struct stack is defined and the functions that work with it are declared (stack_push, stack_pop, etc.) (see Listing 13-7).

Listing 13-7. stack.c
#include <malloc.h>
#include "stack.h"


struct list { int value; struct list* next; };

static struct list* list_new( int item, struct list* next ) {
    struct list* lst = malloc( sizeof( *lst ) );
    lst->value  =  item;
    lst->next  =  next;
    return lst;


}

void stack_push( struct stack* s, int value ) {
    s->first = list_new( value, s->first );
    if ( s->last ==  NULL  ) s->last =  s-> first;
    s->count++;
}


int stack_pop( struct stack* s ) {
    struct list* const head = s->first;
    int value;
    if ( head ) {
        if ( head->next ) s->first = head->next;
        value = head->value;
        free( head );
        if( -- s->count ) {
            s->first =  s->last =  NULL;
        }
        return value;
    }
    return 0;
}


void stack_foreach( struct stack* s, void (f)(int) ) {
    struct list* cur;
    for( cur = s->first; cur; cur = cur-> next )
        f( cur->value );


}

bool stack_is_empty( struct stack const* s ) {
    return s->count == 0;
}


struct stack stack_init( void ) {
    struct stack empty = { NULL, NULL, 0 };
    return empty;
}


void stack_deinit( struct stack* st ) {
    while( ! stack_is_empty( st ) ) stack_pop( st );
    st-> first = NULL;
    st-> last = NULL;
}

This file defines all functions declared in the header. It can be split into multiple .c files, which will sometimes do good for the project structure; what is important is that the compiler should accept them all and then the compiled code should get to the linker.

A static function list_new is defined to isolate the instance initialization of struct list. It is not exposed to the outside world. During optimizations, not only can the compiler inline it, but it can even delete the function itself, effectively eliminating any possible implications on the code performance. Marking function static is necessary (but not sufficient) for this optimization to occur. Additionally, the instructions of static functions might be placed closer to their respective callers, improving locality.

By splitting the program on modules with well-described interfaces you reduce the overall complexity and achieve better reusability.

The need to create header files makes modifications a bit cumbersome because the consistency of headers with the code itself is the programmer’s responsibility. However, we can benefit from it as well by specifying a clear interface description, which lacks the implementation details.

13.5 Immutability

It is quite common to have to choose between creating a new modified copy of a structure and performing modifications in place.

Here are some advantages and disadvantages of both choices.

  • Creating copy:

    • Easier to write: you won’t accidentally pass the wrong instance to a function.

    • Easier to debug, because you don’t have to track changes of variable.

    • Can be optimized by the compiler.

    • Friendly to parallelization.

    • Can be slower.

  • Mutating existing instance.

    • Faster.

    • Can become very hard to debug, especially in a multithreaded environment.

    • Sometimes simpler because you don’t have to carefully and recursively copy structures with multiple pointers to other structures (this process is called deep copying).

    • For objects with a distinct identity, this approach may be more intuitive and is also robust enough.

Our perception of the real world is based on mutable objects, because the objects in the real world often have a distinct identity. When you are turning on your phone, the phone is not replaced by its copy, but the state of the same phone is changed instead. In other words, the identity of the phone is maintained, while its state is changing. Thus, in situations where you only have one instance of a certain type and the consecutive changes are performed on it, it is fine to mutate it instead of making a copy at every change.

13.6 Assertions

There is a mechanism that allows you to test certain conditions during program execution. When such a condition is not being satisfied, an error is produced and the program is terminated abnormally.

To use the assertion mechanism, we have to use #include <assert.h> and then use the assert macro. Listing 13-8 shows an example.

Listing 13-8. assert.c
#include <assert.h>
int main() {
    int x = 0;
    assert( x != 0 );
    return 0;
}

The condition, given to the assert macro, is obviously false; hence the program will terminate abnormally and inform us about the failed assertion:

assert: assert.c:6: main: Assertion `x != 0' failed.

If the preprocessor symbol NDEBUG is defined (which can be achieved by using -D NDEBUG compiler option or #define NDEBUG directive), the assert is replaced by an empty string and thus turned off. So, assertions will produce zero overhead and the checks will not be performed.

You should use asserts to check for impossible conditions that signify the inconsistency of the program state. Never use asserts to perform checks on user input.

13.7 Error Handling

While higher-level languages have some kind of error handling mechanism (which does not interfere with the main logic description), C lacks one. There are three principal ways to deal with errors:

  1. Use return codes. A function should not return a result as such but a code that shows whether it has processed well or not. In the latter case the code reflects the exact type of error that has occurred. The computation result is assigned by a pointer that is accepted as an additional argument.

    Listing 13-9 shows an example.

    Listing 13-9. error_code.c
    enum   div_res   {
          DIV_OK,
          DIV_BYZERO
    };


    enum div_res div( int x, int y, int* result ) {
        if ( y  != 0  ) { *result =  x/y; return DIV_OK;  }
        else return DIV_BYZERO;
    }

    Symmetrically , you can return values as you do and set up error code using a pointer to a respective variable.

    Error codes can be described using an enum or with several #defines. Then you can use them as indices in a static array of messages or use a switch statement. Listing 13-10 shows an example.

    Listing 13-10. err_switch_arr.c
    enum   error_code   {
          ERROR1,
          ERROR2
    };
    ...
    enum  error_code  err;
    ...
    switch (err) {
        case ERROR1: ... break;
        case ERROR2: ... break;
        default: ... break;
    }


    /* alternatively */

    static const char* const messages[] = {
        "It is the first error\n",
        "The second error it is\n"
    };


    fprintf( stderr, messages[err] );

    Never use global variables as error code holders (or to return a value from a function).

    According to C standard , a standard variable-like entity errno exists. It should be a modifiable lvalue and must not be explicitly declared. Its usage is akin to a global variable, albeit its value is thread-local. The library functions use it as an error code holder, so after seeing a failure from a function (e.g., fopen returned NULL), one should check the errno value for an error code. The man pages for the respective function enumerate possible errno values (e.g., EEXIST).

    Despite this feature having sneaked into the standard library, it is largely considered an anti-pattern and should not be imitated.

  2. Using callbacks.

    Callbacks are function pointers that are passed as arguments and called by the function that accepts them. They can be used to isolate the error handling code, but they often look weird to people who are more accustomed to traditional return code usage. Additionally, the execution order becomes less obvious.

    Listing 13-11 shows an example.

    Listing 13-11. div_cb.c
    #include <stdio.h>

    int div( int x, int y, void (onerror)(int, int)) {
        if ( y != 0 )
            return x/y;
        else  {
            onerror(x,y);
            return 0;
        }
    }


    static void div_by_zero(int x, int y) {
        fprintf( stderr, "Division by zero: %d / %d\n", x, y );
    }


    int main(void) {
        printf("%d %d\n",
                div( 10, 2, div_by_zero ),
                div( 10, 0, div_by_zero ) );
        return 0;
    }
  3. Using longjmp. This advanced technique will be explained in section 14.3.

    There is a classical error recovering technique, which requires the goto usage. Listing 13-12 shows an example is shown.

    Listing 13-12. goto_error_recover.c
    void foo(void)

    {
        if (!doA()) goto exit;
        if (!doB()) goto revertA;
        if (!doC()) goto revertB;


        /* doA, doB and doC succeeded */
        return;


    revertB:
        undoB();
    revertA:
        undoA();
    exit:
        return;
    }

In this example, three actions have been performed, and they all can fail. The nature of these actions is such that we have to do a cleanup after. For example, doA might trigger dynamic memory allocation. In case doA succeeded but doB did not, we have to free this memory to prevent memory leak. This is what the code labeled revertA does.

The recoveries are performed in reverse order. If doA and doB succeeded, but doC failed, we have to revert to B, and then to A. So, we label the reverting stages with the labels and let the control fall through them. So, goto revertB will revert to doB first and then fall to the code, reverting to doA. This trick can often be seen in a Linux kernel. However, be wary, gotos usually make verification much harder, which is why they are sometimes banned.

13.8 On Memory Allocation

  • Many programmers advise against flexible arrays allocated on a stack. It is an easy way to get a stack overflow if you do not check the length well enough. What’s even worse, there is no way to tell whether you can safely allocate a said amount of bytes on a stack or not.

  • Do not overuse malloc! As you will see in the last assignment of this chapter, malloc is not cheap at all. Whenever you want to allocate something reasonably small, do it on a stack, as a local variable. If some function needs an address of a structure, you can take the address of a stack allocated structure and pass to it. This prevents memory leaks and improves performance and code readability.

  • Global variables pose no threat as long as they are constants. Static local variables are the same. Use them if you want to limit the usage of a certain constant by one function.

13.9 On Flexibility

We advocate code reusability indeed. However, taking this to the extreme results in an absurd amount of abstraction layers and boilerplate code that is only present to support a possible future need for additional features (which might never happen).

There is no silver bullet, in the large sense. Every programming style, every model of computation, is good and concise in some cases and bulky and verbose in other ones. Analogously, the best tool is specialized rather than a jack of all trades. You could transform an image viewer into a powerful editor, capable of playing video and editing IDv3 tags, but the image viewer facet will surely suffer, and so will the user experience.

Writing more abstract code can bring benefits because such code is easier to adapt to new contexts. At the same time, it introduces complexity that might be unnecessary. Only generalize to an extent that does no harm. To know when to stop you need to answer several questions for yourself, such as

  • What is the purpose of your program or library?

  • What are the limits of functionality that you imagine for your program?

  • Will it be easier to write, use, and/or debug this function if it is written in a more general way?

While the first two questions are very subjective, the latter one can be provided with an example. Let’s take a look at the code shown in Listing 13-13.

Listing 13-13. dump_1.c
void dump( char const* filename ) {
    FILE* f = fopen( filename, "w" );
    fprintf(f, "this is the dump %d", 42 );
    fclose( f );
}

Compare it to another version with the same logic, split in two functions, shown in Listing 13-14.

Listing 13-14. dump_2.c
void dump( FILE* f ) {
    fprintf(f, "this is the dump %d", 42 );
}
void fun( void ) {
    FILE* f = fopen( "dump.txt", "w" );
    dump( f );
    fclose( f );
}

The second version is preferable for two reasons:

  • The first version requires a filename, which means that you cannot use it to write to stderr or stdout.

  • The second version decouples file opening logic and file writing logic. If you want to handle errors that might occur on fprintf, fopen, or fclose calls, you will do it separately for fopen, keeping the functions relatively simple. The dump function will not handle file opening errors: it will not be called at all if the opening failed.

Listing 13-15 shows an example of the same logic with error handling . As you see, there is no error handling for file opening and closing in dump function; it is performed in fun instead.

Listing 13-15. file_open_sep.c
#include <stdio.h>

enum stat {
    STAT_OK,
    STAT_ERR_OPEN,
    STAT_ERR_CLOSE,
    STAT_ERR_WRITE
};


enum stat dump( FILE * f ) {
    if ( fprintf( f, "this is the dump %d", 42 ) ) return STAT_OK;
    return  STAT_ERR_WRITE;
}


enum stat fun( void ) {

    enum stat dump_stat;
    FILE * f;


    f =  fopen(  "dump.txt",  "w"  );
    if (!f) return STAT_ERR_OPEN;
    dump_stat = dump( f );
    if ( dump_stat != STAT_OK ) return dump_stat;
    if (! fclose( f ) ) return STAT_ERR_CLOSE;


    return STAT_OK;
}

In case of multiple writes in the dump function, the function will become encumbered and thus less readable.

13.10 Assignment: Image Rotation

You have to create a program to rotate a BMP image of any resolution to 90 degrees clockwise.

13.10.1 BMP File Format

BMP (BitMaP) format is a raster graphics format, which means that it stores an image as a table of colored dots (pixels). In this format the color is encoded with numbers of a fixed size (can be 1, 4, 8, 16, or 24 bits). If 1 bit is used per pixel, the image is black and white. If 24 bits are used, the number of different colors possible is roughly 16 million. We only implement the rotation of 24-bit images.

The subset of BMP files that your program should be able to work with is described by the structure shown in Listing 13-16. It represents the file header, followed immediately by the pixel data.

Listing 13-16. bmp_struct.c
#include  <stdint.h>
Struct __attribute__((packed))
    bmp_header {
        uint16_t bfType;
        uint32_t  bfileSize;
        uint32_t bfReserved;
        uint32_t bOffBits;
        uint32_t biSize;


        uint32_t biWidth;
        uint32_t  biHeight;
        uint16_t  biPlanes;
        uint16_t biBitCount;
        uint32_t biCompression;
        uint32_t biSizeImage;
        uint32_t biXPelsPerMeter;
        uint32_t biYPelsPerMeter;
        uint32_t biClrUsed;
        uint32_t  biClrImportant;
};
Question 259

Read BMP file specifications to identify what these fields are responsible for.

The file format depends on the bit count per pixel. There are no color palettes when 16 or 24 bits per pixel are used.

Each pixel is encoded by 24 bits or 3 bytes as shown in Listing 13-17. Each component is a number from 0 to 255 (one byte) which shows the presence of blue, green, or red color in this pixel. The resulting color is a superposition of these three base colors.

Listing 13-17. pixel.c
struct  pixel  {
    unsigned char b, g, r;
}

Every row of pixels is padded so that its length would be a multiple of 4. For example, the image width is 15 pixels. It corresponds to 15 × 3 = 45 bytes. To pad it we skip 3 bytes (to the closest multiple of 4, 48) before starting the new row of pixels. Because of this, the real image size will differ from the product of width, height, and pixel size (3 bytes).

Note

Remember to open the image in a binary mode!

13.10.2 Architecture

We want to think about program architecture that is extensible and modular.

  1. Describe the pixel structure struct pixel to not work with the raster table directly (as with completely structureless data). This should always be avoided.

  2. Separate the inner image representation from the input format. The rotation is performed on the inner image format, which is then serialized back to BMP. There can be changes in BMP format, you might want to support other formats, and you do not want to couple the rotation algorithm tightly to BMP.

    To achieve that, define a structure structure image to store the pixel array (continuous, now without padding) and some information that should really be kept. For example, there is absolutely no need to store BMP signature here, or any of the never-used header fields. We can get away with the image width and height in pixels.

    You will need to create functions to read an image from BMP file and to write it to BMP file (probably also to generate a BMP header from the inner representation).

  3. Separate file opening from its reading.

  4. Make error handling unified and handle errors in exactly one place (for this very program it is enough).

To achieve that, define the from_bmp function, which will read a file from the stream and will return one of the codes that show whether the operation completed successfully or not.

Remember the flexibility concerns. Your code should be easy to use in applications with graphical user interface (GUI) as well as in those without GUI at all, so throwing prints into stderr all over the place is not a good option: restrict them to the error handling piece of code. Your code should be easily adaptable for different input formats as well.

Listing 13-18 shows several snippets of starting code.

Listing 13-18. image_rot_stub.c
#include  <stdint.h>
#include <stdio.h>


struct pixel { uint8_t b,g,r; };

struct image {
    uint64_t width, height;
    struct pixel_t* data;
};


/*  deserializer   */
enum read_status  {
    READ_OK = 0,
    READ_INVALID_SIGNATURE,
    READ_INVALID_BITS,
    READ_INVALID_HEADER
        /* more codes  */
};


enum read_status from_bmp( FILE* in, struct image* const read );

/*  image_t from_jpg( FILE* );...
 *  and other deserializers are possible
 *  All information needed will be
 *  stored in image structure */


/* makes a rotated copy */
struct image rotate( struct image const source );


/*  serializer   */
enum  write_status  {
     WRITE_OK = 0,
     WRITE_ERROR
         /* more codes */
};


enum write_status to_bmp( FILE* out, struct image const* img );
Question 260

Implement blurring . It is done in a very simple way: for each pixel you compute its new components as an average in a 3 × 3 pixels window (called kernel). The border pixels are left untouched.

Question 261

Implement rotation to an arbitrary angle (not only 90 or 180 degrees).

Question 262

Implement “dilate” and “erode” transformations. They are similar to the blur, but instead of doing an average in a window, you have to compute the minimal (erode) or maximal (dilate) component values.

13.11 Assignment: Custom Memory Allocator

In this assignment, we are going to implement our own version of malloc and free based on the memory mapping system call mmap and a linked list of chunks of arbitrary sizes. It can be viewed as a simplified version of a memory manager typical for the standard C library and shares most of its weaknesses.

For this assignment, the usage of malloc/calloc, free and realloc is forbidden.

As we know, these functions are used to manipulate the heap. The heap consists of anonymous pages and is in fact a linked list of chunks. Each chunk consists of a header and the data itself. The header is described by a structure shown in Listing 13-19.

Listing 13-19. mem_str.c
struct mem  {
    struct mem* next;
    size_t capacity;
    bool is_free;
};

The header is immediately followed by the usable area.

We need to store both the size and the link to the next block because in our case the heap can have gaps for two reasons.

  • The heap start can be placed between two already mapped regions.

  • The heap can grow to an arbitrary size.

An allocation in a heap is splitting the first available chunk in two (given its size is enough). It marks the first part as not free and returns its address. If there are no free chunks big enough for the requested size, the allocator attempts to get more memory from OS by calling mmap.

There is no point allocating blocks of 1 or 3 bytes; they are too small. It is usually a waste since the header size is superior anyway. So we are going to introduce a constant BLOCK_MIN_SIZE for the minimal allowed block size (not including header).

Given a request of query bytes, we first change it to BLOCK_MIN_SIZE if it is too small. Then we iterate over the block chain and apply the following logic to each block:

  • query <= capacity-sizeof(struct mem) - MINIMAL_BLOCK_SIZE

    In this case, we can split the block in two and use the first part as the allocated memory chunk, leaving the second one free.

  • Otherwise the block is not large enough to hold a requested amount of bytes.

    • If the block is not last, we continue to the next block.

    • Otherwise we need to map more pages (enough to allocate query bytes).

First we try to do it immediately after the block end (flag MAP_FIXED for mmap), and if we succeed, we enlarge the current block to incorporate new pages. At the end, we split it in two and use the first of the pair.

If we cannot map more pages immediately at the heap end, we try to map them anywhere (enough to store query bytes). Then we split it in two and use the first of the pair.

If all mappings fail, we return NULL, just as malloc does.

The free is easier to implement. Given the block start we have to calculate the respective header start, which changes its status from “allocated” to “free.” If it is followed immediately by a free block, they are merged. This is not the case when the block is the last in its memory region and the next one is mapped after a certain gap. You can use the header file shown in Listing 13-20 as a starting point.

Listing 13-20. mem.h
#ifndef _MEM_H_
#define _MEM_H_


#define _USE_MISC

#include <stddef.h>
#include  <stdint.h>
#include <stdio.h>


#include <sys/mman.h>

#define  HEAP_START  ((void*)0x04040000)

struct mem;

#pragma pack(push, 1)
struct mem  {
    struct mem* next;
    size_t capacity;
    bool is_free;
};
#pragma pack(pop)


void* _malloc( size_t query );
void  _free( void* mem );
void* heap_init( size_t initial_size );


#define DEBUG_FIRST_BYTES 4

void memalloc_debug_struct_info( FILE* f,
        struct mem const* const address );


void memalloc_debug_heap( FILE* f,   struct mem  const* ptr );

#endif

Remember that complex logic begs for well-thought-out decomposition on smaller functions.

You can use the code shown in Listing 13-21 to debug the heap state. Do not forget that you can also wait for user input and check the /proc/PID/maps file to see the actual mappings of a process with the identifier PID.

Listing 13-21. mem_debug.c
#include "mem.h"

void memalloc_debug_struct_info( FILE* f,
        struct mem const* const address ) {


    size_t i;

    fprintf( f,
            "start: %p\nsize: %lu\nis_free: %d\n",
            (void*)address,
            address-> capacity,
            address-> is_free );
    for ( i = 0;
            i <  DEBUG_FIRST_BYTES  &&  i <  address-> capacity;
            ++i )
        fprintf( f, "%hhX",
                ((char*)address)[ sizeof( struct mem_t ) + i ] );
    putc( '\n', f );
}


void memalloc_debug_heap( FILE* f, struct mem const* ptr ) {
    for( ; ptr; ptr = ptr->next )
        memalloc_debug_struct_info( f, ptr );
}

An estimated number of lines of code is 150 to 200. Do not forget to write a Makefile.

13.12 Summary

In this chapter we have extensively studied some of the most important recommendations considering coding style and program architecture. We have seen the naming conventions and the reasons behind the common code guidelines. When we write code, we should adhere to certain restrictions derived from our requirements for the code as well as the development process itself. We have seen such important concepts as encapsulation. Finally, we have provided two more advanced assignments, where you can apply your new knowledge about program architecture. In the next part we are going to dive into the details of translation, review some language features that are easier to understand on the assembly level, and talk about performance and compiler optimizations.