Igor Zhirkov

Igor Zhirkov
Saint Petersburg, Russia
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/9781484224021. For more detailed information, please visit http://www.apress.com/source-code.
ISBN 978-1-4842-2402-1
e-ISBN 978-1-4842-2403-8
DOI 10.1007/978-1-4842-2403-8
Library of Congress Control Number: 2017945327
© Igor Zhirkov 2017
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
This book aims to help you develop a consistent vision of the domain of low-level programming. We want to enable a careful reader to
Freely write in assembly language.
Understand the Intel 64 programming model.
Write maintainable and robust code in C11.
Understand the compilation process and decipher assembly listings.
Debug errors in compiled assembly code.
Use appropriate models of computation to greatly reduce program complexity.
Write performance-critical code.
There are two kinds of technical books: those used as a reference and those used for learning. This book is, without doubt, of the second kind. It is deliberately dense, and to digest the information successfully we strongly suggest reading it continuously. To memorize new information quickly, try to connect it with what you already know. That is why, whenever possible, we based the explanation of each topic on material from previous topics.
This book is written for programming students, intermediate-to-advanced programmers, and low-level programming enthusiasts. The prerequisites are a basic understanding of the binary and hexadecimal number systems and a basic knowledge of Unix commands.
Throughout this book you will encounter numerous questions. Most of them are meant to make you think again about what you have just learned, but some of them encourage you to do additional research, pointing to the relevant keywords.
We provide answers to these questions on our GitHub page, which also hosts all listings, starter code for assignments, updates, and other goodies.
Refer to the book’s page on the Apress site for additional information: http://www.apress.com/us/book/9781484224021.
There you can also find several preconfigured virtual machines with Debian Linux installed, with and without a graphical user interface (GUI), which allow you to start practicing right away without spending time setting up your system. You can find more information in Section 2.1.
We start with the very simple core ideas of what a computer is, explaining the concepts of a model of computation and of computer architecture. We extend the core model until it is adequate to describe a modern processor as a programmer sees it. From Chapter 2 onward we program in real Intel 64 assembly language without resorting to the older 16-bit architectures, which are often taught for historical reasons. This allows us to see how applications interact with the operating system through the system call interface, as well as specific architecture details such as endianness. After a brief overview of legacy architecture features, some of which are still in use, we study virtual memory in great detail and illustrate its use with the help of procfs and examples of calling the mmap system call from assembly. Then we dive into the compilation process, covering preprocessing as well as static and dynamic linking. After exploring the interrupt and system call mechanisms in greater detail, we finish the first part with a chapter on different models of computation, studying finite state machines and stack machines and implementing a fully functional Forth compiler in pure assembly.
The second part is dedicated to the C language. We start with an overview of the language, building the core understanding of its model of computation that is necessary to start writing programs. In the next chapter we study the type system of C and illustrate different kinds of typing, ending with a discussion of polymorphism and example implementations of different kinds of polymorphism in C. Then we study how to structure a program correctly by splitting it into multiple files, and how this affects the linking process. The next chapter is dedicated to memory management, input, and output. After that, we describe three facets of every language (syntax, semantics, and pragmatics) and concentrate on the first and the third. We see how language constructs are transformed into abstract syntax trees, the difference between undefined and unspecified behavior in C, and the effect of language pragmatics on the assembly code produced by the compiler. At the end of the second part, we dedicate a chapter to good coding practices to give readers an idea of how code should be written depending on its specific requirements. The sequence of assignments for this part ends with the rotation of a bitmap image and a custom memory allocator.
The final part is a bridge between the two previous ones. It dives into translation details such as calling conventions and stack frames, and into advanced C language features that require a certain understanding of assembly, such as the volatile and restrict keywords. We provide an overview of several classic low-level bugs, such as the stack buffer overflow, which can be exploited to induce unwanted behavior in a program. The next chapter covers shared objects in great detail and studies them at the assembly level, providing minimal working examples of shared libraries written in C and assembly. Then we discuss the relatively rarely covered topic of code models. The next chapter studies the optimizations that modern compilers are capable of and how that knowledge can be used to produce readable and fast code. We also provide an overview of performance-amplifying techniques such as using specialized assembly instructions and optimizing cache usage. This is followed by an assignment where you will implement a sepia filter for an image using specialized SSE instructions and measure its performance. The last chapter introduces multithreading with the pthreads library, as well as memory models and reorderings, which anyone doing multithreaded programming should be aware of, and explains the need for memory barriers.
The appendices include short tutorials on gdb (a debugger) and make (an automated build system), a reference table of the most frequently used system calls, and system information that makes the performance tests given throughout the book easier to reproduce. They should be read when necessary, but we recommend that you get used to gdb as soon as you start assembly programming in Chapter 2.
Most illustrations were produced using the VSVG library, which is aimed at producing complex interactive vector graphics and was written by Alexey Velikiy (http://www.corpglory.com). The sources for the library and the book illustrations are available on the VSVG GitHub page: https://github.com/corpglory/vsvg.
We hope that you find this book useful and wish you an enjoyable read!
I was blessed to meet a great number of people, both very gifted and extremely dedicated, who helped me and often guided me toward areas of knowledge I could never have imagined on my own.
I thank Vladimir Nekrasov, my most beloved math teacher, for his course and his influence on me, which enabled me to think better and more logically.
I thank Andrew Dergachev, who entrusted me with creating and teaching my course and helped me so much during these years, Boris Timchenko, Arkady Kluchev, Ivan Loginov (who also kindly agreed to be the technical reviewer for this book), and all my colleagues from ITMO University, who helped me shape this course in one way or another.
I thank all my students who provided feedback or even helped me in teaching. You are the very reason I am doing this. Several students helped by reviewing the draft of this book; I want to note the most useful remarks of Dmitry Khalansky and Valery Kireev.
For me, the years I spent at Saint Petersburg Academic University are easily the best of my life. Never have I had more opportunities to study with world-class specialists working in leading companies, alongside other students much smarter than me. I want to express my deepest gratitude to Alexander Omelchenko, Alexander Kulikov, Andrey Ivanov, and everyone contributing to the quality of computer science education in Russia. I also thank Dmitry Boulytchev, Andrey Breslav, and Sergey Sinchuk from JetBrains, my supervisors who have taught me a lot.
I am also very grateful to my French colleagues: Ali Ed-Dbali, Frédéric Loulergue, Rémi Douence, and Julien Cohen.
I also want to thank Sergei Gorlatch and Tim Humernbrum for providing much-needed feedback on Chapter 17, which helped me shape it into a much more consistent and understandable version. Special thanks go to Dmitry Shubin for his most useful help in fixing the imperfections of this book.
I am very grateful to my friend Alexey Velikiy and to his agency CorpGlory.com, which focuses on data visualization and infographics and crafted the best illustrations in this book.
Behind every little success of mine is an infinite amount of support from my family and friends. I would not have achieved anything without you.
Last, but not least, I thank the Apress team, including Robert Hutchinson, Rita Fernando, Laura Berendson, and Susan McDermott, for putting their trust in me and this project and doing everything they could to bring this book into reality.

Igor Zhirkov teaches his highly successful “System Programming Languages” course at ITMO University in Saint Petersburg, a six-time winner of the ACM-ICPC Intercollegiate World Programming Championship. He studied at Saint Petersburg Academic University and received his master’s degree from ITMO University. He is currently doing research on verified C refactorings as part of his PhD thesis, as well as on the formalization of a Bulk Synchronous Parallelism library in C, at IMT Atlantique in Nantes, France. His main interests are low-level programming, programming language theory, and type theory.
His other interests include playing piano, calligraphy, art, and the philosophy of science.

Ivan Loginov is a researcher and lecturer at ITMO University (University of Information Technologies, Mechanics and Optics) in Saint Petersburg, Russia, where he teaches the course “Introduction to Programming Languages” to bachelor’s degree students in computer science.
He received his master’s degree from ITMO University. His research focuses on compiler theory, language workbenches, and distributed and parallel programming as well as new teaching techniques and their application to IT (information technology).
Currently, he is writing his PhD dissertation on a cloud-based modeling toolkit for system dynamics.
His hobbies include playing the trumpet and reading classic (Russian) literature.