© Igor Zhirkov 2017

Igor Zhirkov, Low-Level Programming, 10.1007/978-1-4842-2403-8_6

6. Interrupts and System Calls

Igor Zhirkov

(1) Saint Petersburg, Russia

In this chapter we are going to discuss two topics.

First, as the von Neumann architecture lacks interactivity, interrupts were introduced to change that. Although we are not diving into the hardware side of interrupts, we are going to learn exactly how a programmer views them. Additionally, we will speak about the input and output ports used to communicate with external devices.

Second, the operating system (OS) usually provides an interface to interact with the resources it controls: memory, files, CPU (central processing unit), etc. This is implemented via the system call mechanism. Transferring control to operating system routines requires a well-defined mechanism of privilege escalation, and we are going to see how it works in the Intel 64 architecture.

6.1 Input and Output

When we were extending the von Neumann architecture to work with external devices, we mentioned only interrupts as a way to communicate with them. In fact, there is a second feature, input/output (I/O) ports, which complements interrupts and allows data exchange between the CPU and devices.

Applications can access I/O ports in two ways:

  1. Through a separate I/O address space.

    There are 2^16 1-byte addressable I/O ports, from 0 through FFFFH. The in and out instructions are used to exchange data between ports and the eax register (or its parts).

    The permissions to perform writes and reads from ports are controlled by checking:

    • The IOPL (I/O privilege level) field of the rflags register

    • The I/O permission bit map of a Task State Segment. We will speak about it in section 6.1.1.

  2. Through memory-mapped I/O.

    A part of the address space is specifically mapped to provide interaction with external devices that respond like memory components. Consequently, any instruction that addresses memory (mov, movsb, etc.) can be used to perform I/O with these devices.

    Standard segmentation and paging protection mechanisms are applied to such I/O tasks.

The IOPL field in the rflags register works as follows: if the current privilege level is less than or equal to the IOPL, the following instructions are allowed to be executed:

  • in and out (normal input/output).

  • ins and outs (string input/output).

  • cli and sti (clear/set interrupt flag).

Thus, setting IOPL individually for an application allows us to forbid it from performing port I/O even if it is working at a higher privilege level than the user applications.

Additionally, Intel 64 allows an even finer permission control through an I/O permission bit map. If the current privilege level is greater than the IOPL, a port access is not rejected outright: the processor checks the bit corresponding to the port used, and the operation proceeds only if this bit is not set.
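The two-level check can be modeled in a few lines of Python. This is a minimal sketch rather than a CPU-accurate model: the names iopl_of and io_access_allowed are made up for the illustration, and the sketch assumes the rule from the Intel manual, where an access is allowed outright when the current privilege level does not exceed IOPL and the bit map is consulted otherwise.

```python
IOPL_SHIFT = 12  # IOPL occupies bits 12-13 of rflags


def iopl_of(rflags: int) -> int:
    """Extract the two-bit IOPL field from an rflags value."""
    return (rflags >> IOPL_SHIFT) & 0b11


def io_access_allowed(cpl: int, rflags: int, bitmap: bytes,
                      port: int, width: int = 1) -> bool:
    """Model the permission check for an in/out access of `width` bytes."""
    if cpl <= iopl_of(rflags):
        return True  # privileged enough: the bit map is not consulted
    for p in range(port, port + width):
        byte_index, bit_index = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False  # ports beyond the end of the map are denied
        if (bitmap[byte_index] >> bit_index) & 1:
            return False  # a set bit forbids the access
    return True


# Hypothetical setup: deny every port except 0x3F8.
bitmap = bytearray([0xFF] * (65536 // 8))
bitmap[0x3F8 // 8] &= ~(1 << (0x3F8 % 8)) & 0xFF
user_rflags = 0x202  # IOPL = 0, IF = 1
```

With this setup, code running in ring3 may touch the single byte-wide port 0x3F8 and nothing else, while ring0 code passes the IOPL check and never reaches the bit map.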

The I/O permission bit map is a part of Task State Segment (TSS), which was created to be an entity unique to a process. However, as the hardware task-switching mechanism is considered obsolete, only one TSS (and I/O permission bit map) can exist in long mode.

6.1.1 TR register and Task State Segment

There are some artifacts from protected mode that are still used in some form in long mode. Segmentation is one example, now mostly used to implement protection rings. Another is the pair formed by the tr register and the Task State Segment control structure.

The tr register holds the segment selector of the TSS descriptor. The latter resides in the GDT (Global Descriptor Table) and has a format similar to segment descriptors.

As with segment registers, tr has a shadow register, which is updated from the GDT when tr is loaded via the ltr (load task register) instruction.

The TSS is a memory region used to hold information about a task in the presence of a hardware task-switching mechanism. Since no popular OS has used it in protected mode, this mechanism was removed from long mode. However, TSS in long mode is still used, albeit with a completely different structure and purpose.

These days there is only one TSS used by an operating system, with the structure described in Figure 6-1.

Figure 6-1. Task State Segment in long mode

A 16-bit field of the TSS stores an offset to the Input/Output Port Permission Map, which we already discussed in section 6.1. The TSS also holds seven pointers to special interrupt stack tables (ISTs) and stack pointers for different rings. Each time the privilege level changes, the stack is automatically changed accordingly. Usually, the new rsp value is taken from the TSS field corresponding to the new protection ring. The meaning of ISTs is explained in section 6.2.
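To make the structure more tangible, here is a small Python sketch that unpacks a long-mode TSS from raw bytes. The field offsets follow the 104-byte layout documented in the Intel manual; the function name parse_tss, the dictionary keys, and the sample addresses are assumptions made for this illustration only.

```python
import struct

# Long-mode TSS, 104 bytes, little-endian, no padding:
# reserved (4), rsp0-rsp2 (3 x 8), reserved (8), ist1-ist7 (7 x 8),
# reserved (8), reserved (2), I/O map base (2).
TSS_FORMAT = "<I3QQ7QQHH"
TSS_SIZE = struct.calcsize(TSS_FORMAT)


def parse_tss(raw: bytes) -> dict:
    """Pick the meaningful fields out of a raw long-mode TSS image."""
    fields = struct.unpack(TSS_FORMAT, raw[:TSS_SIZE])
    return {
        "rsp": list(fields[1:4]),    # stack pointers for rings 0, 1, 2
        "ist": list(fields[5:12]),   # IST1 .. IST7
        "iomap_base": fields[14],    # offset of the I/O permission bit map
    }


# Hypothetical TSS: one ring0 stack, one IST stack, bit map right after the TSS.
rsp = [0xFFFF800000010000, 0, 0]
ist = [0xFFFF800000020000] + [0] * 6
raw = struct.pack(TSS_FORMAT, 0, *rsp, 0, *ist, 0, 0, TSS_SIZE)
tss = parse_tss(raw)
```

Placing the bit map immediately after the TSS, as the sample does, is only one possible choice; the offset field exists precisely so that the map can live elsewhere within the segment.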

6.2 Interrupts

Interrupts allow us to change the program control flow at an arbitrary moment in time. While the program is executing, external events (a device requires CPU attention) or internal events (division by zero, insufficient privilege level to execute an instruction, a non-canonical address) may provoke an interrupt, which results in some other code being executed. This code is called an interrupt handler and is a part of an operating system or driver software.

In [15], Intel separates external asynchronous interrupts from internal synchronous exceptions, but both are handled alike.

Each interrupt is labeled with a fixed number, which serves as its identifier. For us it is not important exactly how the processor acquires the interrupt number from the interrupt controller.

When the n-th interrupt occurs, the CPU consults the Interrupt Descriptor Table (IDT), which resides in memory. Analogously to the GDT, its address and size are stored in a register, idtr. Figure 6-2 describes the idtr.

Figure 6-2. idtr register

Each entry in IDT takes 16 bytes, and the n-th entry corresponds to the n-th interrupt. The entry incorporates some utility information as well as an address of the interrupt handler. Figure 6-3 describes the interrupt descriptor format.

Figure 6-3. Interrupt descriptor

DPL Descriptor Privilege Level

  • The current privilege level must be less than or equal to DPL in order to call this handler using the int instruction. The check is skipped when the handler is invoked by a hardware interrupt or an exception rather than by int.

Type 1110 (interrupt gate, IF is automatically cleared in the handler) or 1111 (trap gate, IF is not cleared).
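The following Python sketch decodes one 16-byte IDT entry into the fields just described. The byte positions are taken from the gate descriptor format in the Intel manual; the names decode_gate and int_allowed, the dictionary keys, and the sample handler address and selector are hypothetical and serve only the illustration.

```python
import struct

GATE_TYPES = {0b1110: "interrupt gate", 0b1111: "trap gate"}


def decode_gate(entry: bytes) -> dict:
    """Decode one 16-byte long-mode IDT entry."""
    off_lo, selector, ist, attrs, off_mid, off_hi, _reserved = struct.unpack(
        "<HHBBHII", entry)
    return {
        "handler": off_lo | (off_mid << 16) | (off_hi << 32),
        "selector": selector,
        "ist": ist & 0b111,            # 0 means: no IST, standard mechanism
        "type": GATE_TYPES.get(attrs & 0xF, "other"),
        "dpl": (attrs >> 5) & 0b11,
        "present": bool(attrs >> 7),
    }


def int_allowed(cpl: int, gate: dict) -> bool:
    """The DPL check applied when the handler is invoked via `int n`."""
    return cpl <= gate["dpl"]


# Hypothetical entry: present interrupt gate, DPL = 0, no IST.
handler = 0xFFFFFFFF81000ABC
entry = struct.pack("<HHBBHII", handler & 0xFFFF, 0x10, 0, 0x8E,
                    (handler >> 16) & 0xFFFF, handler >> 32, 0)
gate = decode_gate(entry)
```

Here int_allowed(3, gate) is false, so a ring3 program could not reach this handler through int, whereas a gate with DPL = 3 (an attribute byte of 0xEF for a trap gate) would be callable from user code.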

The first 32 interrupts (vectors 0 through 31) are reserved. This means that you can provide interrupt handlers for them, but the CPU will use them for its internal events such as invalid instruction encoding. Other interrupts can be used by the system programmer.

When the IF flag is set, the interrupts are handled; otherwise they are ignored.

Question 96

What are non-maskable interrupts? What is their connection with the interrupt with code 2 and IF flag?

The application code is executed with low privileges (in ring3). Direct device control is only possible on higher privilege levels. When a device requires attention by sending an interrupt to the CPU, the handler should be executed in a higher privilege ring, thus requiring altering the segment selector.

What about the stack? The stack should also be switched. Here we have several options based on how we set up the IST field of interrupt descriptor.

  • If the IST is 0, the standard mechanism is used. When an interrupt occurs, ss is loaded with 0, and the new rsp is loaded from the TSS. The RPL field of ss is then set to an appropriate privilege level. Then the old ss and rsp are saved onto this new stack.

  • If the IST field is nonzero, the corresponding one of the seven ISTs defined in the TSS is used. ISTs exist because some serious faults (non-maskable interrupts, double fault, etc.) benefit from being handled on a known good stack. So, a system programmer might create several stacks even for ring0 and use some of them to handle specific interrupts.

There is a special int instruction, which accepts the interrupt number. It invokes an interrupt handler manually with respect to its descriptor contents. It ignores the IF flag: whether it is set or cleared, the handler will be invoked. To control execution of privileged code using int instruction, a DPL field exists.

Before an interrupt handler starts its execution, some registers are automatically saved onto the stack. These are ss, rsp, rflags, cs, and rip. See a stack diagram in Figure 6-4. Note how segment selectors are padded to 64 bits with zeros.

Figure 6-4. Stack when an interrupt handler starts

Sometimes an interrupt handler needs additional information about the event. An interrupt error code is then pushed onto the stack. This code contains information specific to this type of interrupt.

Many interrupts are described using special mnemonics in Intel documentation. For example, the 13-th interrupt is referred to as #GP (general protection; see footnote 1). You will find a short description of some interesting interrupts in Table 6-1.

Table 6-1. Some Important Interrupts

VECTOR  MNEMONIC  DESCRIPTION
0       #DE       Divide error
2       (none)    Non-maskable external interrupt
3       #BP       Breakpoint
6       #UD       Invalid instruction opcode
8       #DF       A fault while handling interrupt
13      #GP       General protection
14      #PF       Page fault

Not all binary code corresponds to correctly encoded machine instructions. When rip is not addressing a valid instruction, the CPU generates the #UD interrupt.

The #GP interrupt is very common. It is generated when you try to dereference a forbidden address (for example, a non-canonical one), when you try to perform an action requiring a higher privilege level, and so on.

The #PF interrupt is generated when addressing a page which has its present flag cleared in the corresponding page table entry. This interrupt is used to implement the swapping mechanism and file mapping in general. The interrupt handler can load missing pages from disk.

Debuggers rely heavily on the #BP interrupt, which is raised by the one-byte int3 instruction and is used to implement breakpoints. Single-stepping relies on a related mechanism: when TF is set in rflags, the debug exception #DB (vector 1) is generated after each instruction is executed, allowing step-by-step program execution. Evidently, these interrupts are handled by the OS. It is thus the OS’s responsibility to provide an interface for user applications that allows programmers to write their own debuggers.

To sum up, when an n-th interrupt occurs, the following actions are performed from a programmer’s point of view:

  1. The IDT address is taken from idtr.

  2. The interrupt descriptor is located starting from the (16 × n)-th byte of the IDT, since each entry takes 16 bytes (128 bits).

  3. The segment selector and the handler address are loaded from the IDT entry into cs and rip, possibly changing the privilege level. The old ss, rsp, rflags, cs, and rip are stored onto the stack as shown in Figure 6-4.

  4. For some interrupts, an error code is pushed on top of the handler’s stack. It provides additional information about the interrupt cause.

  5. If the descriptor’s type field defines it as an Interrupt Gate, the interrupt flag IF is cleared. The Trap Gate, however, does not clear it automatically, allowing nested interrupt handling.

If the interrupt flag is not cleared immediately when the interrupt handler starts, there is no guarantee that even its first instruction will be executed before another interrupt arrives asynchronously and requires our attention.
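The sequence above can be traced with a toy Python model. It is a sketch under simplifying assumptions: the CPU state is a plain dictionary, the gate is given as an already decoded dictionary with hypothetical keys selector, handler, and type, the stack switch itself is left out, and the entry size of 16 bytes is the one stated earlier for IDT entries.

```python
IF_MASK = 1 << 9        # position of the interrupt flag in rflags
IDT_ENTRY_SIZE = 16     # bytes per IDT entry in long mode


def descriptor_address(idtr_base: int, idtr_limit: int, vector: int) -> int:
    """Steps 1-2: locate the descriptor of `vector` inside the IDT.

    idtr_limit is the offset of the last valid byte of the table.
    """
    offset = vector * IDT_ENTRY_SIZE
    if offset + IDT_ENTRY_SIZE - 1 > idtr_limit:
        raise ValueError("vector lies outside the IDT")
    return idtr_base + offset


def enter_handler(cpu: dict, gate: dict, error_code=None) -> list:
    """Steps 3-5: save the old state and transfer control to the handler.

    Returns the saved frame in push order; the last element ends up
    on top of the handler's stack.
    """
    frame = [cpu["ss"], cpu["rsp"], cpu["rflags"], cpu["cs"], cpu["rip"]]
    if error_code is not None:
        frame.append(error_code)        # only some interrupts push one
    cpu["cs"], cpu["rip"] = gate["selector"], gate["handler"]
    if gate["type"] == "interrupt gate":
        cpu["rflags"] &= ~IF_MASK       # trap gates leave IF untouched
    return frame


# Hypothetical user-mode state interrupted by a page fault (vector 14).
cpu = {"ss": 0x2B, "rsp": 0x7FFFFFFFE000, "rflags": 0x246,
       "cs": 0x33, "rip": 0x401000}
gate = {"selector": 0x10, "handler": 0xFFFFFFFF81000ABC,
        "type": "interrupt gate"}
address = descriptor_address(0x1000, 256 * IDT_ENTRY_SIZE - 1, 14)
frame = enter_handler(cpu, gate, error_code=0)
```

After the call, the saved rflags inside the frame still has IF set, while the live rflags does not; iretq later restores the saved value and thereby enables interrupts again.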

Question 97

Is the TF flag cleared automatically when entering interrupt handlers? Refer to [15].

The interrupt handler ends with an iretq instruction, which restores all registers saved on the stack, as shown in Figure 6-4, unlike the simple ret instruction, which restores only rip.

6.3 System Calls

System calls are, as you already know, functions that an OS provides for user applications. This section describes the mechanism that allows their secure execution with higher privilege level.

The mechanisms used to implement system calls vary in different architectures. Overall, any instruction resulting in an interrupt will do, for example, division by zero or any incorrectly encoded instruction. The interrupt handler will be called and then the operating system will handle the rest. In protected mode on Intel architecture, the interrupt with code 0x80 was used by *nix operating systems. Each time a user program executed int 0x80, the interrupt handler checked the register contents for the system call number and arguments.

System calls are quite frequent, and you cannot perform any interaction with the outside world without them. Interrupts, however, can be slow, especially in Intel 64, since they require memory accesses to the IDT.

So in Intel 64 there is a new mechanism to perform system calls, built on the syscall and sysret instructions.

Compared to interrupts, this mechanism has some key differences:

  • The transition can only happen between ring0 and ring3. As pretty much no one uses ring1 and ring2, this limitation is not considered important.

  • Interrupt handlers differ, but all system calls are handled by the same code with only one entry point.

  • Some general purpose registers are now implicitly used during a system call.

    • rcx is used to store old rip

    • r11 is used to store old rflags

6.3.1 Model-Specific Registers

Sometimes when a new CPU appears it has additional registers, which older ones do not have. Quite often these are so-called Model-Specific Registers (MSRs). As these registers are rarely modified, their manipulation is performed via two instructions: rdmsr to read them and wrmsr to change them. These two instructions operate on the identifying number of the register.

rdmsr accepts the MSR number in ecx and returns the register value in edx:eax.

wrmsr accepts the MSR number in ecx and stores the value taken from edx:eax in it.
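The notation edx:eax means that a 64-bit value is split into two 32-bit halves, the upper one in edx and the lower one in eax. A short Python sketch of this convention follows; the function names and the sample value are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF


def split_edx_eax(value: int) -> tuple:
    """Split a 64-bit MSR value into the (edx, eax) pair that wrmsr expects."""
    return (value >> 32) & MASK32, value & MASK32


def join_edx_eax(edx: int, eax: int) -> int:
    """Combine the halves returned by rdmsr into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


# Hypothetical 64-bit value, such as a handler address destined for an MSR.
edx, eax = split_edx_eax(0xFFFFFFFF81000ABC)
```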

6.3.2 syscall and sysret

The syscall instruction depends on several MSRs.

  • STAR (MSR number 0xC0000081), which holds two pairs of cs and ss values: for system call handler and for sysret instruction. Figure 6-5 shows its structure.

    Figure 6-5. MSR STAR

  • LSTAR (MSR number 0xC0000082) holds the system call handler address (new rip).

  • SFMASK (MSR number 0xC0000084) specifies which bits in rflags are cleared on entry to the system call handler.

The syscall instruction performs the following actions:

  • Saves rip into rcx;

  • Saves rflags into r11;

  • Clears the bits of rflags selected by SFMASK; and

  • Initializes rip with the LSTAR value and takes new cs and ss from STAR.

Note that now we can explain why system calls and procedures accept arguments in slightly different sets of registers. Procedures accept their fourth argument in rcx, which, as we know, syscall uses to store the old rip value, so system calls take that argument in r10 instead.

Contrary to the interrupts, even if the privilege level changes, the stack pointer should be changed by the handler itself.

System call handling ends with the sysret instruction, which loads cs and ss from STAR, rip from rcx, and rflags from r11.

As we know, a segment selector change leads to a read from the GDT to update its paired shadow register. However, when executing syscall, these shadow registers are loaded with fixed values and no reads from the GDT are performed.
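The register effects of both instructions can be summarized in a toy Python model. It is a sketch, not a full emulation: the CPU state and the MSRs are plain dictionaries, rip is assumed to already point at the instruction following syscall, the selector arithmetic (STAR bits 47:32 for syscall, bits 63:48 plus fixed offsets for a 64-bit sysret) follows the Intel manual, and the sample selector values and handler address are chosen only for illustration.

```python
MSR_STAR, MSR_LSTAR, MSR_SFMASK = 0xC0000081, 0xC0000082, 0xC0000084


def syscall(cpu: dict, msr: dict) -> None:
    """Model the register effects of syscall in 64-bit mode."""
    cpu["rcx"] = cpu["rip"]                # return address
    cpu["r11"] = cpu["rflags"]             # saved flags
    cpu["rflags"] &= ~msr[MSR_SFMASK]      # clear the masked flag bits
    cpu["rip"] = msr[MSR_LSTAR]            # single entry point
    kernel_base = (msr[MSR_STAR] >> 32) & 0xFFFF
    cpu["cs"] = kernel_base & 0xFFFC
    cpu["ss"] = kernel_base + 8
    # rsp is deliberately untouched: the handler switches stacks itself.


def sysret(cpu: dict, msr: dict) -> None:
    """Model sysret returning to 64-bit user code."""
    cpu["rip"] = cpu["rcx"]
    cpu["rflags"] = cpu["r11"]
    user_base = (msr[MSR_STAR] >> 48) & 0xFFFF
    cpu["cs"] = (user_base + 16) | 3       # requested privilege level 3
    cpu["ss"] = (user_base + 8) | 3


# Hypothetical configuration: SFMASK clears IF (bit 9) on entry.
msr = {MSR_STAR: (0x23 << 48) | (0x10 << 32),
       MSR_LSTAR: 0xFFFFFFFF81800000,
       MSR_SFMASK: 0x200}
cpu = {"rip": 0x401002, "rflags": 0x246, "rsp": 0x7FFFFFFFE000,
       "cs": 0x33, "ss": 0x2B, "rcx": 0, "r11": 0}
syscall(cpu, msr)
in_kernel = dict(cpu)   # snapshot of the state seen by the handler
sysret(cpu, msr)
```

The snapshot in_kernel shows why a handler must preserve rcx and r11 if it wants to return: they are the only record of the user-mode rip and rflags.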

Here are these two fixed values in deciphered form:

  • Code Segment shadow register:

    • Base = 0

    • Limit = FFFFFH

    • Type = 11 (1011 in binary: can be executed and read, was accessed)

    • S = 1 (code or data segment, as opposed to a system segment)

    • DPL = 0

    • P = 1

    • L = 1 (Long mode)

    • D = 0

    • G = 1 (always the case in long mode)

    Additionally, CPL (current privilege level) is set to 0.

  • Stack Segment shadow register:

    • Base = 0

    • Limit = FFFFFH

    • Type = 3 (0011 in binary: can be read and written, was accessed)

    • S = 1 (code or data segment, as opposed to a system segment)

    • DPL = 0

    • P = 1

    • L = 1 (Long mode)

    • D = 1

    • G = 1

However, the system programmer is responsible for fulfilling one requirement: the GDT should have the descriptors corresponding to these fixed values.

So, the GDT should store two particular descriptors, for code and for data, specifically for syscall support.
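As an illustration, the following Python sketch packs descriptor fields into the 8-byte GDT entry format and applies it to the code segment values listed above. The bit positions are those of a regular segment descriptor in the Intel manual; the function name encode_descriptor and its parameter names are hypothetical, and the Type value is read as decimal 11 (1011 in binary), as in the Intel description of syscall.

```python
def encode_descriptor(base: int, limit: int, type_: int, s: int, dpl: int,
                      p: int, l: int, d: int, g: int, avl: int = 0) -> int:
    """Pack segment descriptor fields into one 64-bit GDT entry."""
    return (
        (limit & 0xFFFF)                    # limit, bits 15:0
        | ((base & 0xFFFFFF) << 16)         # base, bits 23:0
        | ((type_ & 0xF) << 40)
        | ((s & 1) << 44)
        | ((dpl & 3) << 45)
        | ((p & 1) << 47)
        | (((limit >> 16) & 0xF) << 48)     # limit, bits 19:16
        | ((avl & 1) << 52)
        | ((l & 1) << 53)
        | ((d & 1) << 54)
        | ((g & 1) << 55)
        | (((base >> 24) & 0xFF) << 56)     # base, bits 31:24
    )


# The code segment values that syscall loads into the cs shadow register.
syscall_cs = encode_descriptor(base=0, limit=0xFFFFF, type_=11, s=1,
                               dpl=0, p=1, l=1, d=0, g=1)
```

The data descriptor for the stack segment would be packed the same way from its own field values, and the two entries would be placed in the GDT at the positions implied by the selectors stored in STAR.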

6.4 Summary

In this chapter we have provided an overview of the interrupt and system call mechanisms. We have studied their implementation down to the system data structures residing in memory. In the next chapter we are going to review different models of computation, including stack machines akin to Forth and finite automata, and finally work on a Forth interpreter and compiler in assembly language.

Question 98

What is an interrupt?

Question 99

What is IDT?

Question 100

What does setting IF change?

Question 101

In which situation does the #GP error occur?

Question 102

In which situations does the #PF error occur?

Question 103

How is #PF error related to the swapping? How does the operating system use it?

Question 104

Can we implement system calls using interrupts?

Question 105

Why do we need a separate instruction to implement system calls?

Question 106

Why does the interrupt handler need a DPL field?

Question 107

What is the purpose of interrupt stack tables?

Question 108

Does a single thread application have only one stack?

Question 109

What kinds of input/output mechanisms does Intel 64 provide?

Question 110

What is a model-specific register?

Question 111

What are the shadow registers?

Question 112

How are the model-specific registers used in the system call mechanism?

Question 113

Which registers are used by syscall instruction?

Footnotes

1 See section 6.3.1 of the third volume of [15].