© Daniel Kusswurm 2018
Daniel Kusswurm, Modern X86 Assembly Language Programming, https://doi.org/10.1007/978-1-4842-4063-2_2

2. X86-64 Core Programming – Part 1

Daniel Kusswurm, Geneva, IL, USA

In the previous chapter, you learned about the fundamentals of the x86-64 platform including its data types, register sets, memory addressing modes, and the core instruction set. In this chapter, you learn how to code basic x86-64 assembly language functions that are callable from C++. You also learn about the semantics and syntax of an x86-64 assembly language source code file. The sample source code and accompanying remarks of this chapter are intended to complement the instructive material presented in Chapter 1.

The content of Chapter 2 is organized as follows. The first section describes how to code functions that perform simple integer arithmetic such as addition and subtraction. You also learn the basics of passing arguments and return values between functions written in C++ and x86-64 assembly language. The next section highlights additional arithmetic instructions including integer multiplication and division. In the final section, you learn how to reference operands in memory and use conditional jumps and conditional moves.

It should be noted that the primary purpose of the sample code presented in this chapter is to elucidate proper use of the x86-64 instruction set and basic assembly language programming techniques. All of the assembly language code is straightforward, but not necessarily optimal, since understanding optimized assembly language code can be challenging, especially for beginners. The sample code that's discussed in later chapters places more emphasis on efficient coding techniques. Chapter 15 also examines techniques that you can use to improve the efficiency of your assembly language code.

Simple Integer Arithmetic

In this section, you learn the basics of x86-64 assembly language programming. It begins with a simple program that demonstrates how to perform integer addition and subtraction. This is followed by an example program that illustrates use of the logical instructions and, or, and xor. The final program describes how to execute shift operations. All three programs illustrate passing arguments and return values between a C++ function and an assembly language function. They also show how to employ commonly used assembler directives.

As mentioned in the Introduction, all of the sample code discussed in this book was created using Microsoft's Visual C++ and Macro Assembler (MASM), which are included with Visual Studio. Before taking a look at the first code example, a few instructive comments about these development tools may be helpful. Visual Studio uses entities called solutions and projects to help simplify application development. A solution is a collection of one or more projects that are used to build an application. Projects are container objects that help organize an application’s files including (but not limited to) source code, resources, icons, bitmaps, HTML, and XML. A Visual Studio project is usually created for each buildable component (e.g., executable file, dynamic-linked library, static library, etc.) of an application. You can open and load a chapter’s sample programs into the Visual Studio development environment by double-clicking on its solution (.sln) file. Appendix A contains additional information regarding the use of Visual C++ and MASM.

Note

All of the source code examples in this book include one or more functions written in x86-64 assembly language plus some C++ code that demonstrates how to invoke the assembly language code. The C++ code also contains ancillary functions that perform required initializations and display results. For each source code example, a single listing that includes both the C++ and assembly language source code is used in order to minimize the number of listing references in the main text. The actual source code uses separate files for the C++ (.cpp) and assembly language (.asm) code.

Addition and Subtraction

The first source code example of this chapter is called Ch02_01. This example demonstrates how to use the x86-64 assembly language instructions add (Integer Add) and sub (Integer Subtract). It also illustrates some basic assembly language programming concepts including argument passing, returning values, and how to use a few MASM assembler directives. Listing 2-1 shows the source code for example Ch02_01.
//------------------------------------------------
//        Ch02_01.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
using namespace std;
extern "C" int IntegerAddSub_(int a, int b, int c, int d);
static void PrintResult(const char* msg, int a, int b, int c, int d, int result)
{
  const char nl = '\n';
  cout << msg << nl;
  cout << "a = " << a << nl;
  cout << "b = " << b << nl;
  cout << "c = " << c << nl;
  cout << "d = " << d << nl;
  cout << "result = " << result << nl;
  cout << nl;
}
int main()
{
  int a, b, c, d, result;
  a = 10; b = 20; c = 30; d = 18;
  result = IntegerAddSub_(a, b, c, d);
  PrintResult("Test 1", a, b, c, d, result);
  a = 101; b = 34; c = -190; d = 25;
  result = IntegerAddSub_(a, b, c, d);
  PrintResult("Test 2", a, b, c, d, result);
  return 0;
}
;-------------------------------------------------
;        Ch02_01.asm
;-------------------------------------------------
; extern "C" int IntegerAddSub_(int a, int b, int c, int d);
    .code
IntegerAddSub_ proc
; Calculate a + b + c - d
    mov eax,ecx             ;eax = a
    add eax,edx             ;eax = a + b
    add eax,r8d             ;eax = a + b + c
    sub eax,r9d             ;eax = a + b + c - d
    ret                 ;return result to caller
IntegerAddSub_ endp
    end
Listing 2-1.

Example Ch02_01

The C++ code in Listing 2-1 is mostly straightforward but includes a few lines that warrant some explanatory comments. The #include "stdafx.h" statement specifies a project-specific header file that contains references to frequently used system items. Visual Studio automatically generates this file whenever a new C++ project is created. The line extern "C" int IntegerAddSub_(int a, int b, int c, int d) is a declaration statement that defines the parameters and return value for the x86-64 assembly language function IntegerAddSub_ (all assembly language function names and public variables used in this book include a trailing underscore for easier recognition). The declaration statement’s "C" modifier instructs the C++ compiler to use C-style naming for function IntegerAddSub_ instead of a C++ decorated name (a C++ decorated name includes extra characters that help support function overloading). It also notifies the compiler to use C-style linkage for the specified function.

The C++ function main contains the code that calls the assembly language function IntegerAddSub_. This function requires four arguments of type int and returns a single int value. Like many programming languages, Visual C++ uses a combination of processor registers and the stack to pass argument values to a function. In the current example, the C++ compiler generates code that loads the values of a, b, c, and d into registers ECX, EDX, R8D, and R9D, respectively, prior to calling function IntegerAddSub_.

In Listing 2-1, the x86-64 assembly language code for example Ch02_01 is shown immediately after the C++ function main. The first thing to notice is the lines that begin with a semicolon. These are comment lines. MASM treats any text that follows a semicolon as comment text. The .code statement is a MASM directive that defines the start of an assembly language code section. A MASM directive is a statement that instructs the assembler how to perform certain actions. You’ll learn how to use additional directives throughout this book.

The IntegerAddSub_ proc statement defines the start of the assembly language function. Toward the end of Listing 2-1, the IntegerAddSub_ endp statement marks the end of the function. Like the .code line, the proc and endp statements are not executable instructions but assembler directives that signify the start and end of an assembly language function. The final end statement is a required assembler directive that indicates the completion of statements for the assembly language file. The assembler ignores any text that appears after the end directive.

The assembly language function IntegerAddSub_ calculates a + b + c - d and returns this value to the calling C++ function. It begins with a mov eax,ecx (Move) instruction that copies the value a from ECX into EAX. Note that the contents of ECX are not altered by the mov instruction. Following execution of this mov instruction, registers EAX and ECX both contain the value a. The add eax,edx instruction adds the values in registers EAX and EDX. It then saves the sum (or a + b) in register EAX. As with the source operand of the previous mov instruction, the contents of register EDX are not modified by the add instruction. The next instruction, add eax,r8d, computes a + b + c. This is followed by a sub eax,r9d instruction that calculates the final value a + b + c - d.

An x86-64 assembly language function must use register EAX to return a single 32-bit integer (or C++ int) value to its calling function. In the current example, no additional instructions are necessary to achieve this requirement since EAX already contains the correct return value. The final ret (Return from Procedure) instruction transfers control back to the calling function main, which displays the result. Here’s the output for example Ch02_01.
Test 1
a = 10
b = 20
c = 30
d = 18
result = 42
Test 2
a = 101
b = 34
c = -190
d = 25
result = -80
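
The register-by-register sequence described above can be mirrored in C++ as a rough sketch. The function name IntegerAddSubCpp is hypothetical (it is not part of the book's sample code), and the local variable eax merely stands in for the register of the same name.

```cpp
// Hypothetical C++ mirror of IntegerAddSub_; not part of the original sample code.
// Each statement corresponds to one instruction of the assembly language function.
int IntegerAddSubCpp(int a, int b, int c, int d)
{
    int eax = a;    // mov eax,ecx   (eax = a)
    eax += b;       // add eax,edx   (eax = a + b)
    eax += c;       // add eax,r8d   (eax = a + b + c)
    eax -= d;       // sub eax,r9d   (eax = a + b + c - d)
    return eax;     // ret           (the caller reads the result from EAX)
}
```

Calling IntegerAddSubCpp with the argument values of Test 1 and Test 2 should yield the same results that are shown in the output above.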

Logical Operations

The next source code example is called Ch02_02. This example illustrates use of the x86-64 instructions and (Logical AND), or (Logical Inclusive OR), and xor (Logical Exclusive OR). It also shows how to access a C++ global variable from an assembly language function. Listing 2-2 shows the source code for Example Ch02_02.
//------------------------------------------------
//        Ch02_02.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
#include <iomanip>
using namespace std;
extern "C" unsigned int IntegerLogical_(unsigned int a, unsigned int b, unsigned int c, unsigned int d);
extern "C" unsigned int g_Val1 = 0;
unsigned int IntegerLogicalCpp(unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
  // Calculate (((a & b) | c ) ^ d) + g_Val1
  unsigned int t1 = a & b;
  unsigned int t2 = t1 | c;
  unsigned int t3 = t2 ^ d;
  unsigned int result = t3 + g_Val1;
  return result;
}
void PrintResult(const char* s, unsigned int a, unsigned int b, unsigned int c, unsigned int d, unsigned val1, unsigned int r1, unsigned int r2)
{
  const int w = 8;
  const char nl = '\n';
  cout << s << nl;
  cout << setfill('0');
  cout << "a =  0x" << hex << setw(w) << a << " (" << dec << a << ")" << nl;
  cout << "b =  0x" << hex << setw(w) << b << " (" << dec << b << ")" << nl;
  cout << "c =  0x" << hex << setw(w) << c << " (" << dec << c << ")" << nl;
  cout << "d =  0x" << hex << setw(w) << d << " (" << dec << d << ")" << nl;
  cout << "val1 = 0x" << hex << setw(w) << val1 << " (" << dec << val1<< ")" << nl;
  cout << "r1 =  0x" << hex << setw(w) << r1 << " (" << dec << r1 << ")" << nl;
  cout << "r2 =  0x" << hex << setw(w) << r2 << " (" << dec << r2 << ")" << nl;
  cout << nl;
  if (r1 != r2)
    cout << "Compare failed" << nl;
}
int main()
{
  unsigned int a, b, c, d, r1, r2 = 0;
  a = 0x00223344;
  b = 0x00775544;
  c = 0x00555555;
  d = 0x00998877;
  g_Val1 = 7;
  r1 = IntegerLogicalCpp(a, b, c, d);
  r2 = IntegerLogical_(a, b, c, d);
  PrintResult("Test 1", a, b, c, d, g_Val1, r1, r2);
  a = 0x70987655;
  b = 0x55555555;
  c = 0xAAAAAAAA;
  d = 0x12345678;
  g_Val1 = 23;
  r1 = IntegerLogicalCpp(a, b, c, d);
  r2 = IntegerLogical_(a, b, c, d);
  PrintResult("Test 2", a, b, c, d, g_Val1, r1, r2);
  return 0;
}
;-------------------------------------------------
;        Ch02_02.asm
;-------------------------------------------------
; extern "C" unsigned int IntegerLogical_(unsigned int a, unsigned int b, unsigned int c, unsigned int d);
    extern g_Val1:dword         ;external doubleword (32-bit) value
    .code
IntegerLogical_ proc
; Calculate (((a & b) | c ) ^ d) + g_Val1
    and ecx,edx             ;ecx = a & b
    or ecx,r8d             ;ecx = (a & b) | c
    xor ecx,r9d             ;ecx = ((a & b) | c) ^ d
    add ecx,[g_Val1]          ;ecx = (((a & b) | c) ^ d) + g_Val1
    mov eax,ecx             ;eax = final result
    ret                 ;return to caller
IntegerLogical_ endp
    end
Listing 2-2.

Example Ch02_02

Similar to what you saw in the first example, the declaration of assembly language function IntegerLogical_ uses the "C" modifier to instruct the C++ compiler not to generate a decorated name for this function. Omitting this modifier would result in a link error during program build. (If the "C" modifier is omitted from the current example, Visual C++ 2017 uses the decorated function name ?IntegerLogical_@@YAIIIII@Z instead of IntegerLogical_. Decorated names are derived using the function's argument types, and these names are compiler specific.) Function IntegerLogical_ requires four unsigned int arguments and returns a single unsigned int result. Immediately following the declaration of function IntegerLogical_ is the definition of a global unsigned int variable named g_Val1. This variable is defined to demonstrate how to access a global value from an assembly language function. Like function declarations, use of the "C" modifier for g_Val1 instructs the compiler to use C-style naming instead of a decorated C++ name.

The definition of function IntegerLogicalCpp follows next in the C++ source code. The reason for defining this function is to provide a simple method for determining whether or not the corresponding x86-64 assembly language function IntegerLogical_ calculates the correct result. While overkill for this particular example, coding complex functions using both C++ and assembly language is often helpful for software test and debugging purposes. The function main in Listing 2-2 includes code that calls both IntegerLogicalCpp and IntegerLogical_. It also calls the function PrintResult to display the results.

In Listing 2-2, the x86-64 assembly language code for example Ch02_02 follows the C++ function main. The first assembly language source code statement, extern g_Val1:dword, is the MASM equivalent of the corresponding declaration for g_Val1 that’s used in the C++ code. In this instance, the extern directive notifies the assembler that storage space for the variable g_Val1 is defined in another module, and the dword directive indicates that g_Val1 is a doubleword (or 32-bit) unsigned value.

Similar to the example in the previous section, the arguments a, b, c, and d are passed to function IntegerLogical_ using registers ECX, EDX, R8D, and R9D. The and ecx,edx instruction performs a bitwise AND operation using the values in registers ECX and EDX, and saves the result to register ECX. The or ecx,r8d and xor ecx,r9d instructions carry out bitwise inclusive OR and exclusive OR operations, respectively. The add ecx,[g_Val1] instruction adds the contents of register ECX and the value of global variable g_Val1, and saves the resultant sum to register ECX. A mov eax,ecx copies the final result to register EAX so that it can be passed back to the calling function. Here’s the output for example Ch02_02.
Test 1
a =  0x00223344 (2241348)
b =  0x00775544 (7820612)
c =  0x00555555 (5592405)
d =  0x00998877 (10061943)
val1 = 0x00000007 (7)
r1 =  0x00eedd29 (15654185)
r2 =  0x00eedd29 (15654185)
Test 2
a =  0x70987655 (1889039957)
b =  0x55555555 (1431655765)
c =  0xaaaaaaaa (2863311530)
d =  0x12345678 (305419896)
val1 = 0x00000017 (23)
r1 =  0xe88ea89e (3901663390)
r2 =  0xe88ea89e (3901663390)

Shift Operations

The last source code example of this section, which is similar in form to the previous two examples, demonstrates use of the shl (Shift Logical Left) and shr (Shift Logical Right) instructions. It also illustrates use of a few more frequently used instructions including cmp (Compare), ja (Jump if Above), and xchg (Exchange). Listing 2-3 shows the C++ and assembly language source code for example Ch02_03.
//------------------------------------------------
//        Ch02_03.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <bitset>
using namespace std;
extern "C" int IntegerShift_(unsigned int a, unsigned int count, unsigned int* a_shl, unsigned int* a_shr);
static void PrintResult(const char* s, int rc, unsigned int a, unsigned int count, unsigned int a_shl, unsigned int a_shr)
{
  bitset<32> a_bs(a);
  bitset<32> a_shl_bs(a_shl);
  bitset<32> a_shr_bs(a_shr);
  const int w = 10;
  const char nl = '\n';
  cout << s << '\n';
  cout << "count =" << setw(w) << count << nl;
  cout << "a =  " << setw(w) << a << " (0b" << a_bs << ")" << nl;
  if (rc == 0)
    cout << "Invalid shift count" << nl;
  else
  {
    cout << "shl = " << setw(w) << a_shl << " (0b" << a_shl_bs << ")" << nl;
    cout << "shr = " << setw(w) << a_shr << " (0b" << a_shr_bs << ")" << nl;
  }
  cout << nl;
}
int main()
{
  int rc;
  unsigned int a, count, a_shl, a_shr;
  a = 3119;
  count = 6;
  rc = IntegerShift_(a, count, &a_shl, &a_shr);
  PrintResult("Test 1", rc, a, count, a_shl, a_shr);
  a = 0x00800080;
  count = 4;
  rc = IntegerShift_(a, count, &a_shl, &a_shr);
  PrintResult("Test 2", rc, a, count, a_shl, a_shr);
  a = 0x80000001;
  count = 31;
  rc = IntegerShift_(a, count, &a_shl, &a_shr);
  PrintResult("Test 3", rc, a, count, a_shl, a_shr);
  a = 0x55555555;
  count = 32;
  rc = IntegerShift_(a, count, &a_shl, &a_shr);
  PrintResult("Test 4", rc, a, count, a_shl, a_shr);
  return 0;
}
;-------------------------------------------------
;        Ch02_03.asm
;-------------------------------------------------
;
; extern "C" int IntegerShift_(unsigned int a, unsigned int count, unsigned int* a_shl, unsigned int* a_shr);
;
; Returns:   0 = error (count >= 32), 1 = success
;
    .code
IntegerShift_ proc
    xor eax,eax             ;set return code in case of error
    cmp edx,31             ;compare count against 31
    ja InvalidCount           ;jump if count > 31
    xchg ecx,edx            ;exchange contents of ecx & edx
    mov eax,edx             ;eax = a
    shl eax,cl             ;eax = a << count;
    mov [r8],eax            ;save result
    shr edx,cl             ;edx = a >> count
    mov [r9],edx            ;save result
    mov eax,1              ;set success return code
InvalidCount:
    ret                 ;return to caller
IntegerShift_ endp
    end
Listing 2-3.

Example Ch02_03

Near the top of the C++ code, the declaration of the x86-64 assembly language function IntegerShift_ is somewhat different from the previous examples in that it defines two pointer arguments. Pointers are used by this function since it needs to return more than one result to its calling function. The other minor difference is that the int return value from IntegerShift_ is used to indicate whether or not the value of count is valid. The remaining C++ code in Listing 2-3 exercises the assembly language function IntegerShift_ using a few test cases and displays results.

The assembly language code of function IntegerShift_ starts with an xor eax,eax instruction that sets register EAX to zero. This is done to ensure that register EAX contains the correct return code should an invalid value for argument count be detected. The next instruction, cmp edx,31, compares the contents of register EDX, which contains count, to the constant value 31. When the processor performs a compare operation, it subtracts the second operand from the first operand, sets the status flags based on the results of this operation, and discards the result. If the value of count is above 31, the ja InvalidCount instruction performs a jump to the program location specified by the destination operand. If you look ahead a few lines, you will notice a statement with the text InvalidCount:. This text is called a label. If count > 31 is true, the ja InvalidCount instruction transfers program control to the first assembly language instruction immediately following the label InvalidCount. Note that this instruction can be on the same line as the label or on a different line, as shown in Listing 2-3.
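
The compare-and-jump behavior described above can be modeled in C++ as a minimal sketch. The names Flags, Cmp, and JumpIfAbove are hypothetical helpers introduced only for illustration. The model tracks just the carry flag (CF) and zero flag (ZF), which are the two flags that ja examines; a real cmp instruction updates additional status flags.

```cpp
#include <cstdint>

// Hypothetical model of the two status flags that ja examines.
struct Flags
{
    bool cf;    // carry flag: set when the unsigned subtraction requires a borrow
    bool zf;    // zero flag: set when the two operands are equal
};

// Model of cmp op1,op2: evaluate op1 - op2, keep the flags, discard the difference.
inline Flags Cmp(uint32_t op1, uint32_t op2)
{
    return Flags{ op1 < op2, op1 == op2 };
}

// Model of ja: the jump is taken when CF == 0 and ZF == 0 (unsigned "above").
inline bool JumpIfAbove(Flags f)
{
    return !f.cf && !f.zf;
}
```

In this model, JumpIfAbove(Cmp(count, 31)) is true only when count exceeds 31 as an unsigned value, which matches the argument check in function IntegerShift_.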

The xchg ecx,edx instruction swaps the values in registers ECX and EDX. The reason for doing this is that the shl and shr instructions must use register CL for the shift count. The mov eax,edx instruction copies the value a into register EAX, and the subsequent shl eax,cl instruction shifts this value left by the number of bits that’s specified in register CL. The 64-bit pointer value a_shl is passed to function IntegerShift_ using register R8 (in 64-bit programming, all pointers are 64 bits). The mov [r8],eax instruction saves the result of the shift operation to the memory location that’s specified by the contents of register R8.

The subsequent shr edx,cl instruction shifts the value in register EDX (which contains argument value a) right by the number of bits specified in register CL. This result is then saved to the memory location pointed to by register R9, which contains the pointer a_shr. The shr instruction is used in function IntegerShift_ since argument a is declared as an unsigned int. If a were declared as an int, the sar (Shift Arithmetic Right) instruction could be used to preserve the sign bit of the source operand. The mov eax,1 instruction loads EAX with the constant one to indicate that the value of the count argument was valid. It should be noted that the testing of count for a value above 31 was implemented to illustrate argument checking in an assembly language function. For shift instructions that use an immediate or variable bit shift count, the processor performs a masking operation that limits the shift count to a value between 0 and 31 when the target operand is 32 bits (the limits 0 and 63 are used for 64-bit operands). Here’s the output for source code example Ch02_03.
Test 1
count =     6
a =     3119 (0b00000000000000000000110000101111)
shl =   199616 (0b00000000000000110000101111000000)
shr =     48 (0b00000000000000000000000000110000)
Test 2
count =     4
a =    8388736 (0b00000000100000000000000010000000)
shl =  134219776 (0b00001000000000000000100000000000)
shr =   524296 (0b00000000000010000000000000001000)
Test 3
count =    31
a =  2147483649 (0b10000000000000000000000000000001)
shl = 2147483648 (0b10000000000000000000000000000000)
shr =      1 (0b00000000000000000000000000000001)
Test 4
count =    32
a =  1431655765 (0b01010101010101010101010101010101)
Invalid shift count
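
The shift count masking and the difference between shr and sar that were described above can be approximated in C++. The following is a sketch under stated assumptions: the function names ShlMasked, ShrMasked, and SarMasked are hypothetical, and the explicit & 31 stands in for the masking that the processor performs on 32-bit operands (in C++, shifting a 32-bit value by 32 or more bits is undefined behavior, so the mask must be written out).

```cpp
#include <cstdint>

// Emulates shl on a 32-bit operand: only the low-order 5 bits of count are used.
inline uint32_t ShlMasked(uint32_t a, uint32_t count)
{
    return a << (count & 31);
}

// Emulates shr (logical right shift): vacated high-order bits are filled with zeros.
inline uint32_t ShrMasked(uint32_t a, uint32_t count)
{
    return a >> (count & 31);
}

// Emulates sar (arithmetic right shift): vacated high-order bits copy the sign bit.
// Right-shifting a negative signed value is an arithmetic shift in C++20; earlier
// standards leave it implementation-defined (mainstream compilers shift arithmetically).
inline int32_t SarMasked(int32_t a, uint32_t count)
{
    return a >> (count & 31);
}
```

Note that ShlMasked(0x55555555, 32) returns its input unchanged because the masked count is zero. This is the result a bare shl instruction would produce for the argument values of Test 4 if function IntegerShift_ did not reject the count.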

Advanced Integer Arithmetic

In this section, you’ll learn how to perform integer multiplication and division. You’ll also learn how to use the x86-64 assembly language instruction set to carry out integer arithmetic using operands of different sizes. In addition to these topics, this section introduces important programming concepts and a few particulars related to the Visual C++ calling convention.

Note

The Visual C++ calling convention requirements that are described in this section and in subsequent chapters may be different for other high-level programming languages and operating systems. If you're reading this book to learn x86-64 assembly language and plan on using it with a different high-level programming language or operating system, you should consult the appropriate documentation for more information regarding the target platform's calling convention requirements.

Multiplication and Division

Listing 2-4 contains the source code for example Ch02_04. In this example, the function IntegerMulDiv_ computes the product, quotient, and remainder of two integers using the imul (Integer Multiplication) and idiv (Integer Division) instructions. Note that the C++ declaration of function IntegerMulDiv_ includes five parameters. Up to this point you’ve only seen function declarations with a maximum of four parameters, and the argument values for these parameters were passed using registers RCX, RDX, R8, and R9 or the low-order portion of these registers. The reason for using these registers is that they are required by the Visual C++ calling convention.
//------------------------------------------------
//        Ch02_04.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
using namespace std;
extern "C" int IntegerMulDiv_(int a, int b, int* prod, int* quo, int* rem);
void PrintResult(const char* s, int rc, int a, int b, int p, int q, int r)
{
  const char nl = '\n';
  cout << s << nl;
  cout << "a = " << a << ", b = " << b << ", rc = " << rc << nl;
  if (rc != 0)
    cout << "prod = " << p << ", quo = " << q << ", rem = " << r << nl;
  else
    cout << "prod = " << p << ", quo = undefined" << ", rem = undefined" << nl;
  cout << nl;
}
int main()
{
  int rc;
  int a, b;
  int prod, quo, rem;
  a = 47;
  b = 13;
  prod = quo = rem = 0;
  rc = IntegerMulDiv_(a, b, &prod, &quo, &rem);
  PrintResult("Test 1", rc, a, b, prod, quo, rem);
  a = -291;
  b = 7;
  prod = quo = rem = 0;
  rc = IntegerMulDiv_(a, b, &prod, &quo, &rem);
  PrintResult("Test 2", rc, a, b, prod, quo, rem);
  a = 19;
  b = 0;
  prod = quo = rem = 0;
  rc = IntegerMulDiv_(a, b, &prod, &quo, &rem);
  PrintResult("Test 3", rc, a, b, prod, quo, rem);
  a = 247;
  b = 85;
  prod = quo = rem = 0;
  rc = IntegerMulDiv_(a, b, &prod, &quo, &rem);
  PrintResult("Test 4", rc, a, b, prod, quo, rem);
  return 0;
}
;-------------------------------------------------
;        Ch02_04.asm
;-------------------------------------------------
;
; extern "C" int IntegerMulDiv_(int a, int b, int* prod, int* quo, int* rem);
;
; Returns:   0 = error (divisor equals zero), 1 = success
;
    .code
IntegerMulDiv_ proc
; Make sure the divisor is not zero
    mov eax,edx             ;eax = b
    or eax,eax             ;logical OR sets status flags
    jz InvalidDivisor          ;jump if b is zero
; Calculate product and save result
    imul eax,ecx            ;eax = a * b
    mov [r8],eax            ;save product
; Calculate quotient and remainder, save results
    mov r10d,edx            ;r10d = b
    mov eax,ecx             ;eax = a
    cdq                 ;edx:eax contains 64-bit dividend
    idiv r10d              ;eax = quotient, edx = remainder
    mov [r9],eax            ;save quotient
    mov rax,[rsp+40]          ;rax = 'rem'
    mov [rax],edx            ;save remainder
    mov eax,1              ;set success return code
InvalidDivisor:
    ret                 ;return to caller
IntegerMulDiv_ endp
    end
Listing 2-4.

Example Ch02_04

A calling convention is a binary protocol that describes how arguments and return values are exchanged between two functions. As you have already seen, the Visual C++ calling convention for x86-64 programs on Windows requires a calling function to pass the first four integer (or pointer) arguments using registers RCX, RDX, R8, and R9. The low-order portions of these registers are used for argument values smaller than 64 bits (e.g., ECX, CX, or CL for a 32-, 16-, or 8-bit integer). Any additional arguments are passed using the stack. The calling convention also defines additional requirements including rules for passing floating-point values, general-purpose and XMM register use, and stack frames. You’ll learn about these additional requirements in Chapter 5.

The C++ code in Listing 2-4 is similar to the other examples that you’ve already seen. It simply exercises some test cases and displays results. Upon entry to function IntegerMulDiv_, registers ECX, EDX, R8, and R9 contain the argument values a, b, prod, and quo, respectively. The fifth argument rem is passed on the stack, as shown in Figure 2-1. Note that since prod, quo, and rem are pointers, they are passed to IntegerMulDiv_ as 64-bit values.
Figure 2-1.

Argument registers and stack at entry to function IntegerMulDiv_

Figure 2-1 illustrates the state of the stack and the argument registers upon entry to IntegerMulDiv_ but prior to the execution of its first instruction. Note that the location of the fifth argument value rem is at memory address RSP + 40. A simple mov instruction can be used to load rem, which is a pointer, into a general-purpose register when it’s needed. Also note in Figure 2-1 that register RSP points to the caller’s return address on the stack. During execution of a ret instruction, the processor copies this value from the stack and ultimately stores it in register RIP. The ret instruction also removes the caller’s return address from the stack by adding 8 to the value in RSP. The stack locations labeled RCX Home, RDX Home, R8 Home, and R9 Home are storage areas that can be used to temporarily save the corresponding argument registers. These areas can also be used to store other transient data. You’ll learn more about the home area in Chapter 5.
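
The stack layout described above can be summarized with a small C++ model. This is only a sketch: the constant names, StackArgOffset, and LoadQword are hypothetical, and the array passed to LoadQword merely simulates the 8-byte stack slots that begin at the address in RSP upon function entry under the Visual C++ x86-64 calling convention.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offsets from RSP at function entry (hypothetical constant names).
constexpr std::size_t kReturnAddress = 0;   // [rsp]     caller's return address
constexpr std::size_t kRcxHome = 8;         // [rsp+8]   RCX home
constexpr std::size_t kRdxHome = 16;        // [rsp+16]  RDX home
constexpr std::size_t kR8Home = 24;         // [rsp+24]  R8 home
constexpr std::size_t kR9Home = 32;         // [rsp+32]  R9 home
constexpr std::size_t kFifthArg = 40;       // [rsp+40]  fifth argument (rem)

// Byte offset from RSP of the n-th integer or pointer argument (1-based, n >= 5).
// Each stack argument occupies one 8-byte slot.
constexpr std::size_t StackArgOffset(std::size_t n)
{
    return kFifthArg + (n - 5) * 8;
}

// Model of mov rax,[rsp+offset], where rsp points to an array of 8-byte stack slots.
inline uint64_t LoadQword(const uint64_t* rsp, std::size_t offset)
{
    return rsp[offset / 8];
}
```

With this model, LoadQword(stack, kFifthArg) corresponds to the mov rax,[rsp+40] instruction that function IntegerMulDiv_ uses to obtain the pointer rem.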

The function IntegerMulDiv_ computes and saves the product a * b. It also calculates and saves the quotient and remainder of a / b. Since IntegerMulDiv_ performs division using b, it makes sense to test the value of b to confirm that it’s not equal to zero. In Listing 2-4, the mov eax,edx instruction copies b into register EAX. The next instruction, or eax,eax, performs a bitwise OR operation to set the status flags. If b is zero, the jz InvalidDivisor (Jump if Zero) instruction skips over the code that performs the division. Like the previous example, the function IntegerMulDiv_ uses a return value of zero to indicate an error condition. Since EAX already contains zero, no additional instructions are necessary.

The next instruction, imul eax,ecx, computes a * b and leaves the product in register EAX. The subsequent mov [r8],eax instruction saves the product to the memory location specified by R8, which contains the pointer prod. The x86-64 instruction set supports several different forms of the imul instruction. The two-operand form that’s used here saves only the low-order 32 bits of the product in the destination operand (recall that the full product of two 32-bit integers can require up to 64 bits). The single-operand form of imul can be used when a non-truncated result is required.
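
The difference between the truncated and non-truncated products can be illustrated in C++. In this sketch, the function names FullProduct and TruncatedProduct are hypothetical; the first models the 64-bit result that the single-operand form of imul leaves in register pair EDX:EAX, and the second models the low-order 32 bits that the two-operand form stores in its destination operand.

```cpp
#include <cstdint>

// Full 64-bit product of two 32-bit signed integers.
inline int64_t FullProduct(int32_t a, int32_t b)
{
    return static_cast<int64_t>(a) * b;
}

// Low-order 32 bits of the product.
// The multiplication is carried out using unsigned values to avoid signed-overflow
// undefined behavior in C++; the low-order 32 bits are the same in both cases.
inline int32_t TruncatedProduct(int32_t a, int32_t b)
{
    uint32_t low = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
    return static_cast<int32_t>(low);
}
```

For the argument values used in Listing 2-4, both functions return the same value since each product fits in 32 bits. The results differ for larger operands; for example, 100000 * 100000 requires more than 32 bits.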

Integer division occurs next. The mov r10d,edx and mov eax,ecx instructions load registers R10D and EAX with argument values b and a, respectively. Before performing the division operation, the 32-bit dividend in EAX must be sign-extended to 64 bits and this is carried out by the cdq (Convert Doubleword to Quadword) instruction. Following execution of cdq, register pair EDX:EAX contains the 64-bit dividend and register R10D contains the 32-bit divisor. The idiv r10d instruction divides the contents of register pair EDX:EAX by the value in R10D. After execution of the idiv instruction, the 32-bit quotient and 32-bit remainder reside in registers EAX and EDX, respectively. The subsequent mov [r9],eax instruction saves the quotient to the memory location specified by quo. In order to save the remainder, the pointer rem must be obtained from the stack and this is achieved using a mov rax,[rsp+40] instruction. The mov [rax],edx instruction saves the remainder to the memory location specified by rem. The output for example Ch02_04 is the following:
Test 1
a = 47, b = 13, rc = 1
prod = 611, quo = 3, rem = 8
Test 2
a = -291, b = 7, rc = 1
prod = -2037, quo = -41, rem = -4
Test 3
a = 19, b = 0, rc = 0
prod = 0, quo = undefined, rem = undefined
Test 4
a = 247, b = 85, rc = 1
prod = 20995, quo = 2, rem = 77
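
The cdq and idiv sequence described above behaves like the following C++ sketch. The names DivResult and CdqIdiv are hypothetical. The sketch assumes a nonzero divisor and a quotient that fits in 32 bits, so the combination of the most negative int32_t value and a divisor of -1 is excluded (the idiv instruction raises a divide error in that case).

```cpp
#include <cstdint>

// Hypothetical container for the two results of idiv.
struct DivResult
{
    int32_t quo;    // value that idiv leaves in EAX
    int32_t rem;    // value that idiv leaves in EDX
};

// Model of: mov eax,a / cdq / idiv b
// Precondition: b != 0 and the quotient fits in 32 bits.
inline DivResult CdqIdiv(int32_t a, int32_t b)
{
    int64_t dividend = a;                           // cdq: sign-extend EAX into EDX:EAX
    DivResult r;
    r.quo = static_cast<int32_t>(dividend / b);     // quotient truncates toward zero
    r.rem = static_cast<int32_t>(dividend % b);     // remainder has the sign of the dividend
    return r;
}
```

For a = -291 and b = 7, this model produces a quotient of -41 and a remainder of -4, which agrees with the output of Test 2.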

Calculations Using Mixed Types

In many programs, it is often necessary to perform arithmetic calculations using multiple integer types. Consider the C++ expression a = b * c * d * e, where a, b, c, d, and e are declared as long long, long long, int, short, and char. Calculating the correct result requires proper promotion of the smaller-sized integers into larger ones. In the next example, you’ll learn a few techniques that can be used to carry out integer promotions in an assembly language function. You’ll also learn how to access integer argument values of various sizes that are stored on the stack. Listing 2-5 contains the source code for example Ch02_05.
//------------------------------------------------
//        Ch02_05.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
#include <cstdint>
using namespace std;
extern "C" int64_t IntegerMul_(int8_t a, int16_t b, int32_t c, int64_t d, int8_t e, int16_t f, int32_t g, int64_t h);
extern "C" int UnsignedIntegerDiv_(uint8_t a, uint16_t b, uint32_t c, uint64_t d, uint8_t e, uint16_t f, uint32_t g, uint64_t h, uint64_t* quo, uint64_t* rem);
void IntegerMul(void)
{
  int8_t a = 2;
  int16_t b = -3;
  int32_t c = 8;
  int64_t d = 4;
  int8_t e = 3;
  int16_t f = -7;
  int32_t g = -5;
  int64_t h = 10;
  // Calculate a * b * c * d * e * f * g * h
  int64_t prod1 = a * b * c * d * e * f * g * h;
  int64_t prod2 = IntegerMul_(a, b, c, d, e, f, g, h);
  cout << "\nResults for IntegerMul\n";
  cout << "a = " << (int)a << ", b = " << b << ", c = " << c << ' ';
  cout << "d = " << d << ", e = " << (int)e << ", f = " << f << ' ';
  cout << "g = " << g << ", h = " << h << '\n';
  cout << "prod1 = " << prod1 << '\n';
  cout << "prod2 = " << prod2 << '\n';
}
void UnsignedIntegerDiv(void)
{
  uint8_t a = 12;
  uint16_t b = 17;
  uint32_t c = 71000000;
  uint64_t d = 90000000000;
  uint8_t e = 101;
  uint16_t f = 37;
  uint32_t g = 25;
  uint64_t h = 5;
  uint64_t quo1, rem1;
  uint64_t quo2, rem2;
  quo1 = (a + b + c + d) / (e + f + g + h);
  rem1 = (a + b + c + d) % (e + f + g + h);
  UnsignedIntegerDiv_(a, b, c, d, e, f, g, h, &quo2, &rem2);
  cout << "\nResults for UnsignedIntegerDiv\n";
  cout << "a = " << (unsigned)a << ", b = " << b << ", c = " << c << ' ';
  cout << "d = " << d << ", e = " << (unsigned)e << ", f = " << f << ' ';
  cout << "g = " << g << ", h = " << h << '\n';
  cout << "quo1 = " << quo1 << ", rem1 = " << rem1 << '\n';
  cout << "quo2 = " << quo2 << ", rem2 = " << rem2 << '\n';
}
int main()
{
  IntegerMul();
  UnsignedIntegerDiv();
  return 0;
}
;-------------------------------------------------
;        Ch02_05.asm
;-------------------------------------------------
; extern "C" int64_t IntegerMul_(int8_t a, int16_t b, int32_t c, int64_t d, int8_t e,
;  int16_t f, int32_t g, int64_t h);
    .code
IntegerMul_ proc
; Calculate a * b * c * d
    movsx rax,cl            ;rax = sign_extend(a)
    movsx rdx,dx            ;rdx = sign_extend(b)
    imul rax,rdx            ;rax = a * b
    movsxd rcx,r8d           ;rcx = sign_extend(c)
    imul rcx,r9             ;rcx = c * d
    imul rax,rcx            ;rax = a * b * c * d
; Calculate e * f * g * h
    movsx rcx,byte ptr [rsp+40]     ;rcx = sign_extend(e)
    movsx rdx,word ptr [rsp+48]     ;rdx = sign_extend(f)
    imul rcx,rdx            ;rcx = e * f
    movsxd rdx,dword ptr [rsp+56]    ;rdx = sign_extend(g)
    imul rdx,qword ptr [rsp+64]     ;rdx = g * h
    imul rcx,rdx            ;rcx = e * f * g * h
; Compute the final product
    imul rax,rcx            ;rax = final product
    ret
IntegerMul_ endp
; extern "C" int UnsignedIntegerDiv_(uint8_t a, uint16_t b, uint32_t c, uint64_t d, uint8_t e, uint16_t f, uint32_t g, uint64_t h, uint64_t* quo, uint64_t* rem);
UnsignedIntegerDiv_ proc
; Calculate a + b + c + d
    movzx rax,cl            ;rax = zero_extend(a)
    movzx rdx,dx            ;rdx = zero_extend(b)
    add rax,rdx             ;rax = a + b
    mov r8d,r8d             ;r8 = zero_extend(c)
    add r8,r9              ;r8 = c + d
    add rax,r8             ;rax = a + b + c + d
    xor rdx,rdx             ;rdx:rax = a + b + c + d
; Calculate e + f + g + h
    movzx r8,byte ptr [rsp+40]     ;r8 = zero_extend(e)
    movzx r9,word ptr [rsp+48]     ;r9 = zero_extend(f)
    add r8,r9              ;r8 = e + f
    mov r10d,[rsp+56]          ;r10 = zero_extend(g)
    add r10,[rsp+64]          ;r10 = g + h
    add r8,r10             ;r8 = e + f + g + h
    jnz DivOK              ;jump if divisor is not zero
    xor eax,eax             ;set error return code
    jmp Done
; Calculate (a + b + c + d) / (e + f + g + h)
DivOK: div r8               ;unsigned divide rdx:rax / r8
    mov rcx,[rsp+72]
    mov [rcx],rax            ;save quotient
    mov rcx,[rsp+80]
    mov [rcx],rdx            ;save remainder
    mov eax,1              ;set success return code
Done:  ret
UnsignedIntegerDiv_ endp
    end
Listing 2-5.

Example Ch02_05

The assembly language function IntegerMul_ calculates the product of eight signed integers ranging in size from 8 bits to 64 bits. The C++ declaration for this function uses the fixed-sized integer types that are declared in the header file <cstdint> instead of the normal long long, int, short, and char. Some assembly language programmers (including me) prefer to use fixed-sized integer types for assembly language function declarations since it emphasizes the exact size of the argument. The declaration of function UnsignedIntegerDiv_, which demonstrates how to perform unsigned integer division, also uses fixed-size integer types. Figure 2-2 illustrates the contents of the stack at entry to IntegerMul_.
Figure 2-2.

Argument registers and stack at entry to function IntegerMul_

The first instruction of IntegerMul_, movsx rax,cl (Move with Sign Extension), sign-extends a copy of the 8-bit integer value a that’s in register CL to 64 bits and saves this value in register RAX. Note that the original value in register CL is unaltered by this operation. Another movsx instruction follows that saves a 64-bit sign-extended copy of the 16-bit value b to RDX. Like the previous movsx instruction, the source operand is not modified by this operation. An imul rax,rdx instruction computes the product of a and b. The two-operand form of the imul instruction that’s used here saves only the lower 64 bits of the 128-bit product in the destination operand RAX. The next instruction movsxd rcx,r8d sign-extends the 32-bit operand c to 64 bits. Note that a different instruction mnemonic is required when sign-extending a 32-bit integer to 64 bits. The next two imul instructions compute the intermediate product a * b * c * d.
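
In C++, a widening cast from a signed type performs the same sign extension as movsx and movsxd. The following minimal sketch mirrors the first half of IntegerMul_ step by step; the function names SignExtendedBits and MulFirstFour are hypothetical, and the local variable names merely echo the registers used in Listing 2-5.

```cpp
#include <cstdint>

// Returns the 64-bit pattern produced by sign-extending an 8-bit value,
// which is the result that movsx rax,cl leaves in RAX.
uint64_t SignExtendedBits(int8_t v)
{
    return static_cast<uint64_t>(static_cast<int64_t>(v));
}

// Mirrors the instruction sequence that calculates a * b * c * d
int64_t MulFirstFour(int8_t a, int16_t b, int32_t c, int64_t d)
{
    int64_t rax = static_cast<int64_t>(a);   // movsx rax,cl
    int64_t rdx = static_cast<int64_t>(b);   // movsx rdx,dx
    rax *= rdx;                              // imul rax,rdx
    int64_t rcx = static_cast<int64_t>(c);   // movsxd rcx,r8d
    rcx *= d;                                // imul rcx,r9
    rax *= rcx;                              // imul rax,rcx
    return rax;
}
```

For the 8-bit value -2 (bit pattern 0xFE), SignExtendedBits returns 0xFFFFFFFFFFFFFFFE: the sign bit is replicated through bits 63 – 8. With the argument values used in Ch02_05, MulFirstFour(2, -3, 8, 4) returns -192.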

Calculation of the second intermediate product e * f * g * h is carried out next. All of these argument values were passed using the stack as shown in Figure 2-2. The movsx rcx,byte ptr [rsp+40] instruction sign-extends a copy of the 8-bit argument value e that’s located on the stack and saves the result to register RCX. The text byte ptr is a MASM directive that acts like a C++ cast operator and conveys to the assembler the size of the source operand. Without the byte ptr directive, the movsx instruction is ambiguous since several different sizes are possible for the source operand. The argument value f is loaded next using a movsx rdx,word ptr [rsp+48] instruction. Following calculation of the intermediate product e * f using an imul instruction, a movsxd rdx,dword ptr [rsp+56] instruction loads a sign-extended copy of g into RDX. This is followed by an imul rdx,qword ptr [rsp+64] instruction that calculates the intermediate product g * h. Use of the qword ptr directive is optional here; size directives are often used in this manner to improve program readability. The final two imul instructions calculate the final product.
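
The stack offsets used in IntegerMul_ follow a regular pattern under the Visual C++ calling convention. At function entry, [rsp] holds the return address, the next four 8-byte slots form the home area for the register arguments, and each stack argument occupies its own 8-byte slot regardless of its declared size. The following minimal sketch captures this arithmetic; the function name EntryStackOffset is hypothetical, and the formula applies only at function entry, before the function itself modifies RSP.

```cpp
#include <cstddef>

// Offset from RSP, at function entry, of the slot that belongs to the
// n-th integer or pointer argument (n is counted from 1).
//   [rsp]            return address (8 bytes)
//   [rsp+8..rsp+32]  home area for the arguments passed in RCX, RDX, R8, R9
//   [rsp+40] onward  arguments 5, 6, 7, ... (one 8-byte slot each)
constexpr std::size_t EntryStackOffset(unsigned argNumber)
{
    return 8 + 8 * static_cast<std::size_t>(argNumber - 1);
}

// Arguments e, f, g, and h of IntegerMul_ are arguments 5 through 8
static_assert(EntryStackOffset(5) == 40, "e is at [rsp+40]");
static_assert(EntryStackOffset(6) == 48, "f is at [rsp+48]");
static_assert(EntryStackOffset(7) == 56, "g is at [rsp+56]");
static_assert(EntryStackOffset(8) == 64, "h is at [rsp+64]");
```

The same formula gives offsets 72 and 80 for the ninth and tenth arguments, which is where UnsignedIntegerDiv_ finds the pointers quo and rem.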

Figure 2-3 illustrates the contents of the stack at entry to the function UnsignedIntegerDiv_. This function calculates the quotient and remainder of the expression (a + b + c + d) / (e + f + g + h). As implied by its name, UnsignedIntegerDiv_ uses unsigned integer arguments of different sizes and performs unsigned integer division. In order to calculate the correct results, the smaller-sized arguments must be zero-extended prior to any arithmetic operations. The movzx rax,cl and movzx rdx,dx instructions load zero-extended copies of argument values a and b into their respective destination registers. The add rax,rdx instruction that follows calculates the intermediate sum a + b. At first glance, the mov r8d,r8d instruction that follows seems superfluous, but it’s actually performing a necessary operation. When an x86 processor is running in 64-bit mode, instructions that employ 32-bit operands produce 32-bit results. If the destination operand is a 32-bit register, the high-order 32 bits (i.e., bits 63 – 32) of the corresponding 64-bit register are set to zero. The mov r8d,r8d instruction is used here to zero-extend the 32-bit value c that’s already loaded in register R8D to a 64-bit value in R8. The next two add instructions calculate the intermediate sum a + b + c + d and save the result to RAX. The ensuing xor rdx,rdx instruction yields a 128-bit zero-extended dividend value that’s stored in register pair RDX:RAX.
Figure 2-3.

Argument registers and stack at entry to function UnsignedIntegerDiv_
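
The effect of mov r8d,r8d can be modeled in C++ by treating a 64-bit register as a uint64_t whose upper half may hold leftover bits. The following is a minimal sketch; the function names ZeroExtend32 and SignExtend32 are hypothetical, and the assumption that bits 63 – 32 of R8 may be nonzero at function entry is made here for illustration.

```cpp
#include <cstdint>

// Models mov r8d,r8d: keep the low-order 32 bits and clear bits 63-32
uint64_t ZeroExtend32(uint64_t reg)
{
    return static_cast<uint32_t>(reg);
}

// For contrast, models movsxd applied to the same low-order 32 bits:
// bit 31 is replicated through bits 63-32
int64_t SignExtend32(uint64_t reg)
{
    return static_cast<int32_t>(static_cast<uint32_t>(reg));
}
```

Given the register value 0xDEADBEEF00000005, ZeroExtend32 returns 5. Given 0x00000000FFFFFFFF, ZeroExtend32 returns 4294967295 while SignExtend32 returns -1, which is why unsigned arguments must be zero-extended rather than sign-extended.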

A similar sequence of instructions is used to calculate the intermediate sum e + f + g + h, with the main difference being that these arguments are loaded from the stack. This value is then tested to see if it’s equal to zero since it will be used as the divisor. No explicit compare is needed for this test: the final add r8,r10 instruction sets the zero flag if the sum is zero, and the jnz DivOK instruction examines that flag. If the divisor is not zero, a div r8 instruction performs unsigned integer division using register pair RDX:RAX as the dividend and register R8 as the divisor. The resulting quotient (RAX) and remainder (RDX) are then saved to the memory locations specified by the pointers quo and rem, which were passed on the stack. Here’s the output for example Ch02_05.
Results for IntegerMul
a = 2, b = -3, c = 8 d = 4, e = 3, f = -7 g = -5, h = 10
prod1 = -201600
prod2 = -201600
Results for UnsignedIntegerDiv
a = 12, b = 17, c = 71000000 d = 90000000000, e = 101, f = 37 g = 25, h = 5
quo1 = 536136904, rem1 = 157
quo2 = 536136904, rem2 = 157
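
The C++ and assembly language results agree for these argument values, but the two computations do not widen their operands at the same point. The following minimal sketch contrasts them, assuming a platform where int is 32 bits wide (as it is with Visual C++); the function names SumAsWritten and SumWidened are hypothetical.

```cpp
#include <cstdint>

// Evaluates the sum the way the C++ expression a + b + c + d is written:
// a + b is computed as int, adding c converts the running sum to a 32-bit
// unsigned value (which can wrap), and only the final addition of d is
// carried out using 64 bits.
uint64_t SumAsWritten(uint8_t a, uint16_t b, uint32_t c, uint64_t d)
{
    return a + b + c + d;
}

// Evaluates the sum the way UnsignedIntegerDiv_ does: every operand is
// zero-extended to 64 bits before any addition takes place.
uint64_t SumWidened(uint8_t a, uint16_t b, uint32_t c, uint64_t d)
{
    return uint64_t{a} + uint64_t{b} + uint64_t{c} + d;
}
```

With the values from Ch02_05, both functions return 90071000029 because a + b + c fits in 32 bits. With a = 1, b = 0, c = 0xFFFFFFFF, and d = 0, SumAsWritten wraps to 0 while SumWidened returns 0x100000000.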

Memory Addressing and Condition Codes

Thus far the source code examples of this chapter have primarily illustrated how to use basic arithmetic and logical instructions. In this section, you’ll learn more about the x86’s memory addressing modes. You’ll also examine sample code that demonstrates how to exploit some of the x86’s condition-code based instructions.

Memory Addressing Modes

You learned in Chapter 1 that the x86-64 instruction set supports a variety of addressing modes that can be used to reference an operand in memory. In this section, you’ll examine an assembly language function that illustrates how to use some of these modes. You’ll also learn how to initialize an assembly language lookup table and use assembly language global variables in a C++ function. Listing 2-6 shows the source code for example Ch02_06.
//------------------------------------------------
//        Ch02_06.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
#include <iomanip>
using namespace std;
extern "C" int NumFibVals_, FibValsSum_;
extern "C" int MemoryAddressing_(int i, int* v1, int* v2, int* v3, int* v4);
int main()
{
  const int w = 5;
  const char nl = '\n';
  const char* delim = ", ";
  FibValsSum_ = 0;
  for (int i = -1; i < NumFibVals_ + 1; i++)
  {
    int v1 = -1, v2 = -1, v3 = -1, v4 = -1;
    int rc = MemoryAddressing_(i, &v1, &v2, &v3, &v4);
    cout << "i = " << setw(w - 1) << i << delim;
    cout << "rc = " << setw(w - 1) << rc << delim;
    cout << "v1 = " << setw(w) << v1 << delim;
    cout << "v2 = " << setw(w) << v2 << delim;
    cout << "v3 = " << setw(w) << v3 << delim;
    cout << "v4 = " << setw(w) << v4 << delim;
    cout << nl;
  }
  cout << "FibValsSum_ = " << FibValsSum_ << nl;
  return 0;
}
;-------------------------------------------------
;        Ch02_06.asm
;-------------------------------------------------
; Simple lookup table (.const section data is read only)
      .const
FibVals   dword 0, 1, 1, 2, 3, 5, 8, 13
      dword 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597
NumFibVals_ dword ($ - FibVals) / sizeof dword
      public NumFibVals_
; Data section (data is read/write)
      .data
FibValsSum_ dword ?       ;value to demo RIP-relative addressing
      public FibValsSum_
;
; extern "C" int MemoryAddressing_(int i, int* v1, int* v2, int* v3, int* v4);
;
; Returns:   0 = error (invalid table index), 1 = success
;
      .code
MemoryAddressing_ proc
; Make sure 'i' is valid
    cmp ecx,0
    jl InvalidIndex           ;jump if i < 0
    cmp ecx,[NumFibVals_]
    jge InvalidIndex          ;jump if i >= NumFibVals_
; Sign extend i for use in address calculations
    movsxd rcx,ecx           ;sign extend i
    mov [rsp+8],rcx           ;save copy of i (in rcx home area)
; Example #1 - base register
    mov r11,offset FibVals       ;r11 = FibVals
    shl rcx,2              ;rcx = i * 4
    add r11,rcx             ;r11 = FibVals + i * 4
    mov eax,[r11]            ;eax = FibVals[i]
    mov [rdx],eax            ;save to v1
; Example #2 - base register + index register
    mov r11,offset FibVals       ;r11 = FibVals
    mov rcx,[rsp+8]           ;rcx = i
    shl rcx,2              ;rcx = i * 4
    mov eax,[r11+rcx]          ;eax = FibVals[i]
    mov [r8],eax            ;save to v2
; Example #3 - base register + index register * scale factor
    mov r11,offset FibVals       ;r11 = FibVals
    mov rcx,[rsp+8]           ;rcx = i
    mov eax,[r11+rcx*4]         ;eax = FibVals[i]
    mov [r9],eax            ;save to v3
; Example #4 - base register + index register * scale factor + disp
    mov r11,offset FibVals-42      ;r11 = FibVals - 42
    mov rcx,[rsp+8]           ;rcx = i
    mov eax,[r11+rcx*4+42]       ;eax = FibVals[i]
    mov r10,[rsp+40]          ;r10 = ptr to v4
    mov [r10],eax            ;save to v4
; Example #5 - RIP relative
    add [FibValsSum_],eax        ;update sum
    mov eax,1              ;set success return code
    ret
InvalidIndex:
    xor eax,eax             ;set error return code
    ret
MemoryAddressing_ endp
    end
Listing 2-6.

Example Ch02_06

Toward the top of the C++ code are the requisite declaration statements for this example. Earlier in this chapter you learned how to reference a C++ global variable in an assembly language function. In this example, the opposite is illustrated. Storage space for the variables NumFibVals_ and FibValsSum_ is defined in the assembly language code, and these variables are referenced in the function main.

In the assembly language function MemoryAddressing_, argument i is employed as an index into an array (or lookup table) of constant integers, while the four pointer arguments are used to save values loaded from the lookup table using different addressing modes. Near the top of Listing 2-6 is a .const directive, which defines a memory block that contains read-only data. Immediately following the .const directive, a lookup table named FibVals is defined. This table contains 18 doubleword integer values. The text dword is an assembler directive that is used to allocate storage space and optionally initialize a doubleword value (the text dd can also be used as a synonym for dword).

The line NumFibVals_ dword ($ - FibVals) / sizeof dword allocates storage space for a single doubleword value and initializes it with the number of doubleword elements in the lookup table FibVals. The $ character is an assembler symbol that equals the current value of the location counter (or offset from the beginning of the current memory block). Subtracting the offset of FibVals from $ yields the size of the table in bytes. Dividing this result by the size in bytes of a doubleword value generates the correct number of elements. These statements emulate a commonly-used technique in C++ to define and initialize a variable with the number of elements in an array:
const int Values[] = {10, 20, 30, 40, 50};
const int NumValues = sizeof(Values) / sizeof(int);

The final line of the .const section declares NumFibVals_ as a public symbol in order to enable its use in main. The .data directive denotes the start of a memory block that contains modifiable data. The FibValsSum_ dword ? statement defines an uninitialized doubleword value, and the subsequent public statement makes it globally accessible.

Let’s now look at the assembly language code for MemoryAddressing_. Upon entry into the function, the argument i is checked for validity since it will be used as an index into the lookup table FibVals. The cmp ecx,0 instruction compares the contents of ECX, which contains i, to the immediate value 0. As discussed earlier in this chapter, the processor carries out this comparison by subtracting the source operand from the destination operand. It then sets the status flags based on the result of the subtraction (the result is not saved to the destination operand). If the condition ecx < 0 is true, program control will be transferred to the location specified by the jl (Jump if Less) instruction. A similar sequence of instructions is used to determine if the value of i is too large. The cmp ecx,[NumFibVals_] instruction compares ECX against the number of elements in the lookup table. If ecx >= [NumFibVals_] is true, a jump is performed to the target location specified by the jge (Jump if Greater or Equal) instruction.

Immediately following the validation of i, a movsxd rcx,ecx instruction sign-extends the table index value to 64 bits. Sign-extending or zero-extending a 32-bit integer to a 64-bit integer is often necessary when using an addressing mode that employs an index register, as you’ll soon see. The subsequent mov [rsp+8],rcx instruction saves a copy of the sign-extended table index value to the RCX home area on the stack and is done primarily to exemplify use of the stack home area.

The remaining instructions of MemoryAddressing_ illustrate accessing items in the lookup table using various memory addressing modes. The first example uses a single base register to read an item from the table. In order to use a single base register, the function must explicitly calculate the address of the i-th table element, which is achieved by adding the offset (or starting address) of FibVals and the value i * 4. The mov r11,offset FibVals instruction loads R11 with the correct table offset value. This is followed by a shl rcx,2 instruction that determines the offset of the i-th item relative to the start of the lookup table. An add r11,rcx instruction calculates the final address. Once this is complete, the specified table value is read using a mov eax,[r11] instruction. It is then saved to the memory location specified by the argument v1.

In the second example, the table value is read using BaseReg+IndexReg memory addressing. This example is similar to the first one except that the processor computes the final effective address during execution of the mov eax,[r11+rcx] instruction. Note that recalculation of the lookup table element offset using the mov rcx,[rsp+8] and shl rcx,2 instructions is unnecessary here but included to illustrate use of the stack home area.

The third example demonstrates use of BaseReg+IndexReg*ScaleFactor memory addressing. In this example, the offset of FibVals and the value i are loaded into registers R11 and RCX, respectively. The correct table value is loaded into EAX using a mov eax,[r11+rcx*4] instruction. In the fourth (and somewhat contrived) example, BaseReg+IndexReg*ScaleFactor+Disp memory addressing is demonstrated. Here R11 is loaded with the offset of FibVals minus 42, and the displacement of 42 in the mov eax,[r11+rcx*4+42] instruction cancels this bias, so the effective address is the same as in the third example. The fifth and final memory addressing mode example uses an add [FibValsSum_],eax instruction to demonstrate RIP-relative addressing. This instruction, which uses a memory location as a destination operand, updates a running sum that is ultimately displayed by the C++ code.
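
The first four examples all arrive at the same effective address, which can be checked with explicit byte-address arithmetic in C++. The following is a minimal sketch rather than a translation of Listing 2-6: the names FourReads, Load32, and ReadFourWays are hypothetical, and the table is a local copy of FibVals.

```cpp
#include <cstdint>
#include <cstring>

// Local copy of the lookup table from Listing 2-6
const int32_t FibVals[] = { 0, 1, 1, 2, 3, 5, 8, 13,
                            21, 34, 55, 89, 144, 233, 377, 610, 987, 1597 };
const int NumFibVals = sizeof(FibVals) / sizeof(int32_t);

struct FourReads        // hypothetical result type for this sketch
{
    bool ok;            // false if the index is invalid
    int32_t v1, v2, v3, v4;
};

// Reads a 32-bit value from an explicit byte address
static int32_t Load32(std::uintptr_t address)
{
    int32_t value;
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(value));
    return value;
}

FourReads ReadFourWays(int i)
{
    if (i < 0 || i >= NumFibVals)
        return { false, -1, -1, -1, -1 };

    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&FibVals[0]);
    const std::uintptr_t index = static_cast<std::uintptr_t>(i);
    FourReads r;
    r.ok = true;

    // #1 base register only: the complete address is computed beforehand
    std::uintptr_t r11 = base + (index << 2);
    r.v1 = Load32(r11);
    // #2 base + index: the index is scaled by 4 before the access
    std::uintptr_t rcx = index << 2;
    r.v2 = Load32(base + rcx);
    // #3 base + index * scale: scaling is part of the address expression
    r.v3 = Load32(base + index * 4);
    // #4 base + index * scale + disp: the -42 bias is cancelled by +42
    r.v4 = Load32((base - 42) + index * 4 + 42);
    return r;
}
```

For any valid index, the four members v1 through v4 hold the same table value; for example, ReadFourWays(7) yields 13 in each of them.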

The function main that’s shown in Listing 2-6 contains a simple looping construct that exercises the function MemoryAddressing_, including test cases with an invalid index. Note that the for loop uses the variable NumFibVals_, which was defined as a public symbol in the assembly language file. The output for the sample program Ch02_06 is shown here.
i =  -1, rc =  0, v1 =  -1, v2 =  -1, v3 =  -1, v4 =  -1,
i =  0, rc =  1, v1 =   0, v2 =   0, v3 =   0, v4 =   0,
i =  1, rc =  1, v1 =   1, v2 =   1, v3 =   1, v4 =   1,
i =  2, rc =  1, v1 =   1, v2 =   1, v3 =   1, v4 =   1,
i =  3, rc =  1, v1 =   2, v2 =   2, v3 =   2, v4 =   2,
i =  4, rc =  1, v1 =   3, v2 =   3, v3 =   3, v4 =   3,
i =  5, rc =  1, v1 =   5, v2 =   5, v3 =   5, v4 =   5,
i =  6, rc =  1, v1 =   8, v2 =   8, v3 =   8, v4 =   8,
i =  7, rc =  1, v1 =  13, v2 =  13, v3 =  13, v4 =  13,
i =  8, rc =  1, v1 =  21, v2 =  21, v3 =  21, v4 =  21,
i =  9, rc =  1, v1 =  34, v2 =  34, v3 =  34, v4 =  34,
i =  10, rc =  1, v1 =  55, v2 =  55, v3 =  55, v4 =  55,
i =  11, rc =  1, v1 =  89, v2 =  89, v3 =  89, v4 =  89,
i =  12, rc =  1, v1 =  144, v2 =  144, v3 =  144, v4 =  144,
i =  13, rc =  1, v1 =  233, v2 =  233, v3 =  233, v4 =  233,
i =  14, rc =  1, v1 =  377, v2 =  377, v3 =  377, v4 =  377,
i =  15, rc =  1, v1 =  610, v2 =  610, v3 =  610, v4 =  610,
i =  16, rc =  1, v1 =  987, v2 =  987, v3 =  987, v4 =  987,
i =  17, rc =  1, v1 = 1597, v2 = 1597, v3 = 1597, v4 = 1597,
i =  18, rc =  0, v1 =  -1, v2 =  -1, v3 =  -1, v4 =  -1,
FibValsSum_ = 4180

Given the multiple addressing modes that are available on an x86 processor, you might wonder which mode should be used. The answer to this question depends on a number of factors, including register availability, the number of times an instruction (or sequence of instructions) is expected to execute, instruction ordering, and memory space vs. execution time tradeoffs. Hardware features such as the processor’s underlying microarchitecture and cache sizes also need to be considered.

When coding an x86 assembly language function, one suggested guideline is to favor simple (a single base register or displacement) rather than complex (multiple registers) memory addressing. The drawback of this approach is that the simpler forms generally require the programmer to code longer instruction sequences and may consume more code space. The use of a simple form also may be imprudent if extra instructions are needed to preserve non-volatile registers on the stack (non-volatile registers are explained in Chapter 3). Chapter 15 considers in greater detail some of the issues and tradeoffs that can affect the efficiency of assembly language code.

Condition Codes

The final sample program of this chapter expounds on how to use the x86’s conditional instructions jcc (Conditional Jump) and cmovcc (Conditional Move). As you have already seen in a few of this chapter’s source code examples, the execution of a conditional instruction is contingent on its specified condition code and the state of one or more status flags. The source code example Ch02_07, which is shown in Listing 2-7, demonstrates a few more use cases for the previously-mentioned instructions.
//------------------------------------------------
//        Ch02_07.cpp
//------------------------------------------------
#include "stdafx.h"
#include <iostream>
#include <iomanip>
using namespace std;
extern "C" int SignedMinA_(int a, int b, int c);
extern "C" int SignedMaxA_(int a, int b, int c);
extern "C" int SignedMinB_(int a, int b, int c);
extern "C" int SignedMaxB_(int a, int b, int c);
void PrintResult(const char* s1, int a, int b, int c, int result)
{
  const int w = 4;
  cout << s1 << "(";
  cout << setw(w) << a << ", ";
  cout << setw(w) << b << ", ";
  cout << setw(w) << c << ") = ";
  cout << setw(w) << result << '\n';
}
int main()
{
  int a, b, c;
  int smin_a, smax_a, smin_b, smax_b;
  // SignedMin examples
  a = 2; b = 15; c = 8;
  smin_a = SignedMinA_(a, b, c);
  smin_b = SignedMinB_(a, b, c);
  PrintResult("SignedMinA", a, b, c, smin_a);
  PrintResult("SignedMinB", a, b, c, smin_b);
  cout << '\n';
  a = -3; b = -22; c = 28;
  smin_a = SignedMinA_(a, b, c);
  smin_b = SignedMinB_(a, b, c);
  PrintResult("SignedMinA", a, b, c, smin_a);
  PrintResult("SignedMinB", a, b, c, smin_b);
  cout << '\n';
  a = 17; b = 37; c = -11;
  smin_a = SignedMinA_(a, b, c);
  smin_b = SignedMinB_(a, b, c);
  PrintResult("SignedMinA", a, b, c, smin_a);
  PrintResult("SignedMinB", a, b, c, smin_b);
  cout << '\n';
  // SignedMax examples
  a = 10; b = 5; c = 3;
  smax_a = SignedMaxA_(a, b, c);
  smax_b = SignedMaxB_(a, b, c);
  PrintResult("SignedMaxA", a, b, c, smax_a);
  PrintResult("SignedMaxB", a, b, c, smax_b);
  cout << '\n';
  a = -3; b = 28; c = 15;
  smax_a = SignedMaxA_(a, b, c);
  smax_b = SignedMaxB_(a, b, c);
  PrintResult("SignedMaxA", a, b, c, smax_a);
  PrintResult("SignedMaxB", a, b, c, smax_b);
  cout << '\n';
  a = -25; b = -37; c = -17;
  smax_a = SignedMaxA_(a, b, c);
  smax_b = SignedMaxB_(a, b, c);
  PrintResult("SignedMaxA", a, b, c, smax_a);
  PrintResult("SignedMaxB", a, b, c, smax_b);
  cout << '\n';
}
;-------------------------------------------------
;        Ch02_07.asm
;-------------------------------------------------
; extern "C" int SignedMinA_(int a, int b, int c);
;
; Returns:   min(a, b, c)
    .code
SignedMinA_ proc
    mov eax,ecx
    cmp eax,edx             ;compare a and b
    jle @F               ;jump if a <= b
    mov eax,edx             ;eax = b
@@:   cmp eax,r8d             ;compare min(a, b) and c
    jle @F
    mov eax,r8d             ;eax = min(a, b, c)
@@:   ret
SignedMinA_ endp
; extern "C" int SignedMaxA_(int a, int b, int c);
;
; Returns:   max(a, b, c)
SignedMaxA_ proc
    mov eax,ecx
    cmp eax,edx             ;compare a and b
    jge @F               ;jump if a >= b
    mov eax,edx             ;eax = b
@@:   cmp eax,r8d             ;compare max(a, b) and c
    jge @F
    mov eax,r8d             ;eax = max(a, b, c)
@@:   ret
SignedMaxA_ endp
; extern "C" int SignedMinB_(int a, int b, int c);
;
; Returns:   min(a, b, c)
SignedMinB_ proc
    cmp ecx,edx
    cmovg ecx,edx            ;ecx = min(a, b)
    cmp ecx,r8d
    cmovg ecx,r8d            ;ecx = min(a, b, c)
    mov eax,ecx
    ret
SignedMinB_ endp
; extern "C" int SignedMaxB_(int a, int b, int c);
;
; Returns:   max(a, b, c)
SignedMaxB_ proc
    cmp ecx,edx
    cmovl ecx,edx            ;ecx = max(a, b)
    cmp ecx,r8d
    cmovl ecx,r8d            ;ecx = max(a, b, c)
    mov eax,ecx
    ret
SignedMaxB_ endp
    end
Listing 2-7.

Example Ch02_07

When developing code to implement a particular algorithm, it is often necessary to determine the minimum or maximum value of two numbers. The standard C++ library defines two template functions named std::min() and std::max() to perform these operations. The assembly language code that’s shown in Listing 2-7 contains several three-argument versions of signed-integer minimum and maximum functions. The purpose of these functions is to illustrate proper use of the jcc and cmovcc instructions. The first function, called SignedMinA_, finds the minimum value of three signed integers. After a mov eax,ecx instruction copies a into EAX, the first code block determines min(a, b) using two instructions: cmp eax,edx and jle @F. The cmp instruction, which you saw earlier in this chapter, subtracts the source operand from the destination operand and sets the status flags based on the result (the result is not saved). If a <= b, the jump skips the mov eax,edx instruction; otherwise, that instruction replaces the value in EAX with b. The operand of the jle (Jump if Less or Equal) instruction, @F, is an assembler symbol that designates the nearest forward @@ label as the target of the conditional jump (the symbol @B can be used for backward jumps). Following calculation of min(a, b), the next code block determines min(min(a, b), c) using the same technique. With the result already present in register EAX, SignedMinA_ can return to the caller.

The function SignedMaxA_ uses the same approach to find the maximum of three signed integers. The only difference between SignedMaxA_ and SignedMinA_ is the use of a jge (Jump if Greater or Equal) instead of a jle instruction. Versions of SignedMinA_ and SignedMaxA_ that operate on unsigned integers can be easily created by changing the jle and jge instructions to jbe (Jump if Below or Equal) and jae (Jump if Above or Equal), respectively. Recall from the discussion in Chapter 1 that condition codes using the words “greater” and “less” are intended for signed integer operands, while “above” and “below” are used with unsigned integer operands.
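
The difference between the signed and unsigned condition codes is easy to observe in C++, where the operand types select the kind of comparison. The following minimal sketch uses the hypothetical function names SignedMin and UnsignedMin to compare the same bit patterns both ways.

```cpp
#include <cstdint>

// Signed comparison, the kind performed by jle and jge
int32_t SignedMin(int32_t a, int32_t b)
{
    return (a <= b) ? a : b;
}

// Unsigned comparison, the kind performed by jbe and jae
uint32_t UnsignedMin(uint32_t a, uint32_t b)
{
    return (a <= b) ? a : b;
}
```

The value -3 has the bit pattern 0xFFFFFFFD. SignedMin(-3, 28) returns -3, whereas UnsignedMin(0xFFFFFFFD, 28) returns 28 because the same bits represent 4294967293 when interpreted as an unsigned integer.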

The assembly language code also contains the functions SignedMinB_ and SignedMaxB_. These functions determine the minimum and maximum of three signed integers using conditional move instructions instead of conditional jumps. The cmovcc instruction tests the specified condition and if it’s true, the source operand is copied to the destination operand. If the specified condition is false, the destination operand is not altered.

If you examine the function SignedMinB_, you will notice that following the cmp ecx,edx instruction is a cmovg ecx,edx instruction. The cmovg (Move if Greater) instruction copies the contents of EDX to ECX if ECX is greater than EDX. In this example, registers ECX and EDX contain argument values a and b. Following execution of the cmovg instruction, register ECX contains min(a, b). Another cmp and cmovg instruction sequence follows which yields min(a, b, c). The same technique is used in SignedMaxB_, which employs cmovl instead of cmovg to save the largest signed integer. Unsigned versions of these functions can be easily created by using cmova and cmovb instead of cmovg and cmovl, respectively. Here’s the output for Ch02_07.
SignedMinA(  2,  15,  8) =  2
SignedMinB(  2,  15,  8) =  2
SignedMinA( -3, -22,  28) = -22
SignedMinB( -3, -22,  28) = -22
SignedMinA( 17,  37, -11) = -11
SignedMinB( 17,  37, -11) = -11
SignedMaxA( 10,  5,  3) =  10
SignedMaxB( 10,  5,  3) =  10
SignedMaxA( -3,  28,  15) =  28
SignedMaxB( -3,  28,  15) =  28
SignedMaxA( -25, -37, -17) = -17
SignedMaxB( -25, -37, -17) = -17

The use of a conditional move instruction to eliminate one or more conditional jump statements frequently results in faster code, especially in situations where the processor is unable to accurately predict whether the jump will be performed. You’ll learn more about some of the issues related to optimal use of the conditional jump and conditional move instructions in Chapter 15.
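
The two coding styles of Listing 2-7 have direct C++ counterparts. The following minimal sketch uses the hypothetical function names SignedMin3Branch and SignedMin3Select; it shows the structure only, since a C++ compiler is free to emit conditional jumps or conditional moves for either function.

```cpp
// Structured like SignedMinA_: the result is replaced only if a
// comparison fails, which corresponds to jumping over a mov
int SignedMin3Branch(int a, int b, int c)
{
    int result = a;
    if (result > b)
        result = b;             // result = min(a, b)
    if (result > c)
        result = c;             // result = min(a, b, c)
    return result;
}

// Structured like SignedMinB_: each step selects one of two values,
// which corresponds to a cmp and cmovg instruction pair
int SignedMin3Select(int a, int b, int c)
{
    int m = (a > b) ? b : a;    // m = min(a, b)
    return (m > c) ? c : m;     // min(a, b, c)
}
```

Both functions return 2 for the arguments (2, 15, 8), -22 for (-3, -22, 28), and -11 for (17, 37, -11), matching the SignedMinA and SignedMinB results shown earlier.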

Summary

Here are the key learning points of Chapter 2:
  • The add and sub instructions perform integer (signed and unsigned) addition and subtraction.

  • The imul and idiv instructions carry out signed integer multiplication and division. The corresponding instructions for unsigned integers are mul and div. The idiv and div instructions usually require the dividend to be sign- or zero-extended prior to use.

  • The and, or, and xor instructions are used to perform bitwise AND, inclusive OR, and exclusive OR operations. The shl and shr instructions execute logical left and right shifts; sar is used for arithmetic right shifts.

  • Nearly all arithmetic, logical, and shift instructions set the status flags to indicate the results of an operation. The cmp instruction also sets the status flags. The jcc and cmovcc instructions can be used to alter program flow or perform conditional data moves based on the state of one or more status flags.

  • The x86-64 instruction set supports a variety of different address modes for accessing operands stored in memory.

  • MASM uses the .code, .data, and .const directives to designate code, data, and constant data sections. The directives proc and endp denote the beginning and end of an assembly language function.

  • The Visual C++ calling convention requires a calling function to use registers RCX, RDX, R8, and R9 (or the low-order portions of these registers for values smaller than 64 bits) for the first four integer or pointer arguments. Additional arguments are passed on the stack.

  • To disable the creation of decorated names by the C++ compiler, assembly language functions must be declared using the extern "C" modifier. Global variables shared between C++ and assembly language code must also use the extern "C" modifier.