Mastering Assembly Programming

String instructions are an interesting group that operates on strings of bytes, words, double words, or quad words (the last in long mode only). In their usual suffixed form, these instructions take implicit operands only:

The source address should be loaded into the ESI (RSI for long mode) register
The destination address should be loaded into the EDI (RDI for long mode) register
The accumulator in the width that matches the data size (AL, AX, EAX, or RAX in long mode) is used by all of them except the MOVS* and CMPS* instructions
The number of iterations, if any, should be loaded into ECX (RCX for long mode); it is only used with a REP* prefix

The ESI and/or EDI registers are automatically adjusted after each operation: by one for byte, two for word, four for double word, and eight for quad word data. Whether they are incremented or decremented is controlled by the direction flag (DF) in the EFLAGS register: DF = 0 increments ESI/EDI, and DF = 1 decrements them. The CLD instruction clears DF and the STD instruction sets it.
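
Since the direction flag keeps its value between instructions, it is worth setting it explicitly before a string operation. The following is a minimal sketch of both directions, assuming 32-bit protected mode with a flat memory model and FASM syntax; the labels src_buf and dst_buf are hypothetical and serve illustration only:

   cld                     ; DF = 0: ESI/EDI advance after each item
   mov   esi, src_buf
   mov   edi, dst_buf
   movsb                   ; Copy byte 0; ESI and EDI each grow by 1
   movsb                   ; Copy byte 1

   std                     ; DF = 1: ESI/EDI step backward
   mov   esi, src_buf + 3  ; Point at the last byte of each buffer
   mov   edi, dst_buf + 3
   movsb                   ; Copy byte 3; ESI and EDI each shrink by 1
   movsb                   ; Copy byte 2
   cld                     ; Leave DF = 0, as common calling conventions expect

src_buf db 1, 2, 3, 4
dst_buf rb 4               ; Four reserved bytes (hypothetical destination)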

These instructions fall into five groups or, to put it more precisely, there are five instructions, each supporting four data sizes:

MOVSB/MOVSW/MOVSD/MOVSQ: These move a byte, word, double word, or quad word in memory from the location pointed to by ESI/RSI to the location pointed to by EDI/RDI. The instruction's suffix specifies the size of the data item to be moved. Setting ECX/RCX to the number of data items to be moved and prefixing the instruction with REP instructs the processor to execute it that many times.
CMPSB/CMPSW/CMPSD/CMPSQ: These compare the data pointed to by the ESI/RSI register with the data pointed to by the EDI/RDI register and set the flags accordingly. The iteration rules are the same as for the MOVS* instructions, except that the conditional prefixes (REPE/REPZ and REPNE/REPNZ) may also end the repetition early, depending on ZF.
SCASB/SCASW/SCASD/SCASQ: These scan sequences of data items (the size of which is specified by the instruction's suffix) pointed to by the EDI/RDI register for a value specified in AL, AX, EAX, or RAX, depending on the mode (protected or long) and the instruction's suffix. The iteration rules are the same as those for the CMPS* instructions.
LODSB/LODSW/LODSD/LODSQ: These load AL, AX, EAX, or RAX (depending on the operation mode and the instruction's suffix) with a value from memory pointed to by the ESI/RSI register. The iteration rules are the same as those for the MOVS* instructions, although a REP prefix is rarely useful here, since each iteration overwrites the previously loaded value.
STOSB/STOSW/STOSD/STOSQ: These store the value of the AL, AX, EAX, or RAX register to the memory location pointed to by the EDI/RDI register. The iteration rules are the same as those for the MOVS* instructions.
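
As a rough illustration of the MOVS*, STOS*, and LODS* descriptions above, the following sketch copies one buffer to another, fills a third one, and then converts a byte string in place. It assumes 32-bit protected mode, a flat memory model, and FASM syntax; all labels and sizes (COUNT, src_data, dst_data, pad, msg, msg_len, next_byte) are hypothetical:

COUNT = 16

   cld                     ; Process data in ascending order

   ; Copy COUNT double words from src_data to dst_data
   mov   esi, src_data
   mov   edi, dst_data
   mov   ecx, COUNT
   rep movsd               ; ECX = 0 when done

   ; Fill COUNT double words at pad with zero
   mov   edi, pad
   xor   eax, eax          ; Value to store
   mov   ecx, COUNT
   rep stosd

   ; Set bit 5 of every byte of msg, turning "HELLO" into "hello"
   mov   esi, msg
   mov   edi, msg          ; Write back in place
   mov   ecx, msg_len
next_byte:
   lodsb                   ; AL = [ESI], ESI = ESI + 1
   or    al, 0x20          ; Lowercase an uppercase ASCII letter
   stosb                   ; [EDI] = AL, EDI = EDI + 1
   loop  next_byte         ; Repeat until ECX = 0

src_data rd COUNT          ; Reserved double words (contents not shown)
dst_data rd COUNT
pad      rd COUNT
msg      db "HELLO"
msg_len = $ - msg

The last loop uses LODSB and STOSB without a prefix, because the value has to be modified between the load and the store.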

All of the preceding instructions also have an explicit-operand form without a suffix, in which case we need to specify the size of the operand ourselves. The operands themselves cannot be changed, so they are always addressed through ESI/RSI and EDI/RDI; the only thing we may change is the operand size. The following is an example of such a case:

scas byte[edi]
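
For comparison, the explicit-operand forms of the other four instructions might look as follows. This is a sketch assuming FASM syntax, where the size operator is written once and the operands of MOVS and CMPS appear in the order shown; other assemblers may spell these forms differently:

   movs  dword [edi], [esi]   ; Same as movsd
   cmps  byte [esi], [edi]    ; Same as cmpsb
   lods  word [esi]           ; Same as lodsw
   stos  byte [edi]           ; Same as stosb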

The following example shows typical usage of the SCAS* instruction: scanning a sequence of, in this particular case, bytes for a specific value, which is stored in the AL register. The other instructions are similar in their usage.

; Calculate the length of a string
   mov   edi, hello
   mov   ecx, 0x100    ; Maximum allowed string length
   xor   al, al        ; We will look for 0
   cld                 ; Scan forward (DF = 0)
   repne scasb         ; Scan while the byte at [EDI] is not 0
   jne   too_long      ; ZF = 0: no terminating 0 within 0x100 bytes
   neg   ecx           ; Negate ECX
   add   ecx, 0x100    ; Get the length of the string
                       ; ECX = 14 (includes terminating 0)
   jmp   done
too_long:
   ; Handle this
done:

hello db "Hello, World!", 0

The REP* prefixes, one of which is used in the preceding example, tell the processor to repeat the prefixed instruction using the ECX register as a counter (much as the LOOP* instructions do). There is, however, one more optional condition, designated by ZF (the zero flag) and specified by a suffix attached to REP. It is meaningful only for the CMPS* and SCAS* instructions, which are the ones that set ZF. With the E or Z suffix, the processor checks ZF after each iteration and continues only while it is set, that is, while the compared items are equal. With the NE or NZ suffix, it continues only while ZF is clear, that is, while the compared items differ. Consider the following example:

repz cmpsb

This instructs the processor to keep comparing two sequences of bytes (pointed to by the ESI/RSI and EDI/RDI registers) while they are equal and ECX is not zero.
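
Put into context, such a comparison might be set up as in the following sketch, which assumes 32-bit protected mode, a flat memory model, and FASM syntax. The labels first, second, differ, and finished and the constant LEN are hypothetical:

; Compare two byte sequences of equal length
   cld                 ; Compare in ascending order
   mov   esi, first
   mov   edi, second
   mov   ecx, LEN      ; Assumes LEN > 0; with ECX = 0 nothing is
                       ; compared and ZF is left unchanged
   repe cmpsb          ; Stop at the first mismatch or when ECX = 0
   jne   differ        ; ZF = 0: the last compared pair did not match
   ; All LEN bytes matched
   jmp   finished
differ:
   ; ESI - 1 and EDI - 1 point at the mismatching bytes
finished:

first  db "Hello, World!"
LEN = $ - first
second db "Hello, world!"

Testing ZF rather than ECX after the loop is a deliberate choice in this sketch: ECX also reaches zero when the mismatch occurs in the very last pair, so the counter alone cannot tell the two outcomes apart.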
