This is an interesting group of instructions that operate on strings of bytes, words, double words, or quad words (long mode only). These instructions have implicit operands only:
- The source address should be loaded into the ESI (RSI for long mode) register
- The destination address should be loaded into the EDI (RDI for long mode) register
- The accumulator (AL, AX, EAX, or RAX, depending on the operand size) is used by all of them except the MOVS* and CMPS* instructions
- The number of iterations (if any) should be in ECX (RCX for long mode); it is only used with the REP* prefix
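As a minimal sketch of how these implicit operands are set up, the following fragment copies a block of bytes with REP MOVSB (the MOVS* instructions are described below). It assumes 32-bit protected mode, FASM syntax, and a flat memory model in which DS and ES cover the same memory; the labels src_buf and dst_buf are hypothetical. CLD is included because the direction flag (DF) decides whether ESI and EDI are incremented (DF = 0) or decremented (DF = 1) after each item.

```asm
; Copy 16 bytes from src_buf to dst_buf (both labels are hypothetical)
    cld                     ; DF = 0: ESI/EDI are incremented after each item
    mov   esi, src_buf      ; source address
    mov   edi, dst_buf      ; destination address
    mov   ecx, 16           ; number of items (bytes in this case)
    rep   movsb             ; copy ECX bytes; ECX = 0 afterwards
    ; Now ESI = src_buf + 16 and EDI = dst_buf + 16

; Data (would normally live in a data section, not right after the code)
src_buf db "0123456789ABCDEF"
dst_buf rb 16               ; 16 reserved, uninitialized bytes
```

Note that nothing appears after the instruction mnemonic: every operand is taken from the registers loaded beforehand.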
These instructions fall into five groups. More precisely, there are five instructions, each supporting four data sizes:
- MOVSB/MOVSW/MOVSD/MOVSQ: These move a byte, word, double word, or quad word in memory from the location pointed to by ESI/RSI to the location pointed to by EDI/RDI. The instruction's suffix specifies the size of the data to be moved. After each item, ESI/RSI and EDI/RDI are advanced by the item size (or moved back by it if the direction flag, DF, is set). Setting ECX/RCX to the number of data items to be moved and prefixing the instruction with the REP* prefix instructs the processor to execute it that many times; the conditional forms of the prefix (described below) matter only for the instructions that set the flags, namely CMPS* and SCAS*.
- CMPSB/CMPSW/CMPSD/CMPSQ: These compare the data pointed to by the ESI/RSI register with the data pointed to by the EDI/RDI register and set the flags accordingly. The iteration rules are the same as for the MOVS* instructions, with the addition that a conditional REP* prefix may end the loop early, depending on ZF.
- SCASB/SCASW/SCASD/SCASQ: These scan a sequence of data items (the size of which is specified by the instruction's suffix) pointed to by the EDI/RDI register for the value in AL, AX, EAX, or RAX (long mode only), as selected by the instruction's suffix. The iteration rules are the same as those for the CMPS* instructions.
- LODSB/LODSW/LODSD/LODSQ: These load AL, AX, EAX, or RAX (long mode only), as selected by the instruction's suffix, with a value from the memory location pointed to by the ESI/RSI register. The iteration rules are the same as those for the MOVS* instructions.
- STOSB/STOSW/STOSD/STOSQ: These store the value of the AL, AX, EAX, or RAX register to the memory location pointed to by the EDI/RDI register. The iteration rules are the same as those for the MOVS* instructions.
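To make the STOS* and LODS* descriptions more concrete, here is a hedged sketch with two typical uses: filling a buffer with REP STOSD, and a LODSB/STOSB loop that rewrites a string in place. It assumes 32-bit protected mode, FASM syntax, and a flat memory model; the labels buffer, text, and to_upper are hypothetical and chosen only for illustration.

```asm
; Zero-fill 64 bytes at buffer, one double word at a time
    cld                     ; DF = 0: move forward through memory
    mov   edi, buffer       ; STOSD writes to [EDI]
    xor   eax, eax          ; value to store (0)
    mov   ecx, 64 / 4       ; 16 double words
    rep   stosd             ; EDI advances by 4 after each store

; Convert a zero-terminated ASCII string to upper case in place
to_upper:
    mov   esi, text         ; LODSB reads from [ESI]
    mov   edi, esi          ; STOSB writes to [EDI]; same buffer here
.next:
    lodsb                   ; AL = [ESI], then ESI = ESI + 1
    test  al, al
    jz    .done             ; stop at the terminating 0
    cmp   al, 'a'
    jb    .store            ; below 'a': leave unchanged
    cmp   al, 'z'
    ja    .store            ; above 'z': leave unchanged
    sub   al, 0x20          ; 'a'..'z' -> 'A'..'Z'
.store:
    stosb                   ; [EDI] = AL, then EDI = EDI + 1
    jmp   .next
.done:

; Data (would normally live in a data section)
buffer rb 64
text   db "hello, world", 0
```

The second loop does not use a REP* prefix because each loaded byte has to be examined before it is stored; a repeated LODS* on its own would simply leave the last item in the accumulator.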
All of the preceding instructions also have an explicit-operand form without a suffix, in which case we need to specify the size of the operands. The operands themselves cannot be changed (they are always addressed through ESI/RSI and EDI/RDI), so the only thing we can choose is the operand size. The following is an example of such a case:
scas byte[edi]
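For comparison, the explicit forms of the other four instructions might be written as shown in the following sketch. It assumes FASM syntax in 32-bit mode; other assemblers spell these forms differently, so treat the exact notation as an assumption tied to that assembler.

```asm
; Explicit-operand forms and the suffixed instructions they correspond to
    movs  dword [edi], [esi]    ; same as movsd
    cmps  word [esi], [edi]     ; same as cmpsw
    scas  byte [edi]            ; same as scasb
    lods  byte [esi]            ; same as lodsb
    stos  dword [edi]           ; same as stosd
```

In each case the size keyword carries the same information as the B, W, D, or Q suffix.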
The following example shows typical usage of the SCAS* instruction: scanning a sequence of, in this particular case, bytes for a specific value, which is stored in the AL register. The other instructions are similar in their usage.
; Calculate the length of a string
cld ; DF = 0: scan forward
mov edi, hello
mov ecx, 0x100 ; Maximum allowed string length
xor al, al ; We will look for 0
repne scasb ; Scan while the byte at [EDI] differs from AL and ECX is not 0
or ecx, 0 ; Check whether the string is too long
jz too_long
neg ecx ; Negate ECX
add ecx, 0x100 ; Get the length of the string
; ECX = 14 (includes terminating 0)
jmp done ; Skip the error path
too_long:
; Handle this
done:
hello db "Hello, World!", 0
The REP* prefix, used in the preceding example, tells the processor to repeat the prefixed instruction using the ECX register (RCX in long mode) as a counter, in the same manner as it is used by the LOOP* instructions. There is also an optional condition on ZF (the zero flag), which is meaningful for the CMPS* and SCAS* instructions because they set the flags. The condition is selected by a suffix attached to REP. With the E or Z suffix, the processor checks ZF after each iteration and keeps repeating only while it is set. With the NE or NZ suffix, it keeps repeating only while ZF is clear. In both cases, the loop also ends when the counter reaches zero. Consider the following example:
repz cmpsb
This would instruct the processor to keep comparing two sequences of bytes (pointed to by the ESI/RSI and EDI/RDI registers) while they are equal and ECX is not zero.
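A slightly fuller sketch of this comparison, including the check of the result, could look as follows. It assumes 32-bit protected mode, FASM syntax, and a flat memory model; the labels str_a, str_b, differ, and done are hypothetical. REPE is used here, which is simply another spelling of REPZ.

```asm
; Compare two 5-byte sequences
    cld                     ; DF = 0: compare forward
    mov   esi, str_a        ; first sequence
    mov   edi, str_b        ; second sequence
    mov   ecx, 5            ; maximum number of bytes to compare
    repe  cmpsb             ; stop at the first mismatch or when ECX = 0
    jne   differ            ; ZF = 0: the last compared bytes were different
    ; ZF = 1: all 5 bytes matched
    jmp   done
differ:
    ; ESI - 1 and EDI - 1 point at the mismatching bytes
done:

; Data (would normally live in a data section)
str_a db "Hello"
str_b db "Help!"
```

The flags must be tested right after the repeated instruction, because they reflect the last comparison performed. If ECX were 0 on entry, no comparison would take place and the flags would be left unchanged, so a zero-length case needs separate handling.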