Index

A

Advanced Vector Extensions (AVX)
data types
packed floating-point
packed integer
scalar floating-point
differences between x86-SSE
execution lanes
intermixing x86-AVX and x86-SSE code
operand alignment
vzeroupper
YMM register high-order bit zeroing
instruction syntax
non-destructive source operand
registers
MXCSR
XMM registers
YMM registers
AVX2
MXCSR
non-destructive source operand
operand alignment
packed floating-point
packed integer
variable bit shift
XMM registers
YMM registers
AVX-512
conditional execution and merging
merge masking
zero masking
data types
embedded broadcast
instruction-level rounding
round down
round to nearest
round to zero
round up
suppress all exceptions
instruction set extensions
AVX512CD
AVX512BW
AVX512DQ
AVX512F
AVX512VL
instruction syntax
conditional execution and merging
embedded broadcast
instruction-level rounding
merge masking
predicate mask
register sets
MXCSR
opmask registers
XMM registers
YMM registers
ZMM registers
zero masking
Array of structures (AOS)
Array operations
column means
row-major ordering
least squares
min-max
simple calculations
square roots
Arrays
accessing elements
comparing
one-dimensional
reversal
row-major ordering
two-dimensional
row and column indices

B

Benchmark timing measurements
csv file
TRIMMEAN

C

C++
classes
AlignedArray
AlignedMem
array
BmThreadTimer
default_random_engine
ImageMatrix
matrix
mutex
OS
thread
uniform_int_distribution
unique_ptr
lvalue
rvalue
size_t
specifiers
alignas
Cache
cache line
L1 data (D-Cache)
L1 instruction (I-Cache)
L2
L3
non-temporal data
pollution
slice
temporal data
Conditional jump
Conditional move
Condition codes
Convolution
discrete equation
input signal
padding
kernel
fixed size
variable size
output signal
response signal
SIMD equations
theory
YMM registers
ZMM registers
Correlation coefficient
CPU Identification (CPUID)
AVX-512 feature flags
feature flag
host operating system
OSXSAVE
leaf value
memory caches
return results
serializing instruction
sub-leaf value
xgetbv

D

Data blend
Data gather
indices
doubleword
quadword
merge control mask
vector scale-index-base
Data permute
indices
Data prefetch
hint
linked list
Differences between x86-32 and x86-64 programming
byte register restrictions
deprecated instructions
immediate operands
32-bit
invalid instructions
operand sizes

E

Enhanced bit manipulation
leading zero bits
trailing zero bits

F, G

Feature set identification
See CPUID
Flagless operations
multiplication
shift
FMA
See Fused-Multiply-Add (FMA)
FMA3
See Fused-Multiply-Add (FMA)
FMA4
See Fused-Multiply-Add (FMA)
Fundamental data types
byte
double quadword
doubleword
little endian ordering
proper alignment
quadword
word
Fused-Multiply-Add (FMA)
arithmetic
convolution functions
packed
scalar
data dependencies
multiple registers
operand ordering scheme
packed
rounding
MXCSR.RC
scalar
value discrepancies

H

Half-precision floating-point
encoding
exponent
sign bit
significand
F16C
Half-precision floating-point conversions
rounding mode

I

IEEE 754
binary encoding
exponent
sign bit
significand
special values
denormal
floating-point zero
infinity
NaN
QNaN
SNaN
Image processing
image histogram
image statistics
mean
standard deviation
image thresholding
mask image
pixel clipping
pixel conversions
instruction-level rounding
size reduction
pixel mean
pixel minimum-maximum
RGB pixel min-max values
macro text string
RGB to grayscale conversion
color conversion coefficients
size reduction
weighted sum
thresholding
mask image
Instruction operands
immediate
memory
register
Instruction pipeline
allocate rename block
branch prediction unit
decoded instruction cache
execution engine
execution unit
instruction decoder
instruction fetch and pre-decode
instruction queue
loop stream detector
micro-op instruction queue
retire unit
scheduler
Instruction set extensions
ADX
BMI1
BMI2
F16C
FMA
LZCNT
POPCNT
Integer arithmetic
addition
division
logical operations
mixed sizes
multiplication
shift operations
subtraction

J, K

Jump table

L

Linked list
node
data
end-of-list terminator
link
Loop unrolling

M

MASM
See Microsoft Macro Assembler (MASM)
Matrix operations
inverse
Cayley-Hamilton theorem
multiplication
transposition
Matrix-vector multiplication
equations
permutation of vector components
Memory addressing modes
base register
base register + disp
base register + index register
base register + index register + disp
base register + index register * scale factor
base register + index register * scale factor + disp
effective address calculation
index * scale factor + disp
RIP + disp (RIP relative)
RIP relative
Microarchitecture
Coffee Lake
Haswell
Kaby Lake
Skylake
Skylake Server
Micro-op
macro-fusion
micro-fusion
Microsoft Macro Assembler (MASM)
comment line
custom segment
directive
=
align
.allocstack
bcst
byte ptr
catstr
.code
.const
.data
dup
dword
dword ptr
endp
.endprolog
ends
equ
.erridni
macro
proc
proc frame
.pushreg
qword
qword ptr
readonly
real4
real8
.savexmm128
segment
.setframe
substr
word ptr
xmmword ptr
ymmword ptr
zmmword ptr
label
location counter ($)
macro text string
Miscellaneous data types
bit field
bit string
string
Multithreading
data arrays
MXCSR
control flags
rounding control
rounding mode
status flags

N

Non-temporal memory store
arrays
hint
Numeric data types
floating-point
double-precision
single-precision
signed integers
unsigned integers

O

Optimization
basic techniques
data alignment
multi-byte values
packed floating-point
packed integer
floating-point arithmetic
denormals
loop unrolling
precision
register dependencies
program branches
backward conditional
branch prediction
forward conditional
loop unrolling
SIMD techniques
register spills

P, Q

Packed floating-point arithmetic
common operations
addition
compares
conversions
division
multiplication
subtraction
compares
conversions
unsigned integer
logical decisions
operations
absolute value
addition
division
multiplication
square root
subtraction
Packed integer arithmetic
basic arithmetic
doubleword
word
common operations
addition
multiplication
shifts
subtraction
operations
addition
shifts
subtraction
pack and unpack
size promotions
sign extended
zero extended

R

Registers
general purpose
8-bit
16-bit
32-bit
64-bit
MXCSR
RFLAGS
carry
direction
overflow
parity
sign
zero
RIP (instruction pointer)
RSP (stack pointer)
XMM
YMM
ZMM
RFLAGS
See Registers
Ring interconnect

S, T, U

Scalar floating-point arithmetic
arrays
double-precision
matrices
operations
addition
compares
conversions
division
multiplication
square root
subtraction
single-precision
SIMD
See Single Instruction Multiple Data (SIMD)
Single Instruction Multiple Data (SIMD)
arithmetic
horizontal addition
horizontal subtraction
packed floating-point
packed integer
saturated
wraparound
data types
xmmword
ymmword
zmmword
programming concepts
Smoothing operator
Gaussian filter
coefficients
Strings
concatenation
counting characters
direction flag
end-of-string character
Structure
member alignment
padding
Structure of arrays (SOA)
System agent

V, W

Vector cross product
component equation
gather
opmask register
scatter
Vector scale-index-base (VSIB)
See AVX2
Visual C++
calling convention
epilog macros
floating-point argument
floating-point return value
function epilog
function prolog
general-purpose register
integer argument
leaf function
local storage
non-leaf function
non-volatile register
prolog macros
register arguments
returning structures by value
return value
stack alignment
stack arguments
stack frame
stack layout
volatile register
XMM register
ZMM registers
decorated name
extern “C” modifier

X

XmmVal

Y

YmmVal

Z

ZmmVal