Mastering Assembly Programming

As we have seen, SSE instructions are convenient to use. So far, however, we have mostly been loading data from memory into registers and moving it between registers, so we have not yet seen how effective they really are. Two procedures called from the calculation loop perform the actual computations. One of them is the adjust() procedure.

Because the algorithm is simple overall, and because each of the two procedures is called from exactly one place, we do not follow any specific calling convention. Instead, we use the XMM0 register to pass floating-point values and the ECX register to pass integer parameters.

In the case of the adjust() procedure, there is only one parameter, and it is already loaded into the XMM0 register, so we simply call the procedure. The procedure itself begins as follows:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Value adjustment before calculation of SIN()
; Parameter is in XMM0 register
; Return value is in XMM0 register
;-----------------------------------------------------
adjust:
   push ebp
   mov ebp, esp
   sub esp, 16 * 2      ; Create the stack frame for local variables

This is the standard way to create a stack frame for local variables and for temporary storage of the non-general-purpose registers used in the procedure. The stack pointer (ESP/RSP) is saved in the EBP/RBP register (we are free to use another general-purpose register instead), and space for local variables is then allocated by subtracting their total size from ESP/RSP. Here we reserve 16 * 2 = 32 bytes, enough for two 128-bit XMM registers. General-purpose registers may be saved on the stack with push instructions issued right after this allocation.

Addressing the allocated space is shown in the following code:

   movups [ebp - 16], xmm1      ; Store XMM1 and XMM2 registers
   movups [ebp - 16 * 2], xmm2

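In the preceding two lines, we temporarily store the contents of the XMM1 and XMM2 registers, as we are going to use them but need to preserve their values. Note that movups does not require its memory operand to be 16-byte aligned, which this stack frame does not guarantee.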

The adjustment of the input values is very simple and may be expressed by the following code in C:

return v - 2*PI*floorf(v/(2*PI));

However, in C, we would have to call this function for every value (unless we use intrinsic functions), while in Assembly, we may adjust all three simultaneously with a few simple SSE instructions:

   movd xmm1, [pi_2]        ; Load 2*PI into the lowest single of XMM1
   movlhps xmm1, xmm1       ; Copy the low 64 bits to the high 64 bits
   movsldup xmm1, xmm1      ; Duplicate singles 0 and 2 into singles 1 and 3

We are already familiar with this sequence, which loads a doubleword into an XMM register and copies it into each of the register's four single-precision elements. Here, it loads 2*PI into XMM1.
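
To make the effect of each of the three instructions visible, here is a small sketch in C that models an XMM register as an array of four floats. The type xmm_t and the emu_* helpers are hypothetical names introduced only for this illustration; they mimic the documented lane behavior of movd, movlhps, and movsldup and are not real intrinsics.

```c
/* A hypothetical stand-in for one XMM register: four 32-bit lanes,
   lane[0] being the lowest-order one. */
typedef struct { float lane[4]; } xmm_t;

/* movd xmm, [mem]: load one 32-bit value into lane 0, zero the rest. */
static xmm_t emu_movd(float value)
{
    xmm_t r = { { value, 0.0f, 0.0f, 0.0f } };
    return r;
}

/* movlhps dst, src: copy the low two lanes of src into the high two
   lanes of dst; the low two lanes of dst are left unchanged. */
static xmm_t emu_movlhps(xmm_t dst, xmm_t src)
{
    dst.lane[2] = src.lane[0];
    dst.lane[3] = src.lane[1];
    return dst;
}

/* movsldup dst, src: duplicate the even-indexed lanes (0 and 2) of src
   into the odd-indexed lanes (1 and 3). */
static xmm_t emu_movsldup(xmm_t src)
{
    xmm_t r = { { src.lane[0], src.lane[0], src.lane[2], src.lane[2] } };
    return r;
}

/* The three-instruction broadcast from the text, step by step. */
static xmm_t broadcast(float value)
{
    xmm_t r = emu_movd(value);   /* { v, 0, 0, 0 } */
    r = emu_movlhps(r, r);       /* { v, 0, v, 0 } */
    r = emu_movsldup(r);         /* { v, v, v, v } */
    return r;
}
```

Calling broadcast() with the value of 2*PI leaves that value in all four lanes, which is the state of XMM1 after the sequence above.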

The following algorithm performs the actual calculations:

1. Duplicate the input parameter into the XMM2 register.
2. Divide its singles by 2*PI.
3. Round the results down. SSE has no dedicated floor or ceiling instructions; instead, we use roundps (introduced with SSE4.1) and specify the rounding mode in the third operand. The value 1b selects rounding toward negative infinity, which is exactly the floor operation.
4. Multiply the rounded-down results by 2*PI.
5. Subtract them from the initial values to get results that fit into the [0.0, 2*PI) range.

The Assembly implementation of these steps is:

   movaps xmm2, xmm0           ; Move the input parameter to XMM2
   divps xmm2, xmm1            ; Divide its singles by 2*PI
   roundps xmm2, xmm2, 1b      ; Floor the results
   mulps xmm2, xmm1            ; Multiply floored results by 2*PI
   subps xmm0, xmm2            ; Subtract resulting values from the  
                               ; input parameter

   movups xmm2, [ebp - 16 * 2] ; Restore the XMM2 and XMM1 registers
   movups xmm1, [ebp - 16]

   mov esp, ebp                ; "Destroy" the stack frame and return
   pop ebp
   ret

The result of the last operation is already in XMM0, so we simply return from the procedure to our calculation loop.
