Mastering Assembly Programming

We hardly ever think about how sine or cosine are calculated without actually having a right-angled triangle with known lengths of each of the cathetus and the hypotenuse. There are at least two approaches to make those computations in a fast and efficient way:

CORDIC algorithm: This stands for COordinate Rotation DIgital Computer. This one is implemented in simple calculators or primitive hardware devices.
Taylor series: A fast approximation algorithm. It does not provide the exact value, but is definitely enough for our needs.

LIBC on the other hand uses a different algorithm, which we could implement here, but it would be much more than a simple example. Therefore, what we are using in our code is a simple implementation of the simplest approximation algorithm, which provides us with a nice precision (much nicer than we need in this program) of up to the sixth digit after the point--the Taylor series for trigonometric functions (also known as Maclaurin series).

The formula for sine computation using the Taylor series is as follows:

sin(x) = x - x³/3! + x⁵/5! - x⁷/7! + x⁹/9! ...

Here, the ellipsis denote an infinite function. However, we do not need to run it forever to obtain values of satisfactory precision (after all, we are only interested in 2 digits after the point), instead, we will run it for 8 iterations.

Just as with the adjust() procedure, we will not follow any specific calling convention and, since the parameter we need to compute sine for is already in XMM0, we will simply leave it there. The head of the the sin_taylor_series procedure does not contain anything new for us:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Calculation of SIN() using the Taylor Series
; approximation:
; sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! ...
; Values to calculate the SIN() of are in XMM0 register
; Return values are in XMM0 register
;-----------------------------------------------------
sin_taylor_series:
   push ebp                       ; Create stack frame for 5 XMM registers
   mov ebp, esp
   sub esp, 5 * 16
   push eax ecx                   ; Temporarily store EAX and ECX
   xor eax, eax                   ; and set them to 0
   xor ecx, ecx

   movups [ebp - 16], xmm1        ; Temporarily store XMM1 to XMM5 on stack or, to be more
   movups [ebp - 16 * 2], xmm2    ; precise, in local variables.
   movups [ebp - 16 * 3], xmm3
   movups [ebp - 16 * 4], xmm4
   movups [ebp - 16 * 5], xmm5

   movaps xmm1, xmm0              ; Copy the parameter to XMM1 and XMM2
   movaps xmm2, xmm0

   mov ecx, 3                     ; Set ECX to the first exponent

The following computation loop is simple and does not contain any instructions that we have not met yet. However, there are two procedure calls taking two parameters each. Parameters are passed with the XMM0 register (three single-precision floating-point numbers) and the ECX register containing the currently used value of the exponent:


.l1:
   movaps xmm0, xmm2     ; Exponentiate the initial parameter
   call pow
   movaps xmm3, xmm0

   call fact             ; Calculate the factorial of current exponent
   movaps xmm4, xmm0

   divps xmm3, xmm4      ; Divide the exponentiated parameter by the factorial of the exponent
   test eax, 1           ; Check iteration for being odd number, add the result to accumulator
                         ; subtract otherwise
   jnz .plus
   subps xmm1, xmm3
   jmp @f
.plus:
   addps xmm1, xmm3
@@:                     ; Increment current exponent by 2
   add ecx, 2           
   inc eax
   cmp eax, 8           ; and continue till EAX is 8
   jb .l1

   movaps xmm0, xmm1    ; Store results into XMM0

All computations have completed and we now have sine values for the three inputs. For the first iteration, the inputs in XMM0 would be as follows:

bits 96 - 127	bits 64 - 95	bits 32 - 63	bits 0 - 31	register name
(irrelevant)	0.28564453	4.8244629	2.5952148	XMM0

Also, the result of our sin() approximation with eight iterations of Taylor series is as follows:

bits 96 - 127	bits 64 - 95	bits 32 - 63	bits 0 - 31	register name
(irrelevant)	0.28177592	-0.99365967	0.51959586	XMM0

This shows a perfect (at least for our needs) level of approximation. Then, we restore the previously saved XMM registers and return to caller procedure:

   movups xmm1, [ebp - 16]
   movups xmm2, [ebp - 16 * 2]
   movups xmm3, [ebp - 16 * 3]
   movups xmm4, [ebp - 16 * 4]
   movups xmm5, [ebp - 16 * 5]

   pop ecx eax
   mov esp, ebp
   pop ebp
   ret

Table of Contents for
Mastering Assembly Programming

Computing sine

Table of Contents for Mastering Assembly Programming

Table of Contents for
Mastering Assembly Programming