We hardly ever think about how sine or cosine are calculated without actually having a right-angled triangle with known lengths of each of the cathetus and the hypotenuse. There are at least two approaches to make those computations in a fast and efficient way:
- CORDIC algorithm: This stands for COordinate Rotation DIgital Computer. This one is implemented in simple calculators or primitive hardware devices.
- Taylor series: A fast approximation algorithm. It does not provide the exact value, but is definitely enough for our needs.
LIBC on the other hand uses a different algorithm, which we could implement here, but it would be much more than a simple example. Therefore, what we are using in our code is a simple implementation of the simplest approximation algorithm, which provides us with a nice precision (much nicer than we need in this program) of up to the sixth digit after the point--the Taylor series for trigonometric functions (also known as Maclaurin series).
The formula for sine computation using the Taylor series is as follows:
Here, the ellipsis denote an infinite function. However, we do not need to run it forever to obtain values of satisfactory precision (after all, we are only interested in 2 digits after the point), instead, we will run it for 8 iterations.
Just as with the adjust() procedure, we will not follow any specific calling convention and, since the parameter we need to compute sine for is already in XMM0, we will simply leave it there. The head of the the sin_taylor_series procedure does not contain anything new for us:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Calculation of SIN() using the Taylor Series
; approximation:
; sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! ...
; Values to calculate the SIN() of are in XMM0 register
; Return values are in XMM0 register
;-----------------------------------------------------
sin_taylor_series:
push ebp ; Create stack frame for 5 XMM registers
mov ebp, esp
sub esp, 5 * 16
push eax ecx ; Temporarily store EAX and ECX
xor eax, eax ; and set them to 0
xor ecx, ecx
movups [ebp - 16], xmm1 ; Temporarily store XMM1 to XMM5 on stack or, to be more
movups [ebp - 16 * 2], xmm2 ; precise, in local variables.
movups [ebp - 16 * 3], xmm3
movups [ebp - 16 * 4], xmm4
movups [ebp - 16 * 5], xmm5
movaps xmm1, xmm0 ; Copy the parameter to XMM1 and XMM2
movaps xmm2, xmm0
mov ecx, 3 ; Set ECX to the first exponent
The following computation loop is simple and does not contain any instructions that we have not met yet. However, there are two procedure calls taking two parameters each. Parameters are passed with the XMM0 register (three single-precision floating-point numbers) and the ECX register containing the currently used value of the exponent:
.l1:
movaps xmm0, xmm2 ; Exponentiate the initial parameter
call pow
movaps xmm3, xmm0
call fact ; Calculate the factorial of current exponent
movaps xmm4, xmm0
divps xmm3, xmm4 ; Divide the exponentiated parameter by the factorial of the exponent
test eax, 1 ; Check iteration for being odd number, add the result to accumulator
; subtract otherwise
jnz .plus
subps xmm1, xmm3
jmp @f
.plus:
addps xmm1, xmm3
@@: ; Increment current exponent by 2
add ecx, 2
inc eax
cmp eax, 8 ; and continue till EAX is 8
jb .l1
movaps xmm0, xmm1 ; Store results into XMM0
All computations have completed and we now have sine values for the three inputs. For the first iteration, the inputs in XMM0 would be as follows:
| bits 96 - 127 | bits 64 - 95 | bits 32 - 63 | bits 0 - 31 | register name |
| (irrelevant) | 0.28564453 | 4.8244629 | 2.5952148 | XMM0 |
Also, the result of our sin() approximation with eight iterations of Taylor series is as follows:
| bits 96 - 127 | bits 64 - 95 | bits 32 - 63 | bits 0 - 31 | register name |
| (irrelevant) | 0.28177592 | -0.99365967 | 0.51959586 | XMM0 |
This shows a perfect (at least for our needs) level of approximation. Then, we restore the previously saved XMM registers and return to caller procedure:
movups xmm1, [ebp - 16]
movups xmm2, [ebp - 16 * 2]
movups xmm3, [ebp - 16 * 3]
movups xmm4, [ebp - 16 * 4]
movups xmm5, [ebp - 16 * 5]
pop ecx eax
mov esp, ebp
pop ebp
ret