According to the documentation of the Eigen library, it is sufficient to set the appropriate compiler flag to enable the generation of vectorized code. Let us look at CMakeLists.txt:
- We declare a C++11 project:
cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
project(recipe-06 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
- Since we wish to use the Eigen library, we need to find its header files on the system:
find_package(Eigen3 3.3 REQUIRED CONFIG)
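To confirm which Eigen installation was picked up, a status message can be printed right after the call. This is a minimal sketch, assuming the variables that find_package sets in config mode (Eigen3_VERSION and Eigen3_DIR); the wording of the message is illustrative and not part of the recipe:

```cmake
find_package(Eigen3 3.3 REQUIRED CONFIG)

# In config mode, find_package sets <PackageName>_VERSION and
# <PackageName>_DIR for the package it located.
# The message text below is illustrative only.
message(STATUS "Found Eigen3 ${Eigen3_VERSION} (config in ${Eigen3_DIR})")
```

If Eigen is installed in a non-standard location, the search can be pointed at it by passing -D CMAKE_PREFIX_PATH=<install prefix> at configure time.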
- We include the CheckCXXCompilerFlag.cmake standard module file:
include(CheckCXXCompilerFlag)
- We check that the -march=native compiler flag works:
check_cxx_compiler_flag("-march=native" _march_native_works)
- The alternative -xHost compiler flag is also checked:
check_cxx_compiler_flag("-xHost" _xhost_works)
- We set an empty variable, _CXX_FLAGS, to hold whichever of the two flags was found to work. If _march_native_works is true, we set _CXX_FLAGS to -march=native. Otherwise, if _xhost_works is true, we set _CXX_FLAGS to -xHost. If neither flag worked, we leave _CXX_FLAGS empty and no architecture-specific vectorization flag is passed to the compiler:
set(_CXX_FLAGS)
if(_march_native_works)
  message(STATUS "Using processor's vector instructions (-march=native compiler flag set)")
  set(_CXX_FLAGS "-march=native")
elseif(_xhost_works)
  message(STATUS "Using processor's vector instructions (-xHost compiler flag set)")
  set(_CXX_FLAGS "-xHost")
else()
  message(STATUS "No suitable compiler flag found for vectorization")
endif()
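The same selection logic can also be written as a loop over a list of candidate flags, which may be easier to extend if more flags need to be tried. The following is a sketch under stated assumptions, not the recipe's own code: the names _candidate_flags, _flag, and _flag_id are hypothetical, and the result variable names are derived from the flag text rather than written out by hand:

```cmake
include(CheckCXXCompilerFlag)

# Candidate flags in order of preference (hypothetical list variable)
set(_candidate_flags "-march=native" "-xHost")
set(_CXX_FLAGS)

foreach(_flag IN LISTS _candidate_flags)
  # Turn the flag into a valid identifier, e.g. "-march=native" -> "_march_native",
  # so that every check gets its own cached result variable
  string(MAKE_C_IDENTIFIER "${_flag}" _flag_id)
  check_cxx_compiler_flag("${_flag}" _works${_flag_id})
  if(_works${_flag_id})
    message(STATUS "Using processor's vector instructions (${_flag} compiler flag set)")
    set(_CXX_FLAGS "${_flag}")
    break()  # keep the first flag that works
  endif()
endforeach()

if(NOT _CXX_FLAGS)
  message(STATUS "No suitable compiler flag found for vectorization")
endif()
```

Unlike the explicit if/elseif version, this loop stops at the first flag that works, so later candidates are not tested at all.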
- For comparison, we also define an executable target for the unoptimized version where we do not use the preceding optimization flags:
add_executable(linear-algebra-unoptimized linear-algebra.cpp)
target_link_libraries(linear-algebra-unoptimized
  PRIVATE
    Eigen3::Eigen
  )
- In addition, we define an optimized version:
add_executable(linear-algebra linear-algebra.cpp)
target_compile_options(linear-algebra
  PRIVATE
    ${_CXX_FLAGS}
  )
target_link_libraries(linear-algebra
  PRIVATE
    Eigen3::Eigen
  )
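To verify that only the optimized target received the flag, the COMPILE_OPTIONS property of both targets can be printed at configure time. This is a small sketch using get_target_property; the variable names _opt_flags and _unopt_flags are hypothetical:

```cmake
# Query the compile options set directly on each target. If the property
# is not set on a target, the result variable ends in -NOTFOUND.
get_target_property(_opt_flags linear-algebra COMPILE_OPTIONS)
get_target_property(_unopt_flags linear-algebra-unoptimized COMPILE_OPTIONS)

message(STATUS "linear-algebra compile options: ${_opt_flags}")
message(STATUS "linear-algebra-unoptimized compile options: ${_unopt_flags}")
```

Because the flag is added with the PRIVATE keyword on linear-algebra only, it should not appear on linear-algebra-unoptimized.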
- Let us compare the two executables. First, we configure (in this case, -march=native works):
$ mkdir -p build
$ cd build
$ cmake ..
...
-- Performing Test _march_native_works
-- Performing Test _march_native_works - Success
-- Performing Test _xhost_works
-- Performing Test _xhost_works - Failed
-- Using processor's vector instructions (-march=native compiler flag set)
...
- Finally, let us compile and compare timings:
$ cmake --build .
$ ./linear-algebra-unoptimized
result: -261.505
elapsed seconds: 1.97964
$ ./linear-algebra
result: -261.505
elapsed seconds: 1.05048