~/Using SIMD Instructions in C++

Sep 15, 2021


SIMD (single instruction, multiple data) lets a program process several data elements with one instruction, which can speed up data-parallel tasks considerably. In C++, SIMD is accessed via intrinsics, libraries, or compiler auto-vectorization.

To use SIMD directly, include the intrinsics header for your target, such as <immintrin.h> on x86 and x86-64 (Intel and AMD) processors. For portability across architectures, libraries such as SIMD Everywhere or Highway provide abstractions.
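Because intrinsics are tied to a specific instruction set, it can help to see which one a build actually targets. The sketch below is only an illustration: `compiled_simd_target` is a hypothetical helper, and it relies on the feature macros that GCC and Clang predefine from flags such as -mavx or -march (MSVC defines only some of them).

```cpp
#include <string>

// Hypothetical helper (not part of any library): reports the widest SIMD
// extension this translation unit was compiled for, based on predefined
// compiler macros. Assumes GCC/Clang-style feature macros.
inline std::string compiled_simd_target() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "scalar";   // no known SIMD macro detected
#endif
}
```

This is a compile-time check only: it says what the compiler was allowed to emit, not what the CPU running the binary supports.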

Example of adding two blocks of eight floats with AVX:

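#include <immintrin.h>

// Adds exactly 8 floats: one 256-bit AVX register holds 8 single-precision values.
// a, b, and result must each point to at least 8 floats. Build with AVX enabled (e.g. -mavx).
void add_float_arrays_avx(const float* a, const float* b, float* result) {
    __m256 v1 = _mm256_loadu_ps(a);       // unaligned load of a[0..7]
    __m256 v2 = _mm256_loadu_ps(b);       // unaligned load of b[0..7]
    __m256 sum = _mm256_add_ps(v1, v2);   // 8 additions in one instruction
    _mm256_storeu_ps(result, sum);        // unaligned store to result[0..7]
}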
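
Real arrays are rarely exactly eight elements long. One common way to extend the idea, sketched here under a few assumptions, is a vector loop over full groups of eight followed by a scalar loop for the leftovers. The name `add_float_arrays_n` and the length parameter `n` are hypothetical, and the `__AVX__` guard is added so the same source still compiles (as plain scalar code) when AVX is not enabled.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical generalization: adds n floats element by element.
// The arrays may be unaligned, and result may be the same array as a or b.
void add_float_arrays_n(const float* a, const float* b, float* result, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // Main loop: 8 floats per iteration while a full vector remains.
    for (; i + 8 <= n; i += 8) {
        __m256 v1 = _mm256_loadu_ps(a + i);
        __m256 v2 = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(result + i, _mm256_add_ps(v1, v2));
    }
#endif
    // Scalar tail: the last n % 8 elements (or every element if AVX is off).
    for (; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
}
```

The scalar tail keeps the function correct for any length without reading or writing past the end of the arrays.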

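To let the compiler auto-vectorize loops, compile with -O2 or -O3 and check the vectorization reports (for example -fopt-info-vec on GCC or -Rpass=loop-vectorize on Clang) to confirm which loops were actually vectorized. See GCC auto-vectorization and Clang vectorization.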
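
As a rough illustration of a loop that compilers can usually vectorize on their own, consider the sketch below. The function `saxpy` is a hypothetical example kernel, and `__restrict` is a compiler extension (accepted by GCC, Clang, and MSVC) used here on the assumption that the arrays never overlap.

```cpp
#include <cstddef>

// Hypothetical example kernel: out[i] = a * x[i] + y[i].
// __restrict promises the compiler that the pointers do not alias, which
// removes one common obstacle to auto-vectorization.
//
// To see whether the loop was vectorized, try for example:
//   g++     -O3 -fopt-info-vec-optimized -c saxpy.cpp
//   clang++ -O2 -Rpass=loop-vectorize    -c saxpy.cpp
void saxpy(float a, const float* __restrict x, const float* __restrict y,
           float* __restrict out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

The loop has a simple trip count, no branches, and no dependencies between iterations, which is the shape vectorizers handle best. Whether it is actually vectorized depends on the compiler version and flags, so the report is worth checking.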

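For best results, align data to the vector width (32 bytes for AVX) and consult your compiler's documentation for the extensions it supports. SIMD pays off most in code that applies the same operation across large arrays, such as numeric array processing or image and audio manipulation.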
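
One way to get aligned data, sketched below, is to put `alignas(32)` on a type so that every instance starts on a 32-byte boundary, which allows the aligned load and store intrinsics (`_mm256_load_ps`, `_mm256_store_ps`) to be used. The names `Float8`, `is_aligned_32`, and `add8` are hypothetical, and the scalar fallback is an assumption added so the code still compiles without AVX.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true if p lies on a 32-byte boundary.
inline bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical wrapper: alignas(32) makes each instance 32-byte aligned,
// matching the width of one 256-bit AVX register.
struct alignas(32) Float8 {
    float v[8];
};

// Adds two aligned blocks of 8 floats.
inline Float8 add8(const Float8& a, const Float8& b) {
    Float8 out;
#if defined(__AVX__)
    // Aligned load/store: valid only because Float8 guarantees alignment.
    _mm256_store_ps(out.v, _mm256_add_ps(_mm256_load_ps(a.v), _mm256_load_ps(b.v)));
#else
    for (std::size_t i = 0; i < 8; ++i) out.v[i] = a.v[i] + b.v[i];
#endif
    return out;
}
```

Passing a misaligned pointer to the aligned intrinsics is an error that typically crashes at run time, so tying alignment to the type is safer than relying on callers to remember it.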

Tags: [simd] [cpp]