Available on x86 or x86-64 only.
Functions
- cvtne2ps2bf16 🔒 ⚠
- cvtne2ps2bf16_256 🔒 ⚠
- cvtne2ps2bf16_512 🔒 ⚠
- cvtneps2bf16_256 🔒 ⚠
- cvtneps2bf16_512 🔒 ⚠
- dpbf16ps 🔒 ⚠
- dpbf16ps_256 🔒 ⚠
- dpbf16ps_512 🔒 ⚠
- _mm256_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. Intel’s documentation
- _mm256_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
- _mm256_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst. Intel’s documentation
- _mm256_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_mask_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_mask_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. Intel’s documentation
- _mm512_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
- _mm512_cvtpbh_ps ⚠ Experimental avx512bf16,avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst. Intel’s documentation
- _mm512_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_mask_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_mask_cvtpbh_ps ⚠ Experimental avx512bf16,avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_cvtpbh_ps ⚠ Experimental avx512bf16,avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector. Intel’s documentation
- _mm_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtness_sbh ⚠ Experimental avx512bf16,avx512vl
- Convert a single-precision (32-bit) floating-point element in a to a BF16 (16-bit) floating-point element, and store the result in dst.
- _mm_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtsbh_ss ⚠ Experimental avx512bf16,avx512f
- Convert a single BF16 (16-bit) floating-point element in a to a single-precision (32-bit) floating-point element, and store the result in dst.
- _mm_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst. Intel’s documentation
- _mm_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm_mask_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm_maskz_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtpbh_ps ⚠ Experimental avx512bf16,avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
- Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point results into the elements of src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
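Every conversion listed above reduces to one per-element step. The following is a minimal portable sketch of that step, not stdarch's implementation: it works on raw `u16` bit patterns instead of the real BF16 vector types, the function names `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical, and the special-case handling (round to nearest even, zero or denormal inputs flushed to a signed zero, NaNs truncated and quieted) is my reading of the conversion pseudocode in Intel's documentation, to be checked against it.

```rust
/// Hypothetical scalar model of the f32 -> BF16 step (the "ne" in `cvtne`):
/// round to nearest, ties to even. Returns the BF16 value as raw bits.
pub fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let exponent = bits & 0x7F80_0000;
    let mantissa = bits & 0x007F_FFFF;

    if exponent == 0 {
        // Zero or denormal input (assumed): flushed to a zero that keeps the sign.
        ((bits >> 16) & 0x8000) as u16
    } else if exponent == 0x7F80_0000 {
        if mantissa == 0 {
            // Infinity: the upper 16 bits already encode it exactly.
            (bits >> 16) as u16
        } else {
            // NaN: keep the upper 16 bits and set the quiet bit (BF16 bit 6),
            // so a payload that lived only in the low bits cannot become infinity.
            ((bits >> 16) as u16) | 0x0040
        }
    } else {
        // Normal input: add 0x7FFF plus the lowest bit that will be kept, so an
        // exact tie rounds to the even neighbour, then keep the upper half.
        // No u32 overflow: the largest input, 0xFF7F_FFFF, plus 0x8000 still fits.
        let lsb = (bits >> 16) & 1;
        ((bits + 0x7FFF + lsb) >> 16) as u16
    }
}

/// Hypothetical scalar model of BF16 -> f32: BF16 is the upper half of an f32,
/// so widening is exact and only shifts the bits back into place.
pub fn bf16_bits_to_f32(h: u16) -> f32 {
    f32::from_bits(u32::from(h) << 16)
}
```

Because BF16 keeps only 8 significant bits, a round trip loses precision: under this model `bf16_bits_to_f32(f32_to_bf16_bits(std::f32::consts::PI))` is 3.140625, while a value exactly halfway between two BF16 neighbours goes to the one with an even last bit.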
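The two-source `cvtne2ps_pbh` intrinsics pack two f32 vectors into one BF16 vector of the same width. As I read Intel's pseudocode, the low half of the result comes from b and the high half from a, split across the whole vector rather than per 128-bit lane, and the 128-bit `_mm_cvtneps_pbh` fills only the low 64 bits of its result and zeroes the rest. The sketch below models only that lane placement; the `convert` closure stands in for the per-element BF16 conversion, and both function names are hypothetical.

```rust
/// Hypothetical model of `cvtne2ps_pbh` lane placement at any width:
/// result lanes 0..n come from `b`, lanes n..2n come from `a`.
pub fn cvtne2ps_layout<U>(a: &[f32], b: &[f32], convert: impl Fn(f32) -> U) -> Vec<U> {
    assert_eq!(a.len(), b.len(), "a and b must have the same lane count");
    b.iter().chain(a.iter()).map(|&x| convert(x)).collect()
}

/// Hypothetical model of the 128-bit `_mm_cvtneps_pbh` result shape:
/// four converted lanes in the low half, the upper four lanes zeroed
/// (`U::default()` stands in for an all-zero BF16 lane).
pub fn cvtneps_128_layout<U: Default + Copy>(a: [f32; 4], convert: impl Fn(f32) -> U) -> [U; 8] {
    let mut out = [U::default(); 8];
    for (slot, &x) in out.iter_mut().zip(a.iter()) {
        *slot = convert(x);
    }
    out
}
```

Keeping `convert` as a parameter separates the layout question from the rounding question, so the layout can be checked with an identity closure before plugging in a real conversion.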
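The `dpbf16_ps` family pairs adjacent BF16 lanes: f32 lane j of the result is src[j] plus a[2j]·b[2j] plus a[2j+1]·b[2j+1], so a 128-bit call turns eight BF16 lanes of a and b into four f32 results. A minimal sketch of that arithmetic, assuming raw `u16` BF16 inputs, follows. The name `dpbf16_ps_model` is hypothetical, and two details are assumptions rather than documented facts: the order in which the two products are added (odd-indexed first here), and the omission of the instruction's denormal handling.

```rust
/// Hypothetical model of `dpbf16_ps`: BF16 inputs are raw `u16` bit patterns,
/// and result lane `j` accumulates the products of BF16 lanes `2j` and `2j+1`.
pub fn dpbf16_ps_model(src: &[f32], a: &[u16], b: &[u16]) -> Vec<f32> {
    assert_eq!(a.len(), 2 * src.len(), "a needs two BF16 lanes per f32 lane");
    assert_eq!(b.len(), a.len(), "a and b must have the same lane count");
    // BF16 is the upper half of an f32, so widening is an exact shift.
    let widen = |h: u16| f32::from_bits(u32::from(h) << 16);
    src.iter()
        .enumerate()
        .map(|(j, &acc)| {
            // Assumed order: the odd-indexed product is added first.
            let acc = acc + widen(a[2 * j + 1]) * widen(b[2 * j + 1]);
            acc + widen(a[2 * j]) * widen(b[2 * j])
        })
        .collect()
}
```

For example, with BF16 bit patterns 0x3F80, 0x4000, 0x4040, 0x4080 (1.0, 2.0, 3.0, 4.0) in both a and b and src = [1.0, 0.0], the model yields [1 + 4 + 1, 0 + 16 + 9] = [6.0, 25.0].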
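All `mask_` and `maskz_` variants differ from their unmasked counterparts only in how unselected lanes are filled: bit i of k governs result lane i. The sketch below models that final blend with hypothetical helpers `apply_writemask` and `apply_zeromask`; it widens the mask to `u32` for simplicity, whereas the real intrinsics take `__mmask8`, `__mmask16` or `__mmask32` depending on the lane count.

```rust
/// Hypothetical model of the `mask_` (writemask) variants: lane `i` takes the
/// freshly computed value when bit `i` of `k` is set, else the lane from `src`.
pub fn apply_writemask<T: Copy>(src: &[T], k: u32, computed: &[T]) -> Vec<T> {
    assert_eq!(src.len(), computed.len(), "src and result must have the same lane count");
    assert!(computed.len() <= 32, "a u32 mask covers at most 32 lanes");
    computed
        .iter()
        .zip(src)
        .enumerate()
        .map(|(i, (&c, &s))| if (k >> i) & 1 == 1 { c } else { s })
        .collect()
}

/// Hypothetical model of the `maskz_` (zeromask) variants: unselected lanes
/// are zeroed; `T::default()` stands in for an all-zero lane.
pub fn apply_zeromask<T: Copy + Default>(k: u32, computed: &[T]) -> Vec<T> {
    assert!(computed.len() <= 32, "a u32 mask covers at most 32 lanes");
    computed
        .iter()
        .enumerate()
        .map(|(i, &c)| if (k >> i) & 1 == 1 { c } else { T::default() })
        .collect()
}
```

For the masked dot products, src is both the accumulator and the fallback, so under this model `_mm_mask_dpbf16_ps(src, k, a, b)` corresponds to computing the unmasked dot product and then blending it with the original src through `apply_writemask`.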