Module avx512bf16


Available on x86 or x86-64 only.


AVX512BF16 intrinsics.
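The conversion semantics can be sketched as a scalar reference model for readers without BF16 hardware. This is a hypothetical sketch, not the stdarch implementation. It assumes the behaviour given in Intel's pseudocode for the cvtne* intrinsics: the "ne" means round to nearest even regardless of MXCSR, zero and denormal inputs become a signed zero, and NaNs are quieted. It also assumes that for cvtne2ps the b operand fills the low half of the result and a fills the high half. The helper names f32_to_bf16_rne, bf16_to_f32 and cvtne2ps_pbh_512 are illustrative only. BF16 values are carried as raw u16 bit patterns rather than the __m128bh/__m256bh/__m512bh types.

```rust
// Hypothetical scalar reference model of the BF16 conversions (an assumption
// based on Intel's pseudocode, not the stdarch implementation). BF16 values
// are raw u16 bit patterns instead of __m128bh/__m256bh/__m512bh vectors.

/// f32 -> BF16 with round-to-nearest-even, as assumed for the cvtne* intrinsics.
fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    let upper = (bits >> 16) as u16;
    if x.is_nan() {
        // Truncate and force the quiet bit (bit 6 of the BF16 pattern).
        upper | 0x0040
    } else if x == 0.0 || x.is_subnormal() {
        // Zero and denormal inputs become a zero that keeps the sign bit.
        upper & 0x8000
    } else {
        // Adding 0x7FFF plus the lowest kept bit, then truncating, rounds to
        // nearest with ties going to the even BF16 value. Infinities pass
        // through unchanged; finite values beyond the BF16 range round to infinity.
        let lsb = (bits >> 16) & 1;
        ((bits + 0x7FFF + lsb) >> 16) as u16
    }
}

/// BF16 -> f32 is exact: the 16 stored bits become the upper half of the f32.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Model of _mm512_cvtne2ps_pbh(a, b): b fills lanes 0..16, a fills lanes 16..32.
fn cvtne2ps_pbh_512(a: [f32; 16], b: [f32; 16]) -> [u16; 32] {
    let mut dst = [0u16; 32];
    for j in 0..16 {
        dst[j] = f32_to_bf16_rne(b[j]);
        dst[j + 16] = f32_to_bf16_rne(a[j]);
    }
    dst
}

fn main() {
    // 1.0 + 2^-8 lies exactly halfway between two BF16 values; the tie goes to even (1.0).
    let tie = f32::from_bits(0x3F80_8000);
    println!("{:#06x}", f32_to_bf16_rne(tie)); // 0x3f80
    // 3.3 loses its low mantissa bits in the round trip.
    println!("{}", bf16_to_f32(f32_to_bf16_rne(3.3))); // 3.296875
    let packed = cvtne2ps_pbh_512([2.0; 16], [1.0; 16]);
    println!("{:#06x} {:#06x}", packed[0], packed[16]); // 0x3f80 0x4000
}
```

Widening with bf16_to_f32 is exact because a BF16 value is the upper 16 bits of an IEEE 754 single, so only the f32 to BF16 direction involves rounding.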

Functions (🔒 marks a private item, ⚠ marks an unsafe function, and Experimental marks an unstable API)

cvtne2ps2bf16 🔒 ⚠
cvtne2ps2bf16_256 🔒 ⚠
cvtne2ps2bf16_512 🔒 ⚠
cvtneps2bf16_256 🔒 ⚠
cvtneps2bf16_512 🔒 ⚠
dpbf16ps 🔒 ⚠
dpbf16ps_256 🔒 ⚠
dpbf16ps_512 🔒 ⚠
_mm256_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. See Intel’s documentation.
_mm256_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. See Intel’s documentation.
_mm256_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm256_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. See Intel’s documentation.
_mm256_mask_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm256_mask_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm256_mask_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm256_maskz_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm256_maskz_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm256_maskz_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. See Intel’s documentation.
_mm512_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. See Intel’s documentation.
_mm512_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512f): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_dpbf16_ps ⚠ Experimental (avx512bf16, avx512f): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. See Intel’s documentation.
_mm512_mask_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_mask_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_mask_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512f): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_dpbf16_ps ⚠ Experimental (avx512bf16, avx512f): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_maskz_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_maskz_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512f): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm512_maskz_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512f): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_dpbf16_ps ⚠ Experimental (avx512bf16, avx512f): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector. See Intel’s documentation.
_mm_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
_mm_cvtness_sbh ⚠ Experimental (avx512bf16, avx512vl): Convert a single-precision (32-bit) floating-point element in a to a BF16 (16-bit) floating-point element, and store the result in dst.
_mm_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm_cvtsbh_ss ⚠ Experimental (avx512bf16, avx512f): Convert a single BF16 (16-bit) floating-point element in a to a single-precision (32-bit) floating-point element, and store the result in dst.
_mm_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. See Intel’s documentation.
_mm_mask_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm_mask_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). See Intel’s documentation.
_mm_maskz_cvtne2ps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
_mm_maskz_cvtneps_pbh ⚠ Experimental (avx512bf16, avx512vl): Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtpbh_ps ⚠ Experimental (avx512bf16, avx512vl): Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_dpbf16_ps ⚠ Experimental (avx512bf16, avx512vl): Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). See Intel’s documentation.
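The dpbf16_ps intrinsics and their writemask/zeromask forms can likewise be sketched in scalar Rust. The model below is hypothetical and assumes Intel's pseudocode for the 512-bit variants: each 32-bit lane j of dst starts from src[j] and adds the product of BF16 elements 2j+1 of a and b, then the product of elements 2j. The mask_ form keeps src in lanes whose bit in k is clear, and the maskz_ form writes 0.0 there. Denormal flushing is not modelled. The helper names dpbf16_ps_512, mask_dpbf16_ps_512 and maskz_dpbf16_ps_512 are illustrative, not part of this module, and BF16 inputs are raw u16 bit patterns.

```rust
// Hypothetical scalar model of the 512-bit BF16 dot-product intrinsics (an
// assumption based on Intel's pseudocode, not the stdarch implementation).
// BF16 inputs are raw u16 bit patterns; denormal flushing is not modelled.

/// Exact BF16 -> f32 widening (the BF16 bits become the upper half).
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Model of _mm512_dpbf16_ps(src, a, b).
fn dpbf16_ps_512(src: [f32; 16], a: [u16; 32], b: [u16; 32]) -> [f32; 16] {
    let mut dst = src;
    for j in 0..16 {
        // A product of two BF16 values is exact in f32; only the additions round here.
        dst[j] += bf16_to_f32(a[2 * j + 1]) * bf16_to_f32(b[2 * j + 1]);
        dst[j] += bf16_to_f32(a[2 * j]) * bf16_to_f32(b[2 * j]);
    }
    dst
}

/// Model of _mm512_mask_dpbf16_ps(src, k, a, b): lanes with a clear bit in k keep src.
fn mask_dpbf16_ps_512(src: [f32; 16], k: u16, a: [u16; 32], b: [u16; 32]) -> [f32; 16] {
    let full = dpbf16_ps_512(src, a, b);
    let mut dst = src;
    for j in 0..16 {
        if (k >> j) & 1 == 1 {
            dst[j] = full[j];
        }
    }
    dst
}

/// Model of _mm512_maskz_dpbf16_ps(k, src, a, b): lanes with a clear bit in k become 0.0.
fn maskz_dpbf16_ps_512(k: u16, src: [f32; 16], a: [u16; 32], b: [u16; 32]) -> [f32; 16] {
    let full = dpbf16_ps_512(src, a, b);
    let mut dst = [0.0f32; 16];
    for j in 0..16 {
        if (k >> j) & 1 == 1 {
            dst[j] = full[j];
        }
    }
    dst
}

fn main() {
    let ones = [0x3F80u16; 32]; // BF16 1.0 in every element
    let twos = [0x4000u16; 32]; // BF16 2.0 in every element
    let src = [1.0f32; 16];
    // Every lane: 1.0 + 1.0 * 2.0 + 1.0 * 2.0 = 5.0.
    println!("{:?}", dpbf16_ps_512(src, ones, twos)[0]); // 5.0
    println!("{:?}", &mask_dpbf16_ps_512(src, 0b01, ones, twos)[..2]); // [5.0, 1.0]
    println!("{:?}", &maskz_dpbf16_ps_512(0b10, src, ones, twos)[..2]); // [0.0, 5.0]
}
```

Note the argument order the sketch mirrors: the writemask form takes src before k, while the zeromask form takes k first.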
