Module avx512fp16

Source

Available on x86 or x86-64 only.

Macros§

cmp_asm 🔒
fpclass_asm 🔒

Functions§

vaddph 🔒 ^⚠
vaddsh 🔒 ^⚠
vcmpsh 🔒 ^⚠
vcomish 🔒 ^⚠
vcvtdq2ph_128 🔒 ^⚠
vcvtdq2ph_256 🔒 ^⚠
vcvtdq2ph_512 🔒 ^⚠
vcvtpd2ph_128 🔒 ^⚠
vcvtpd2ph_256 🔒 ^⚠
vcvtpd2ph_512 🔒 ^⚠
vcvtph2dq_128 🔒 ^⚠
vcvtph2dq_256 🔒 ^⚠
vcvtph2dq_512 🔒 ^⚠
vcvtph2pd_128 🔒 ^⚠
vcvtph2pd_256 🔒 ^⚠
vcvtph2pd_512 🔒 ^⚠
vcvtph2psx_128 🔒 ^⚠
vcvtph2psx_256 🔒 ^⚠
vcvtph2psx_512 🔒 ^⚠
vcvtph2qq_128 🔒 ^⚠
vcvtph2qq_256 🔒 ^⚠
vcvtph2qq_512 🔒 ^⚠
vcvtph2udq_128 🔒 ^⚠
vcvtph2udq_256 🔒 ^⚠
vcvtph2udq_512 🔒 ^⚠
vcvtph2uqq_128 🔒 ^⚠
vcvtph2uqq_256 🔒 ^⚠
vcvtph2uqq_512 🔒 ^⚠
vcvtph2uw_128 🔒 ^⚠
vcvtph2uw_256 🔒 ^⚠
vcvtph2uw_512 🔒 ^⚠
vcvtph2w_128 🔒 ^⚠
vcvtph2w_256 🔒 ^⚠
vcvtph2w_512 🔒 ^⚠
vcvtps2phx_128 🔒 ^⚠
vcvtps2phx_256 🔒 ^⚠
vcvtps2phx_512 🔒 ^⚠
vcvtqq2ph_128 🔒 ^⚠
vcvtqq2ph_256 🔒 ^⚠
vcvtqq2ph_512 🔒 ^⚠
vcvtsd2sh 🔒 ^⚠
vcvtsh2sd 🔒 ^⚠
vcvtsh2si32 🔒 ^⚠
vcvtsh2ss 🔒 ^⚠
vcvtsh2usi32 🔒 ^⚠
vcvtsi2sh 🔒 ^⚠
vcvtss2sh 🔒 ^⚠
vcvttph2dq_128 🔒 ^⚠
vcvttph2dq_256 🔒 ^⚠
vcvttph2dq_512 🔒 ^⚠
vcvttph2qq_128 🔒 ^⚠
vcvttph2qq_256 🔒 ^⚠
vcvttph2qq_512 🔒 ^⚠
vcvttph2udq_128 🔒 ^⚠
vcvttph2udq_256 🔒 ^⚠
vcvttph2udq_512 🔒 ^⚠
vcvttph2uqq_128 🔒 ^⚠
vcvttph2uqq_256 🔒 ^⚠
vcvttph2uqq_512 🔒 ^⚠
vcvttph2uw_128 🔒 ^⚠
vcvttph2uw_256 🔒 ^⚠
vcvttph2uw_512 🔒 ^⚠
vcvttph2w_128 🔒 ^⚠
vcvttph2w_256 🔒 ^⚠
vcvttph2w_512 🔒 ^⚠
vcvttsh2si32 🔒 ^⚠
vcvttsh2usi32 🔒 ^⚠
vcvtudq2ph_128 🔒 ^⚠
vcvtudq2ph_256 🔒 ^⚠
vcvtudq2ph_512 🔒 ^⚠
vcvtuqq2ph_128 🔒 ^⚠
vcvtuqq2ph_256 🔒 ^⚠
vcvtuqq2ph_512 🔒 ^⚠
vcvtusi2sh 🔒 ^⚠
vcvtuw2ph_128 🔒 ^⚠
vcvtuw2ph_256 🔒 ^⚠
vcvtuw2ph_512 🔒 ^⚠
vcvtw2ph_128 🔒 ^⚠
vcvtw2ph_256 🔒 ^⚠
vcvtw2ph_512 🔒 ^⚠
vdivph 🔒 ^⚠
vdivsh 🔒 ^⚠
vfcmaddcph_mask3_128 🔒 ^⚠
vfcmaddcph_mask3_256 🔒 ^⚠
vfcmaddcph_mask3_512 🔒 ^⚠
vfcmaddcph_maskz_128 🔒 ^⚠
vfcmaddcph_maskz_256 🔒 ^⚠
vfcmaddcph_maskz_512 🔒 ^⚠
vfcmaddcsh_mask 🔒 ^⚠
vfcmaddcsh_maskz 🔒 ^⚠
vfcmulcph_128 🔒 ^⚠
vfcmulcph_256 🔒 ^⚠
vfcmulcph_512 🔒 ^⚠
vfcmulcsh 🔒 ^⚠
vfmaddcph_mask3_128 🔒 ^⚠
vfmaddcph_mask3_256 🔒 ^⚠
vfmaddcph_mask3_512 🔒 ^⚠
vfmaddcph_maskz_128 🔒 ^⚠
vfmaddcph_maskz_256 🔒 ^⚠
vfmaddcph_maskz_512 🔒 ^⚠
vfmaddcsh_mask 🔒 ^⚠
vfmaddcsh_maskz 🔒 ^⚠
vfmaddph_512 🔒 ^⚠
vfmaddsh 🔒 ^⚠
vfmaddsubph_128 🔒 ^⚠
vfmaddsubph_256 🔒 ^⚠
vfmaddsubph_512 🔒 ^⚠
vfmulcph_128 🔒 ^⚠
vfmulcph_256 🔒 ^⚠
vfmulcph_512 🔒 ^⚠
vfmulcsh 🔒 ^⚠
vfpclasssh 🔒 ^⚠
vgetexpph_128 🔒 ^⚠
vgetexpph_256 🔒 ^⚠
vgetexpph_512 🔒 ^⚠
vgetexpsh 🔒 ^⚠
vgetmantph_128 🔒 ^⚠
vgetmantph_256 🔒 ^⚠
vgetmantph_512 🔒 ^⚠
vgetmantsh 🔒 ^⚠
vmaxph_128 🔒 ^⚠
vmaxph_256 🔒 ^⚠
vmaxph_512 🔒 ^⚠
vmaxsh 🔒 ^⚠
vminph_128 🔒 ^⚠
vminph_256 🔒 ^⚠
vminph_512 🔒 ^⚠
vminsh 🔒 ^⚠
vmulph 🔒 ^⚠
vmulsh 🔒 ^⚠
vrcpph_128 🔒 ^⚠
vrcpph_256 🔒 ^⚠
vrcpph_512 🔒 ^⚠
vrcpsh 🔒 ^⚠
vreduceph_128 🔒 ^⚠
vreduceph_256 🔒 ^⚠
vreduceph_512 🔒 ^⚠
vreducesh 🔒 ^⚠
vrndscaleph_128 🔒 ^⚠
vrndscaleph_256 🔒 ^⚠
vrndscaleph_512 🔒 ^⚠
vrndscalesh 🔒 ^⚠
vrsqrtph_128 🔒 ^⚠
vrsqrtph_256 🔒 ^⚠
vrsqrtph_512 🔒 ^⚠
vrsqrtsh 🔒 ^⚠
vscalefph_128 🔒 ^⚠
vscalefph_256 🔒 ^⚠
vscalefph_512 🔒 ^⚠
vscalefsh 🔒 ^⚠
vsqrtph_512 🔒 ^⚠
vsqrtsh 🔒 ^⚠
vsubph 🔒 ^⚠
vsubsh 🔒 ^⚠
_mm256_abs_ph^⚠Experimentalavx512fp16,avx512vl: Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
_mm256_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_castpd_ph^⚠Experimentalavx512fp16: Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph128_ph256^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm256_castph256_ph128^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_pd^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_ps^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_si256^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castps_ph^⚠Experimentalavx512fp16: Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castsi256_ph^⚠Experimentalavx512fp16: Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_cmp_ph_mask^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm256_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm256_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm256_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm256_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm256_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm256_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm256_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm256_cvtsh_h^⚠Experimentalavx512fp16: Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm256_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm256_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm256_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm256_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm256_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm256_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm256_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm256_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm256_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm256_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm256_fpclass_ph_mask^⚠Experimentalavx512fp16,avx512vl: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
_mm256_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_load_ph^⚠Experimentalavx512fp16,avx512vl: Load 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_loadu_ph^⚠Experimentalavx512fp16,avx512vl: Load 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
_mm256_mask3_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask3_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask3_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_blend_ph^⚠Experimentalavx512fp16,avx512vl: Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
_mm256_mask_cmp_ph_mask^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fpclass_ph_mask^⚠Experimentalavx512fp16,avx512vl: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
_mm256_mask_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_mask_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_mask_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_mask_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_mask_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_mask_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_mask_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_maskz_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_maskz_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_maskz_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_maskz_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_permutex2var_ph^⚠Experimentalavx512fp16,avx512vl: Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm256_permutexvar_ph^⚠Experimentalavx512fp16,avx512vl: Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
_mm256_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_reduce_add_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm256_reduce_max_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm256_reduce_min_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm256_reduce_mul_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
_mm256_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm256_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm256_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm256_set1_ph^⚠Experimentalavx512fp16: Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm256_set_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm256_setr_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
_mm256_setzero_ph^⚠Experimentalavx512fp16,avx512vl: Return vector of type __m256h with all elements set to zero.
_mm256_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm256_store_ph^⚠Experimentalavx512fp16,avx512vl: Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_storeu_ph^⚠Experimentalavx512fp16,avx512vl: Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm256_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm256_undefined_ph^⚠Experimentalavx512fp16,avx512vl: Return vector of type __m256h with indetermination elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_zextph128_ph256^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m128h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_abs_ph^⚠Experimentalavx512fp16: Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
_mm512_add_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_add_round_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_castpd_ph^⚠Experimentalavx512fp16: Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph128_ph512^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph256_ph512^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph512_ph128^⚠Experimentalavx512fp16: Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph512_ph256^⚠Experimentalavx512fp16: Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_pd^⚠Experimentalavx512fp16: Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_ps^⚠Experimentalavx512fp16: Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_si512^⚠Experimentalavx512fp16: Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castps_ph^⚠Experimentalavx512fp16: Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castsi512_ph^⚠Experimentalavx512fp16: Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_cmp_ph_mask^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm512_cmp_round_ph_mask^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm512_cmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_conj_pch^⚠Experimentalavx512fp16: Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cvt_roundepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvtph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvtph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvtph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvtph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvtph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvtph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtsh_h^⚠Experimentalavx512fp16: Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm512_cvtt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvtx_roundph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtx_roundps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtxph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtxps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_div_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm512_div_round_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_fcmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm512_fmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmaddsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmaddsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsubadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsubadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_fnmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm512_fnmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm512_fpclass_ph_mask^⚠Experimentalavx512fp16: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
_mm512_getexp_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_getexp_round_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_getmant_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_getmant_round_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_load_ph^⚠Experimentalavx512fp16: Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_loadu_ph^⚠Experimentalavx512fp16: Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
_mm512_mask3_fcmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fcmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1, or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask_add_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_add_round_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_blend_ph^⚠Experimentalavx512fp16: Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
_mm512_mask_cmp_ph_mask^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_round_ph_mask^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_cmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_conj_pch^⚠Experimentalavx512fp16: Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_cvt_roundepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtx_roundph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtx_roundps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_div_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_round_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fcmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fnmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fpclass_ph_mask^⚠Experimentalavx512fp16: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
_mm512_mask_getexp_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_mask_getexp_round_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_mask_getmant_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_mask_getmant_round_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_mask_max_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_max_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_min_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mask_min_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mask_mul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_mul_round_pch^⚠Experimentalavx512fp16: Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_rcp_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_reduce_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_reduce_round_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_round_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_mask_rsqrt_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_scalef_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_scalef_round_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sqrt_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sqrt_round_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_sub_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sub_round_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_add_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_add_round_ph^⚠Experimentalavx512fp16: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_cmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_cmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_conj_pch^⚠Experimentalavx512fp16: Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_cvt_roundepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi16_ph^⚠Experimentalavx512fp16: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi32_ph^⚠Experimentalavx512fp16: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi64_ph^⚠Experimentalavx512fp16: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu16_ph^⚠Experimentalavx512fp16: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu32_ph^⚠Experimentalavx512fp16: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu64_ph^⚠Experimentalavx512fp16: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtpd_ph^⚠Experimentalavx512fp16: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_pd^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu16^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu32^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu64^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxph_ps^⚠Experimentalavx512fp16: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxps_ph^⚠Experimentalavx512fp16: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_round_ph^⚠Experimentalavx512fp16: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fcmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1, or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fmadd_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmadd_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmul_round_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fnmadd_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_getexp_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_maskz_getexp_round_ph^⚠Experimentalavx512fp16: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_maskz_getmant_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_maskz_getmant_round_ph^⚠Experimentalavx512fp16: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_maskz_max_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_max_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_maskz_mul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_mul_round_pch^⚠Experimentalavx512fp16: Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_rcp_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_reduce_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_reduce_round_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_round_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_maskz_rsqrt_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_scalef_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_scalef_round_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sqrt_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sqrt_round_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_sub_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sub_round_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_max_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_max_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_min_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_min_round_ph^⚠Experimentalavx512fp16: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mul_pch^⚠Experimentalavx512fp16: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_mul_round_pch^⚠Experimentalavx512fp16: Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_round_ph^⚠Experimentalavx512fp16: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_permutex2var_ph^⚠Experimentalavx512fp16: Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm512_permutexvar_ph^⚠Experimentalavx512fp16: Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
_mm512_rcp_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_reduce_add_ph^⚠Experimentalavx512fp16: Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm512_reduce_max_ph^⚠Experimentalavx512fp16: Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm512_reduce_min_ph^⚠Experimentalavx512fp16: Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm512_reduce_mul_ph^⚠Experimentalavx512fp16: Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
_mm512_reduce_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_reduce_round_ph^⚠Experimentalavx512fp16: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_roundscale_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm512_roundscale_round_ph^⚠Experimentalavx512fp16: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm512_rsqrt_ph^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_scalef_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_scalef_round_ph^⚠Experimentalavx512fp16: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_set1_ph^⚠Experimentalavx512fp16: Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm512_set_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm512_setr_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
_mm512_setzero_ph^⚠Experimentalavx512fp16: Return vector of type __m512h with all elements set to zero.
_mm512_sqrt_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm512_sqrt_round_ph^⚠Experimentalavx512fp16: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_store_ph^⚠Experimentalavx512fp16: Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_storeu_ph^⚠Experimentalavx512fp16: Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm512_sub_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm512_sub_round_ph^⚠Experimentalavx512fp16: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_undefined_ph^⚠Experimentalavx512fp16: Return vector of type __m512h with indetermination elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm512_zextph128_ph512^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_zextph256_ph512^⚠Experimentalavx512fp16: Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm_abs_ph^⚠Experimentalavx512fp16,avx512vl: Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
_mm_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_add_round_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_add_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_castpd_ph^⚠Experimentalavx512fp16: Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_pd^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_ps^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_si128^⚠Experimentalavx512fp16: Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castps_ph^⚠Experimentalavx512fp16: Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castsi128_ph^⚠Experimentalavx512fp16: Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_cmp_ph_mask^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm_cmp_round_sh_mask^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_cmp_sh_mask^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
_mm_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm_cmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm_comi_round_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_comi_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
_mm_comieq_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
_mm_comige_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
_mm_comigt_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
_mm_comile_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
_mm_comilt_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
_mm_comineq_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
_mm_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cvt_roundi32_sh^⚠Experimentalavx512fp16: Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_i32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvt_roundsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvt_roundsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_u32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvt_roundss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundu32_sh^⚠Experimentalavx512fp16: Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvti32_sh^⚠Experimentalavx512fp16: Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm_cvtsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtsh_h^⚠Experimentalavx512fp16: Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm_cvtsh_i32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvtsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvtsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvtsh_u32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvtsi16_si128^⚠Experimentalavx512fp16: Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
_mm_cvtsi128_si16^⚠Experimentalavx512fp16: Copy the lower 16-bit integer in a to dst.
_mm_cvtss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtt_roundsh_i32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
_mm_cvtt_roundsh_u32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttsh_i32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
_mm_cvttsh_u32^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvtu32_sh^⚠Experimentalavx512fp16: Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm_div_round_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_div_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm_fcmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm_fmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm_fmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm_fnmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm_fnmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fpclass_ph_mask^⚠Experimentalavx512fp16,avx512vl: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
_mm_fpclass_sh_mask^⚠Experimentalavx512fp16: Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm can be a combination of:
_mm_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_getexp_round_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_getexp_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_getmant_round_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_getmant_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_load_ph^⚠Experimentalavx512fp16,avx512vl: Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_load_sh^⚠Experimentalavx512fp16: Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements
_mm_loadu_ph^⚠Experimentalavx512fp16,avx512vl: Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
_mm_mask3_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_add_round_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_add_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_mask_blend_ph^⚠Experimentalavx512fp16,avx512vl: Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
_mm_mask_cmp_ph_mask^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_cmp_round_sh_mask^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_cmp_sh_mask^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
_mm_mask_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm_mask_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cvt_roundsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using writemask k (the element if copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvt_roundsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using writemask k (the element if copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using writemask k (the element if copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvtsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvtss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using writemask k (the element if copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_div_round_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_div_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_mask_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fpclass_ph_mask^⚠Experimentalavx512fp16,avx512vl: Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
_mm_mask_fpclass_sh_mask^⚠Experimentalavx512fp16: Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
_mm_mask_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_mask_getexp_round_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_mask_getexp_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_mask_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_mask_getmant_round_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_mask_getmant_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_mask_load_sh^⚠Experimentalavx512fp16: Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
_mm_mask_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_move_sh^⚠Experimentalavx512fp16: Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_mul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_mul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_mask_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rcp_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_reduce_round_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_reduce_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_roundscale_round_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_roundscale_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rsqrt_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_scalef_round_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_scalef_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sqrt_round_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sqrt_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_store_sh^⚠Experimentalavx512fp16: Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k
_mm_mask_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sub_round_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sub_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_maskz_add_ph^⚠Experimentalavx512fp16,avx512vl: Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_add_round_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_add_sh^⚠Experimentalavx512fp16: Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_cmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
_mm_maskz_conj_pch^⚠Experimentalavx512fp16,avx512vl: Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cvt_roundsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvt_roundsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtepi16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepi32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepi64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtepu16_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepu32_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepu64_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtpd_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_pd^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtsd_sh^⚠Experimentalavx512fp16: Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtsh_sd^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvtsh_ss^⚠Experimentalavx512fp16: Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvtss_sh^⚠Experimentalavx512fp16: Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvttph_epi16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu16^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu32^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu64^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtxph_ps^⚠Experimentalavx512fp16,avx512vl: Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtxps_ph^⚠Experimentalavx512fp16,avx512vl: Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_div_ph^⚠Experimentalavx512fp16,avx512vl: Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_div_round_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_div_sh^⚠Experimentalavx512fp16: Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_fcmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1, or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fmadd_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmadd_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmadd_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmaddsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsubadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fnmadd_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmadd_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmadd_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmsub_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_getexp_ph^⚠Experimentalavx512fp16,avx512vl: Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_maskz_getexp_round_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_maskz_getexp_sh^⚠Experimentalavx512fp16: Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_maskz_getmant_ph^⚠Experimentalavx512fp16,avx512vl: Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_maskz_getmant_round_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
_mm_maskz_getmant_sh^⚠Experimentalavx512fp16: Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_maskz_load_sh^⚠Experimentalavx512fp16: Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
_mm_maskz_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_move_sh^⚠Experimentalavx512fp16: Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_mul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_mul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rcp_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_reduce_round_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_reduce_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_roundscale_round_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rsqrt_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_scalef_round_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_scalef_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sqrt_round_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_sqrt_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sub_round_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_sub_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_max_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_max_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_max_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_min_ph^⚠Experimentalavx512fp16,avx512vl: Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_round_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_sh^⚠Experimentalavx512fp16,avx512vl: Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_move_sh^⚠Experimentalavx512fp16: Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mul_pch^⚠Experimentalavx512fp16,avx512vl: Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_ph^⚠Experimentalavx512fp16,avx512vl: Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_mul_round_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_round_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mul_sch^⚠Experimentalavx512fp16: Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_sh^⚠Experimentalavx512fp16: Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_permutex2var_ph^⚠Experimentalavx512fp16,avx512vl: Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm_permutexvar_ph^⚠Experimentalavx512fp16,avx512vl: Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
_mm_rcp_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal of packed 16-bit floating-point elements in a and stores the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rcp_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_reduce_add_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm_reduce_max_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm_reduce_min_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm_reduce_mul_ph^⚠Experimentalavx512fp16,avx512vl: Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
_mm_reduce_ph^⚠Experimentalavx512fp16,avx512vl: Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm_reduce_round_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_reduce_sh^⚠Experimentalavx512fp16: Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_roundscale_ph^⚠Experimentalavx512fp16,avx512vl: Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm_roundscale_round_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_roundscale_sh^⚠Experimentalavx512fp16: Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_rsqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rsqrt_sh^⚠Experimentalavx512fp16: Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_scalef_ph^⚠Experimentalavx512fp16,avx512vl: Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm_scalef_round_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_scalef_sh^⚠Experimentalavx512fp16: Scale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_set1_ph^⚠Experimentalavx512fp16: Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm_set_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm_set_sh^⚠Experimentalavx512fp16: Copy half-precision (16-bit) floating-point elements from a to the lower element of dst and zero the upper 7 elements.
_mm_setr_ph^⚠Experimentalavx512fp16: Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
_mm_setzero_ph^⚠Experimentalavx512fp16,avx512vl: Return vector of type __m128h with all elements set to zero.
_mm_sqrt_ph^⚠Experimentalavx512fp16,avx512vl: Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm_sqrt_round_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_sqrt_sh^⚠Experimentalavx512fp16: Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_store_ph^⚠Experimentalavx512fp16,avx512vl: Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_store_sh^⚠Experimentalavx512fp16: Store the lower half-precision (16-bit) floating-point element from a into memory.
_mm_storeu_ph^⚠Experimentalavx512fp16,avx512vl: Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm_sub_ph^⚠Experimentalavx512fp16,avx512vl: Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm_sub_round_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_sub_sh^⚠Experimentalavx512fp16: Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_ucomieq_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomige_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomigt_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomile_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomilt_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomineq_sh^⚠Experimentalavx512fp16: Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_undefined_ph^⚠Experimentalavx512fp16,avx512vl: Return vector of type __m128h with indetermination elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.

Module avx512fp16Copy item path

Macros§

Functions§

Module avx512fp16