Module avx512fp16

Available on x86 or x86-64 only.

Macros

cmp_asm 🔒
fpclass_asm 🔒

Functions

vaddph 🔒 ⚠
vaddsh 🔒 ⚠
vcmpsh 🔒 ⚠
vcomish 🔒 ⚠
vcvtdq2ph_128 🔒 ⚠
vcvtdq2ph_256 🔒 ⚠
vcvtdq2ph_512 🔒 ⚠
vcvtpd2ph_128 🔒 ⚠
vcvtpd2ph_256 🔒 ⚠
vcvtpd2ph_512 🔒 ⚠
vcvtph2dq_128 🔒 ⚠
vcvtph2dq_256 🔒 ⚠
vcvtph2dq_512 🔒 ⚠
vcvtph2pd_128 🔒 ⚠
vcvtph2pd_256 🔒 ⚠
vcvtph2pd_512 🔒 ⚠
vcvtph2psx_128 🔒 ⚠
vcvtph2psx_256 🔒 ⚠
vcvtph2psx_512 🔒 ⚠
vcvtph2qq_128 🔒 ⚠
vcvtph2qq_256 🔒 ⚠
vcvtph2qq_512 🔒 ⚠
vcvtph2udq_128 🔒 ⚠
vcvtph2udq_256 🔒 ⚠
vcvtph2udq_512 🔒 ⚠
vcvtph2uqq_128 🔒 ⚠
vcvtph2uqq_256 🔒 ⚠
vcvtph2uqq_512 🔒 ⚠
vcvtph2uw_128 🔒 ⚠
vcvtph2uw_256 🔒 ⚠
vcvtph2uw_512 🔒 ⚠
vcvtph2w_128 🔒 ⚠
vcvtph2w_256 🔒 ⚠
vcvtph2w_512 🔒 ⚠
vcvtps2phx_128 🔒 ⚠
vcvtps2phx_256 🔒 ⚠
vcvtps2phx_512 🔒 ⚠
vcvtqq2ph_128 🔒 ⚠
vcvtqq2ph_256 🔒 ⚠
vcvtqq2ph_512 🔒 ⚠
vcvtsd2sh 🔒 ⚠
vcvtsh2sd 🔒 ⚠
vcvtsh2si32 🔒 ⚠
vcvtsh2ss 🔒 ⚠
vcvtsh2usi32 🔒 ⚠
vcvtsi2sh 🔒 ⚠
vcvtss2sh 🔒 ⚠
vcvttph2dq_128 🔒 ⚠
vcvttph2dq_256 🔒 ⚠
vcvttph2dq_512 🔒 ⚠
vcvttph2qq_128 🔒 ⚠
vcvttph2qq_256 🔒 ⚠
vcvttph2qq_512 🔒 ⚠
vcvttph2udq_128 🔒 ⚠
vcvttph2udq_256 🔒 ⚠
vcvttph2udq_512 🔒 ⚠
vcvttph2uqq_128 🔒 ⚠
vcvttph2uqq_256 🔒 ⚠
vcvttph2uqq_512 🔒 ⚠
vcvttph2uw_128 🔒 ⚠
vcvttph2uw_256 🔒 ⚠
vcvttph2uw_512 🔒 ⚠
vcvttph2w_128 🔒 ⚠
vcvttph2w_256 🔒 ⚠
vcvttph2w_512 🔒 ⚠
vcvttsh2si32 🔒 ⚠
vcvttsh2usi32 🔒 ⚠
vcvtudq2ph_128 🔒 ⚠
vcvtudq2ph_256 🔒 ⚠
vcvtudq2ph_512 🔒 ⚠
vcvtuqq2ph_128 🔒 ⚠
vcvtuqq2ph_256 🔒 ⚠
vcvtuqq2ph_512 🔒 ⚠
vcvtusi2sh 🔒 ⚠
vcvtuw2ph_128 🔒 ⚠
vcvtuw2ph_256 🔒 ⚠
vcvtuw2ph_512 🔒 ⚠
vcvtw2ph_128 🔒 ⚠
vcvtw2ph_256 🔒 ⚠
vcvtw2ph_512 🔒 ⚠
vdivph 🔒 ⚠
vdivsh 🔒 ⚠
vfcmaddcph_mask3_128 🔒 ⚠
vfcmaddcph_mask3_256 🔒 ⚠
vfcmaddcph_mask3_512 🔒 ⚠
vfcmaddcph_maskz_128 🔒 ⚠
vfcmaddcph_maskz_256 🔒 ⚠
vfcmaddcph_maskz_512 🔒 ⚠
vfcmaddcsh_mask 🔒 ⚠
vfcmaddcsh_maskz 🔒 ⚠
vfcmulcph_128 🔒 ⚠
vfcmulcph_256 🔒 ⚠
vfcmulcph_512 🔒 ⚠
vfcmulcsh 🔒 ⚠
vfmaddcph_mask3_128 🔒 ⚠
vfmaddcph_mask3_256 🔒 ⚠
vfmaddcph_mask3_512 🔒 ⚠
vfmaddcph_maskz_128 🔒 ⚠
vfmaddcph_maskz_256 🔒 ⚠
vfmaddcph_maskz_512 🔒 ⚠
vfmaddcsh_mask 🔒 ⚠
vfmaddcsh_maskz 🔒 ⚠
vfmaddph_512 🔒 ⚠
vfmaddsh 🔒 ⚠
vfmaddsubph_128 🔒 ⚠
vfmaddsubph_256 🔒 ⚠
vfmaddsubph_512 🔒 ⚠
vfmulcph_128 🔒 ⚠
vfmulcph_256 🔒 ⚠
vfmulcph_512 🔒 ⚠
vfmulcsh 🔒 ⚠
vfpclasssh 🔒 ⚠
vgetexpph_128 🔒 ⚠
vgetexpph_256 🔒 ⚠
vgetexpph_512 🔒 ⚠
vgetexpsh 🔒 ⚠
vgetmantph_128 🔒 ⚠
vgetmantph_256 🔒 ⚠
vgetmantph_512 🔒 ⚠
vgetmantsh 🔒 ⚠
vmaxph_128 🔒 ⚠
vmaxph_256 🔒 ⚠
vmaxph_512 🔒 ⚠
vmaxsh 🔒 ⚠
vminph_128 🔒 ⚠
vminph_256 🔒 ⚠
vminph_512 🔒 ⚠
vminsh 🔒 ⚠
vmulph 🔒 ⚠
vmulsh 🔒 ⚠
vrcpph_128 🔒 ⚠
vrcpph_256 🔒 ⚠
vrcpph_512 🔒 ⚠
vrcpsh 🔒 ⚠
vreduceph_128 🔒 ⚠
vreduceph_256 🔒 ⚠
vreduceph_512 🔒 ⚠
vreducesh 🔒 ⚠
vrndscaleph_128 🔒 ⚠
vrndscaleph_256 🔒 ⚠
vrndscaleph_512 🔒 ⚠
vrndscalesh 🔒 ⚠
vrsqrtph_128 🔒 ⚠
vrsqrtph_256 🔒 ⚠
vrsqrtph_512 🔒 ⚠
vrsqrtsh 🔒 ⚠
vscalefph_128 🔒 ⚠
vscalefph_256 🔒 ⚠
vscalefph_512 🔒 ⚠
vscalefsh 🔒 ⚠
vsqrtph_512 🔒 ⚠
vsqrtsh 🔒 ⚠
vsubph 🔒 ⚠
vsubsh 🔒 ⚠
_mm256_abs_ph⚠Experimentalavx512fp16,avx512vl
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
_mm256_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_castpd_ph⚠Experimentalavx512fp16
Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph128_ph256⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm256_castph256_ph128⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_pd⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_ps⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_si256⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castps_ph⚠Experimentalavx512fp16
Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castsi256_ph⚠Experimentalavx512fp16
Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_cmp_ph_mask⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm256_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
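The conjugate multiply described above can be sketched as a scalar reference model. This is an illustration only, not the intrinsic itself: `cmul_lane` and `cmul_packed` are hypothetical helper names, and `f32` stands in for the half-precision element type so that the sketch builds on stable Rust without AVX512-FP16 hardware.

```rust
/// Scalar model of one complex lane: `a * conj(b)`.
/// Each complex number is a `(real, imaginary)` pair, mirroring the
/// `vec.fp16[0] + i * vec.fp16[1]` layout described in the text.
/// Assumption: `f32` replaces the 16-bit element type.
fn cmul_lane(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    (ar * br + ai * bi, ai * br - ar * bi)
}

/// Apply the lane model to interleaved `[re0, im0, re1, im1, ...]` data.
fn cmul_packed(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .flat_map(|(x, y)| {
            let (re, im) = cmul_lane((x[0], x[1]), (y[0], y[1]));
            [re, im]
        })
        .collect()
}
```

For example, `cmul_lane((1.0, 2.0), (3.0, 4.0))` evaluates (1 + 2i)(3 - 4i) and yields `(11.0, 2.0)`.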
_mm256_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_cvtepi16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtepu16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtpd_ph⚠Experimentalavx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm256_cvtph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm256_cvtph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm256_cvtph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm256_cvtph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm256_cvtph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm256_cvtph_pd⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm256_cvtsh_h⚠Experimentalavx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm256_cvttph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm256_cvttph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm256_cvtxph_ps⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm256_cvtxps_ph⚠Experimentalavx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_div_ph⚠Experimentalavx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm256_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fcmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm256_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
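The lane pattern of the add/subtract variant is easy to get backwards, so a scalar sketch may help. It assumes the usual x86 FMADDSUB convention, in which even-indexed lanes subtract c and odd-indexed lanes add c. `fmaddsub_model` is a hypothetical name, and `f32` stands in for the half-precision element type.

```rust
/// Scalar model of the fmaddsub lane pattern.
/// Assumption: even lanes compute `a*b - c`, odd lanes compute `a*b + c`,
/// following the usual x86 FMADDSUB convention. `f32` replaces fp16.
fn fmaddsub_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    a.iter()
        .zip(b)
        .zip(c)
        .enumerate()
        .map(|(i, ((&x, &y), &z))| {
            // `mul_add` models the fused multiply-add (single rounding).
            if i % 2 == 0 {
                x.mul_add(y, -z)
            } else {
                x.mul_add(y, z)
            }
        })
        .collect()
}
```

With `a = [1, 2, 3, 4]`, `b = [2, 2, 2, 2]` and `c = [1, 1, 1, 1]`, the model returns `[1, 5, 5, 9]`. The fmsubadd variants swap the two branches.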
_mm256_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm256_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm256_fmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm256_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm256_fpclass_ph_mask⚠Experimentalavx512fp16,avx512vl
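Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of the following category flags: QNaN (0x01), positive zero (0x02), negative zero (0x04), positive infinity (0x08), negative infinity (0x10), denormal (0x20), negative (0x40), and SNaN (0x80).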
_mm256_getexp_ph⚠Experimentalavx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
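The `floor(log2(x))` description can be checked with a small scalar model. This is a sketch under stated assumptions: `getexp_model` is a hypothetical name, `f32` stands in for the half-precision type, the magnitude of the input is used, and special inputs (zero, infinities, NaN, denormals) are not modeled.

```rust
/// Scalar model of getexp for ordinary (normal, finite, non-zero) inputs.
/// Assumption: the result is floor(log2(|x|)), returned as a float.
/// Special cases handled by the hardware are intentionally not modeled.
fn getexp_model(x: f32) -> f32 {
    x.abs().log2().floor()
}
```

For instance, `getexp_model(10.0)` is `3.0` because 8 <= 10 < 16, and `getexp_model(0.3)` is `-2.0` because 0.25 <= 0.3 < 0.5.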
_mm256_getmant_ph⚠Experimentalavx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_load_ph⚠Experimentalavx512fp16,avx512vl
Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_loadu_ph⚠Experimentalavx512fp16,avx512vl
Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
_mm256_mask3_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask3_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask3_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
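The writemask behaviour shared by the `_mask_` variants in this list can be sketched with a scalar model. `mask_add_model` is a hypothetical name; `f32` stands in for the half-precision element type, and the 16-lane array with a 16-bit mask reflects the 16 elements of a 256-bit half-precision vector.

```rust
/// Scalar model of a writemasked add over 16 lanes.
/// Lane i receives `a[i] + b[i]` when bit i of `k` is set;
/// otherwise it keeps the value from `src`.
/// Assumption: `f32` replaces fp16 so the sketch runs on stable Rust.
fn mask_add_model(src: [f32; 16], k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With `src` all zeros, `a` all ones, `b` all twos and `k = 0b101`, only lanes 0 and 2 become `3.0`; every other lane stays `0.0`.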
_mm256_mask_blend_ph⚠Experimentalavx512fp16,avx512vl
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
_mm256_mask_cmp_ph_mask⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in the returned mask vector using zeromask k (result bits are zeroed out when the corresponding mask bit is not set).
_mm256_mask_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_cvtepi16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtepu16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtpd_ph⚠Experimentalavx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_pd⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtxph_ps⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtxps_ph⚠Experimentalavx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_div_ph⚠Experimentalavx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fcmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fpclass_ph_mask⚠Experimentalavx512fp16,avx512vl
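Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of the following category flags: QNaN (0x01), positive zero (0x02), negative zero (0x04), positive infinity (0x08), negative infinity (0x10), denormal (0x20), negative (0x40), and SNaN (0x80).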
_mm256_mask_getexp_ph⚠Experimentalavx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_mask_getmant_ph⚠Experimentalavx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_mask_max_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_mask_min_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_mask_mul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
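The pairing of adjacent elements into complex numbers can be sketched in scalar code. This is a minimal model assuming `f32` in place of the 16-bit type and one mask bit per complex number (8 pairs in a 256-bit vector). The function name is hypothetical.

```rust
/// Scalar model of a write-masked packed complex multiply.
/// Element 2*i is the real part and element 2*i + 1 the imaginary part.
fn mask_mul_pch_model(src: [f32; 16], k: u8, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let (ar, ai) = (a[2 * i], a[2 * i + 1]);
            let (br, bi) = (b[2 * i], b[2 * i + 1]);
            // (ar + i*ai) * (br + i*bi)
            dst[2 * i] = ar * br - ai * bi;
            dst[2 * i + 1] = ar * bi + ai * br;
        }
    }
    dst
}
```

With (1 + 2i) and (3 + 4i) in the first pair and mask bit 0 set, the first pair of the result is (-5 + 10i); pairs with a clear mask bit keep the values from `src`.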
_mm256_mask_mul_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_rcp_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_mask_reduce_ph⚠Experimentalavx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
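The relationship between the reduce and roundscale entries can be sketched per element. The model below assumes `f32` in place of the 16-bit type, takes the number of fraction bits `m` directly rather than decoding it from imm8, and fixes round-to-nearest-even as the rounding mode. These are simplifications, and the function names are hypothetical.

```rust
/// Round `x` to `m` fraction bits: 2^-m * round(2^m * x), ties to even.
fn roundscale_model(x: f32, m: i32) -> f32 {
    let scale = 2f32.powi(m);
    (x * scale).round_ties_even() / scale
}

/// "Reduced argument": what is left after removing the rounded part.
fn reduce_model(x: f32, m: i32) -> f32 {
    x - roundscale_model(x, m)
}
```

With one fraction bit, 1.3 rounds to 1.5 in this model, so the reduced argument is about -0.2.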
_mm256_mask_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_mask_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
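The summary does not spell out what "scale" means. On the assumption that it follows Intel's definition for the scalef family, a * 2^floor(b), a scalar sketch looks like this. `f32` stands in for the element type, special values are not handled, and the function name is hypothetical.

```rust
/// Scalar model of "scalef": multiply `a` by two raised to floor(b).
/// Assumes the a * 2^floor(b) definition; special values are not handled.
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

For instance, scaling 3.0 by 2.7 uses floor(2.7) = 2 and gives 12.0 in this model.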
_mm256_mask_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
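The zeromask convention used by the `maskz` entries differs from the writemask convention only in what inactive lanes receive. A minimal scalar sketch, assuming `f32` in place of the 16-bit type and a hypothetical function name:

```rust
/// Scalar model of a zero-masked add: lanes whose mask bit is clear are
/// set to zero instead of being copied from a `src` vector.
fn maskz_add_model(k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With mask `0b101`, lanes 0 and 2 hold sums and every other lane is zero.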
_mm256_maskz_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
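The conjugate and conjugate-multiply entries can be modelled together. This sketch assumes `f32` in place of the 16-bit type and one mask bit per complex pair; the function names are hypothetical.

```rust
/// Complex conjugate of (re, im): negate the imaginary part.
fn conj_model(re: f32, im: f32) -> (f32, f32) {
    (re, -im)
}

/// Scalar model of a zero-masked "a times conjugate of b" over 8 complex pairs.
fn maskz_cmul_pch_model(k: u8, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let (ar, ai) = (a[2 * i], a[2 * i + 1]);
            let (br, bi) = conj_model(b[2 * i], b[2 * i + 1]);
            dst[2 * i] = ar * br - ai * bi;
            dst[2 * i + 1] = ar * bi + ai * br;
        }
    }
    dst
}
```

With (1 + 2i) and (3 + 4i) in the first pair, the model gives (1 + 2i)(3 - 4i) = 11 + 2i.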
_mm256_maskz_cvtepi16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtepu16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtpd_ph⚠Experimentalavx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_pd⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
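The difference between the `cvtph` and `cvttph` families comes down to the rounding step. The sketch below assumes the non-truncating form rounds to nearest with ties to even (the usual default rounding mode), uses `f32` in place of the 16-bit type, and only covers in-range values. Rust's `as` cast saturates on overflow, which is not a model of the hardware's out-of-range result. Function names are hypothetical.

```rust
/// Model of the rounding conversion, assuming round-to-nearest-even.
fn cvt_model(x: f32) -> i16 {
    x.round_ties_even() as i16
}

/// Model of the truncating conversion: discard the fractional part.
fn cvtt_model(x: f32) -> i16 {
    x.trunc() as i16
}
```

For 2.7 the two models give 3 and 2 respectively; for -2.7 they give -3 and -2.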
_mm256_maskz_cvttph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtxph_ps⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtxps_ph⚠Experimentalavx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_div_ph⚠Experimentalavx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fcmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
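Which lanes add and which subtract is easy to get backwards, so here is an unmasked scalar sketch. It assumes the lane convention of the older `fmaddsub`/`fmsubadd` intrinsics (even-indexed lanes subtract in `fmaddsub` and add in `fmsubadd`) and uses `f32` in place of the 16-bit type. Function names are hypothetical.

```rust
/// fmaddsub model: even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model(a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..16 {
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend); // fused multiply-add
    }
    dst
}

/// fmsubadd model: even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd_model(a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..16 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```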
_mm256_maskz_fmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
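The four fused multiply-add variants (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to `c`. A per-element sketch using `f32::mul_add`, with `f32` standing in for the 16-bit type and hypothetical function names:

```rust
/// a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, c) }
/// a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, -c) }
/// -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, c) }
/// -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, -c) }
```

With a = 2, b = 3, c = 1 the four models return 7, 5, -5, and -7.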
_mm256_maskz_getexp_ph⚠Experimentalavx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_maskz_getmant_ph⚠Experimentalavx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_maskz_max_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_maskz_min_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_maskz_mul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_mul_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rcp_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_reduce_ph⚠Experimentalavx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_max_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_min_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_mul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mul_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_permutex2var_ph⚠Experimentalavx512fp16,avx512vl
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm256_permutexvar_ph⚠Experimentalavx512fp16,avx512vl
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
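The two shuffle entries can be modelled as table lookups. The sketch assumes that for a 16-element vector the low 4 bits of each index select the element and, in the two-source form, bit 4 selects between `a` and `b`. `f32` stands in for the 16-bit type and the function names are hypothetical.

```rust
/// One-source shuffle: dst[i] = a[idx[i] mod 16].
fn permutexvar_model(idx: [u16; 16], a: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..16 {
        dst[i] = a[(idx[i] & 0xF) as usize];
    }
    dst
}

/// Two-source shuffle: bit 4 of each index picks the source vector.
fn permutex2var_model(a: [f32; 16], idx: [u16; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0; 16];
    for i in 0..16 {
        let offset = (idx[i] & 0xF) as usize;
        let from_b = (idx[i] >> 4) & 1 == 1;
        dst[i] = if from_b { b[offset] } else { a[offset] };
    }
    dst
}
```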
_mm256_rcp_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_reduce_add_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm256_reduce_max_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm256_reduce_min_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm256_reduce_mul_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
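The four horizontal reductions amount to folding all 16 lanes into one value. The sketch below is a sequential model with `f32` in place of the 16-bit type. The order in which an actual implementation combines lanes, and its handling of NaN in the max/min cases, may differ, so rounding can differ too. Function names are hypothetical.

```rust
/// Sum of all lanes.
fn reduce_add_model(a: [f32; 16]) -> f32 {
    a.iter().sum()
}

/// Product of all lanes.
fn reduce_mul_model(a: [f32; 16]) -> f32 {
    a.iter().product()
}

/// Largest lane value.
fn reduce_max_model(a: [f32; 16]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

/// Smallest lane value.
fn reduce_min_model(a: [f32; 16]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```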
_mm256_reduce_ph⚠Experimentalavx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm256_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm256_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm256_set1_ph⚠Experimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm256_set_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm256_setr_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
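The difference between the `set` and `setr` entries is only the mapping from argument position to lane. This sketch assumes the usual Intel convention that `set` places its last argument in lane 0 while `setr` places its first argument there. Arguments are passed as an array for brevity, `f32` stands in for the 16-bit type, and the function names are hypothetical.

```rust
/// `set` model: the last supplied value lands in lane 0.
fn set_model(args: [f32; 16]) -> [f32; 16] {
    let mut lanes = args;
    lanes.reverse();
    lanes
}

/// `setr` model: the first supplied value lands in lane 0.
fn setr_model(args: [f32; 16]) -> [f32; 16] {
    args
}
```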
_mm256_setzero_ph⚠Experimentalavx512fp16,avx512vl
Return vector of type __m256h with all elements set to zero.
_mm256_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm256_store_ph⚠Experimentalavx512fp16,avx512vl
Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_storeu_ph⚠Experimentalavx512fp16,avx512vl
Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
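One way to meet the 32-byte requirement of the aligned store is to keep the destination buffer in an over-aligned wrapper type. The sketch below shows such a wrapper and an alignment check. It uses `u16` as the raw storage for 16-bit elements, and the type and function names are hypothetical.

```rust
/// Buffer of 16 raw 16-bit elements, forced to 32-byte alignment.
#[repr(align(32))]
struct Aligned32([u16; 16]);

/// True when `p` is suitable for a store that requires 32-byte alignment.
fn is_aligned_32(p: *const u16) -> bool {
    (p as usize) % 32 == 0
}
```

A pointer that fails this check would call for the unaligned store variant instead.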
_mm256_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm256_undefined_ph⚠Experimentalavx512fp16,avx512vl
Return vector of type __m256h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_zextph128_ph256⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_abs_ph⚠Experimentalavx512fp16
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
_mm512_add_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_add_round_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_castpd_ph⚠Experimentalavx512fp16
Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph128_ph512⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph256_ph512⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph512_ph128⚠Experimentalavx512fp16
Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph512_ph256⚠Experimentalavx512fp16
Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_pd⚠Experimentalavx512fp16
Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_ps⚠Experimentalavx512fp16
Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_si512⚠Experimentalavx512fp16
Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castps_ph⚠Experimentalavx512fp16
Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castsi512_ph⚠Experimentalavx512fp16
Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_cmp_ph_mask⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
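The mask result of a packed compare can be sketched as one bit per lane. The model below takes the comparison as a function pointer instead of decoding imm8 (the predicate encoding is not reproduced here), uses `f32` in place of the 16-bit type, and has a hypothetical function name.

```rust
/// Scalar model of a 32-lane compare that produces a bitmask:
/// bit i is set when `pred(a[i], b[i])` holds.
fn cmp_mask_model(a: [f32; 32], b: [f32; 32], pred: fn(f32, f32) -> bool) -> u32 {
    let mut k = 0u32;
    for i in 0..32 {
        if pred(a[i], b[i]) {
            k |= 1u32 << i;
        }
    }
    k
}
```

Comparing lanes 0, 1, ..., 31 against a constant 16 with a less-than predicate sets the low 16 bits of the mask in this model.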
_mm512_cmp_round_ph_mask⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm512_cmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_conj_pch⚠Experimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cvt_roundepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvtph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvtph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvtph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvtph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvtph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvtph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtsh_h⚠Experimentalavx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm512_cvtt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
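The `cvtph` and `cvttph` families differ only in the rounding step. The following is a minimal scalar sketch of that difference, not the library's implementation. It assumes the default MXCSR rounding mode (round to nearest, ties to even) for the non-truncating form, uses `f32` as a stand-in for the 16-bit element type, and introduces the hypothetical helpers `cvt_nearest_even` and `cvt_truncate`. NaN and out-of-range inputs are not modeled.

```rust
/// Hypothetical scalar model: round to nearest, ties to even, then convert.
/// Only finite, in-range inputs are meaningful here.
fn cvt_nearest_even(x: f32) -> i32 {
    let r = x.round(); // `round` sends ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    // A tie that landed on an odd integer is stepped back toward zero.
    let even = if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r };
    even as i32
}

/// Hypothetical scalar model of the truncating (`cvtt*`) forms:
/// the fractional part is discarded, rounding toward zero.
fn cvt_truncate(x: f32) -> i32 {
    x.trunc() as i32
}
```

With these helpers, 2.5 converts to 2 and 3.5 to 4 under nearest-even rounding, while truncation turns 2.7 into 2 and -2.7 into -2.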
_mm512_cvtx_roundph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtx_roundps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtxph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtxps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_div_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm512_div_round_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_fcmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
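The conjugate multiply can be written out for a single complex lane. This is a scalar sketch under stated assumptions: `f32` stands in for the 16-bit element type, `Complex` and `mul_by_conjugate` are hypothetical names, and the hardware's intermediate rounding is not modeled.

```rust
/// (real, imaginary) pair; `f32` stands in for the 16-bit element type.
type Complex = (f32, f32);

/// Hypothetical scalar model of one lane: a * conj(b).
fn mul_by_conjugate(a: Complex, b: Complex) -> Complex {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    (ar * br + ai * bi, ai * br - ar * bi)
}
```

Multiplying a number by its own conjugate gives a purely real result equal to its squared magnitude, which is a quick sanity check for the sign convention.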
_mm512_fmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmaddsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmaddsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsubadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsubadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
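The alternating pattern of the `fmaddsub` and `fmsubadd` families is easiest to see lane by lane. The sketch below assumes the usual x86 convention that `fmaddsub` subtracts c in even-indexed lanes and adds it in odd-indexed lanes, with `fmsubadd` doing the opposite. `f32` stands in for the 16-bit element type, and `fmaddsub_model` and `fmsubadd_model` are hypothetical names.

```rust
/// Hypothetical model: even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut dst = [0.0f32; N];
    for i in 0..N {
        let c_signed = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], c_signed); // fused multiply-add, one rounding
    }
    dst
}

/// Hypothetical model: even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd_model<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut dst = [0.0f32; N];
    for i in 0..N {
        let c_signed = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], c_signed);
    }
    dst
}
```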
_mm512_fmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_fnmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm512_fnmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
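The four fused multiply-add variants differ only in the signs applied to the product and to c. A scalar sketch of one lane, with `f32` standing in for the 16-bit element type and hypothetical function names ending in `_lane`:

```rust
/// a*b + c
fn fmadd_lane(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub_lane(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd_lane(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub_lane(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

For a = 2, b = 3, c = 1 the four variants give 7, 5, -5 and -7 respectively.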
_mm512_fpclass_ph_mask⚠Experimentalavx512fp16
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of:
_mm512_getexp_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_getexp_round_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
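The floor(log2(x)) behaviour can be illustrated by reading the exponent field directly. This sketch uses `f32` (8 exponent bits, bias 127) as a stand-in for the 16-bit format and a hypothetical helper `getexp_model`. It covers only normal, finite, nonzero inputs; the hardware's handling of zero, subnormals, infinities and NaN is not modeled.

```rust
/// Hypothetical scalar model of getexp for normal, finite, nonzero inputs:
/// returns the unbiased exponent as a float, ignoring the sign of `x`.
fn getexp_model(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xff) as i32; // 8-bit exponent field
    (biased - 127) as f32 // remove the f32 exponent bias
}
```

For example, both 8.0 and 10.0 map to 3.0, and 0.75 maps to -1.0.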
_mm512_getmant_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_getmant_round_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
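Mantissa normalization can be sketched for one common configuration: the interval [1, 2) with the sign taken from the source. The helper `getmant_model` is hypothetical, `f32` stands in for the 16-bit format, and only normal, finite, nonzero inputs are covered; other interval and sign settings are not modeled.

```rust
/// Hypothetical scalar model of getmant for the [1, 2) interval with the
/// source sign preserved. Valid for normal, finite, nonzero inputs only.
fn getmant_model(x: f32) -> f32 {
    // Keep the sign and fraction bits, force the biased exponent to 127 (2^0).
    f32::from_bits((x.to_bits() & 0x807f_ffff) | 0x3f80_0000)
}
```

Under this configuration 10.0 (1.25 * 2^3) normalizes to 1.25 and -0.75 (-1.5 * 2^-1) normalizes to -1.5.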
_mm512_load_ph⚠Experimentalavx512fp16
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_loadu_ph⚠Experimentalavx512fp16
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
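The 64-byte alignment requirement of the aligned load can be met by giving the backing storage an explicit alignment. The sketch below is an illustration only: `AlignedHalves` and `is_aligned_64` are hypothetical names, and the half-precision values are held as raw `u16` bit patterns rather than a dedicated 16-bit float type.

```rust
/// 32 half-precision values stored as raw bits (32 * 2 bytes = 512 bits).
/// `align(64)` guarantees the address an aligned 512-bit load needs.
#[repr(C, align(64))]
struct AlignedHalves([u16; 32]);

/// Returns true when `ptr` sits on a 64-byte boundary.
fn is_aligned_64<T>(ptr: *const T) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer obtained from such a buffer passes the alignment check, whereas a pointer into an arbitrary `[u16]` slice may not, in which case the unaligned form is the safe choice.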
_mm512_mask3_fcmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fcmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
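In the `mask3` forms, lanes whose mask bit is clear keep the value from c rather than from a separate source operand. A scalar sketch of that behaviour, with `f32` standing in for the 16-bit element type and a hypothetical helper `mask3_fmadd_model`:

```rust
/// Hypothetical model of a mask3 fused multiply-add.
/// Bit i of `k` selects lane i; `N` must not exceed 32 for a `u32` mask.
fn mask3_fmadd_model<const N: usize>(
    a: [f32; N],
    b: [f32; N],
    c: [f32; N],
    k: u32,
) -> [f32; N] {
    let mut dst = c; // unselected lanes are copied from c
    for i in 0..N {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```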
_mm512_mask3_fmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask_add_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
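Writemask semantics, as described above, amount to a per-lane select between the computed result and src. A minimal scalar sketch, with `f32` standing in for the 16-bit element type and a hypothetical helper `mask_add_model`:

```rust
/// Hypothetical model of a write-masked add.
/// Bit i of `k` selects lane i; `N` must not exceed 32 for a `u32` mask.
fn mask_add_model<const N: usize>(
    src: [f32; N],
    k: u32,
    a: [f32; N],
    b: [f32; N],
) -> [f32; N] {
    let mut dst = src; // lanes with a clear mask bit keep the src value
    for i in 0..N {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With a mask of `0b0011`, only the two lowest lanes receive the sum; the remaining lanes pass src through unchanged.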
_mm512_mask_add_round_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_blend_ph⚠Experimentalavx512fp16
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
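A blend is a pure per-lane select. The sketch below assumes the usual x86 convention that a set mask bit picks the element from b and a clear bit picks the element from a. `f32` stands in for the 16-bit element type, and `mask_blend_model` is a hypothetical name.

```rust
/// Hypothetical model of a masked blend: lane i comes from `b` when
/// bit i of `k` is set, otherwise from `a`. `N` must not exceed 32.
fn mask_blend_model<const N: usize>(k: u32, a: [f32; N], b: [f32; N]) -> [f32; N] {
    let mut dst = a;
    for i in 0..N {
        if (k >> i) & 1 == 1 {
            dst[i] = b[i];
        }
    }
    dst
}
```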
_mm512_mask_cmp_ph_mask⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_round_ph_mask⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
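For the masked compares, the zeromask simply clears result bits whose input mask bit is not set. The sketch below fixes the predicate to an ordered less-than for illustration, whereas the real intrinsics select the predicate through imm8. `f32` stands in for the 16-bit element type, and `mask_cmp_lt_model` is a hypothetical name.

```rust
/// Hypothetical model of a zero-masked compare with a less-than predicate.
/// Returns a bitmask with bit i set when `a[i] < b[i]` and bit i of `k1`
/// is set. `N` must not exceed 32 for a `u32` mask.
fn mask_cmp_lt_model<const N: usize>(k1: u32, a: [f32; N], b: [f32; N]) -> u32 {
    let mut k = 0u32;
    for i in 0..N {
        if a[i] < b[i] {
            k |= 1 << i; // comparisons involving NaN are false here
        }
    }
    k & k1 // zeromask: bits outside k1 are cleared
}
```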
_mm512_mask_cmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_cmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_conj_pch⚠Experimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_cvt_roundepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtx_roundph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtx_roundps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_div_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_round_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fcmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
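The conjugate multiply described above reduces to a small amount of arithmetic per complex pair. The sketch below uses a hypothetical `fcmul_model` function on `(real, imaginary)` tuples of `f32`; the hardware works on half-precision pairs and may round intermediate results differently.

```rust
/// Multiply `a` by the complex conjugate of `b`.
/// Each value is a (real, imaginary) pair; hypothetical scalar model.
fn fcmul_model(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    (ar * br + ai * bi, ai * br - ar * bi)
}
```

Multiplying a number by its own conjugate gives a purely real result, which is a quick sanity check for the sign conventions.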
_mm512_mask_fmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
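Note that the masked fused multiply-add forms above copy unselected elements from `a` rather than from a separate `src` operand. A minimal scalar sketch, assuming four `f32` lanes in place of half-precision lanes and a hypothetical function name:

```rust
/// Scalar model of a writemasked fused multiply-add over 4 lanes.
/// Hypothetical helper; `f32` is a stand-in for half precision.
fn mask_fmadd_model(a: [f32; 4], k: u8, b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    // Unselected lanes keep the value from `a`.
    let mut dst = a;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // a*b + c computed with a single rounding step.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```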
_mm512_mask_fmaddsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
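The alternating pattern can be sketched as below. The lane convention (even-indexed lanes subtract `c`, odd-indexed lanes add it) is an assumption taken from the usual x86 `fmaddsub` definition and is not stated in the summaries above. The function name is hypothetical, masking is omitted, and the product is not fused, so rounding may differ from the hardware.

```rust
/// Scalar model of the unmasked fmaddsub pattern over 4 lanes.
/// Assumption: even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0; 4];
    for i in 0..4 {
        let product = a[i] * b[i];
        dst[i] = if i % 2 == 0 { product - c[i] } else { product + c[i] };
    }
    dst
}
```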
_mm512_mask_fmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fnmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fpclass_ph_mask⚠Experimentalavx512fp16
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm512_mask_getexp_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_mask_getexp_round_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
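The floor(log2(x)) behaviour can be illustrated by reading the exponent field directly. This sketch uses `f32` (8 exponent bits, bias 127) rather than half precision, covers only normal, finite, nonzero inputs, and uses a hypothetical function name. Zeros, denormals, infinities and NaNs need separate handling that is not modelled here.

```rust
/// Return the unbiased exponent of a normal `f32` as a float,
/// i.e. floor(log2(|x|)). Hypothetical scalar model.
fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 hold the biased exponent; the sign bit is masked off.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}
```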
_mm512_mask_getmant_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_mask_getmant_round_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_max_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_max_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_min_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mask_min_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
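The departure from IEEE 754 for NaN and signed-zero inputs can be sketched with a plain comparison. The model below assumes the usual x86 convention, in which the second operand is returned whenever the comparison is false (either input is NaN, or the inputs compare equal, as +0.0 and -0.0 do). That convention is not spelled out in the summaries above, and the function names are hypothetical.

```rust
/// Scalar model of the x86-style maximum: returns `b` unless `a > b`.
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Scalar model of the x86-style minimum: returns `b` unless `a < b`.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: `max_model(NaN, 1.0)` is `1.0`, while `max_model(1.0, NaN)` is NaN.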
_mm512_mask_mul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_mul_round_pch⚠Experimentalavx512fp16
Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_rcp_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_reduce_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_reduce_round_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_round_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_rsqrt_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_scalef_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_scalef_round_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
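The summaries above do not say how `b` is used. Assuming the usual x86 `scalef` definition, each result is `a * 2^floor(b)`. A scalar sketch under that assumption, with a hypothetical function name, `f32` standing in for half precision, and special inputs such as NaN or infinity left unmodelled:

```rust
/// Scalar model of scalef: a * 2^floor(b).
/// Assumption: follows the usual x86 definition; finite inputs only.
fn scalef_model(a: f32, b: f32) -> f32 {
    // Powers of two are exact in binary floating point, so `powi` is exact here.
    a * 2.0f32.powi(b.floor() as i32)
}
```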
_mm512_mask_sqrt_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sqrt_round_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_sub_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sub_round_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_add_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_add_round_ph⚠Experimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
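In contrast to the writemask forms, the zeromask forms clear unselected elements instead of copying them from another operand. A minimal scalar sketch, using four `f32` lanes in place of half-precision lanes and a hypothetical function name:

```rust
/// Scalar model of a zeromasked add over 4 lanes.
/// Hypothetical helper; `f32` is a stand-in for half precision.
fn maskz_add_model(k: u8, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // Lanes whose mask bit is clear stay zero.
    let mut dst = [0.0; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```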
_mm512_maskz_cmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_cmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_conj_pch⚠Experimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
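For the complex (`pch`) forms, the sketch below assumes that one mask bit governs one complex number, that is, a pair of adjacent elements. This reading is an assumption and is not stated explicitly in the summary. The function name is hypothetical and `f32` stands in for half precision.

```rust
/// Scalar model of a zeromasked complex conjugate over 4 complex numbers,
/// stored as interleaved (real, imaginary) elements.
/// Assumption: one mask bit per complex pair.
fn maskz_conj_model(k: u8, a: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for pair in 0..4 {
        if (k >> pair) & 1 == 1 {
            dst[2 * pair] = a[2 * pair]; // real part unchanged
            dst[2 * pair + 1] = -a[2 * pair + 1]; // imaginary part negated
        }
    }
    dst
}
```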
_mm512_maskz_cvt_roundepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi16_ph⚠Experimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi32_ph⚠Experimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi64_ph⚠Experimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu16_ph⚠Experimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu32_ph⚠Experimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu64_ph⚠Experimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtpd_ph⚠Experimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_pd⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu16⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu32⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu64⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxph_ps⚠Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxps_ph⚠Experimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_round_ph⚠Experimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fcmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
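The conjugate multiply described above works out as follows for one complex pair. This is a scalar sketch with hypothetical helper names, `f32` in place of `f16`, and four pairs instead of sixteen. It assumes one mask bit per complex number rather than per 16-bit element.

```rust
/// Hypothetical scalar model: multiply `a` by the conjugate of `b`.
/// Each tuple is (real, imaginary), i.e. (fp16[0], fp16[1]) of one pair.
fn cmul_conj_model(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi)
    (ar * br + ai * bi, ai * br - ar * bi)
}

/// Zero-masked form over four complex pairs.
/// Assumption: bit i of `k` controls complex pair i as a whole.
fn maskz_cmul_conj_model(
    k: u8,
    a: [(f32, f32); 4],
    b: [(f32, f32); 4],
) -> [(f32, f32); 4] {
    let mut dst = [(0.0, 0.0); 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = cmul_conj_model(a[i], b[i]);
        }
    }
    dst
}
```

For example, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 - 4i) = 11 + 2i.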
_mm512_maskz_fmadd_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmadd_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
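A scalar sketch of the complex multiply-accumulate for a single pair may help when checking results by hand. `cfmadd_model` is a hypothetical name, `f32` stands in for `f16`, and the arithmetic is written as separate multiplies and adds, so its intermediate rounding can differ from the hardware.

```rust
/// Hypothetical scalar model of a*b + c for one complex pair.
/// Each tuple is (real, imaginary), i.e. (fp16[0], fp16[1]).
fn cfmadd_model(a: (f32, f32), b: (f32, f32), c: (f32, f32)) -> (f32, f32) {
    let (ar, ai) = a;
    let (br, bi) = b;
    let (cr, ci) = c;
    // (ar + i*ai) * (br + i*bi) + (cr + i*ci)
    (ar * br - ai * bi + cr, ar * bi + ai * br + ci)
}
```

With a = 1 + 2i, b = 3 + 4i and c = 10 + 20i the product is -5 + 10i, so the accumulated result is 5 + 30i.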
_mm512_maskz_fmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
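The alternating pattern of the fmaddsub and fmsubadd entries above is easy to get backwards, so here is a scalar sketch. It assumes the usual x86 lane convention, in which fmaddsub subtracts in even-indexed lanes and adds in odd-indexed lanes, and fmsubadd does the opposite. The function names are hypothetical, `f32` stands in for `f16`, and masking is left out.

```rust
/// Hypothetical model of fmaddsub: even lanes a*b - c, odd lanes a*b + c.
fn fmaddsub_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        // `mul_add` is a fused multiply-add with a single rounding.
        dst[i] = if i % 2 == 0 {
            a[i].mul_add(b[i], -c[i])
        } else {
            a[i].mul_add(b[i], c[i])
        };
    }
    dst
}

/// Hypothetical model of fmsubadd: even lanes a*b + c, odd lanes a*b - c.
fn fmsubadd_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        dst[i] = if i % 2 == 0 {
            a[i].mul_add(b[i], c[i])
        } else {
            a[i].mul_add(b[i], -c[i])
        };
    }
    dst
}
```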
_mm512_maskz_fmul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmul_round_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fnmadd_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
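The four fused multiply-add flavours listed above differ only in two signs. A scalar sketch, with hypothetical function names and `f32` standing in for `f16`, makes the convention explicit:

```rust
/// Hypothetical scalar models of the four sign variants.
/// `mul_add` computes a*b + c with a single rounding.
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c) //  (a*b) + c
}

fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c) //  (a*b) - c
}

fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c) // -(a*b) + c
}

fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c) // -(a*b) - c
}
```

For a = 2, b = 3, c = 10 the four variants give 16, -4, 4 and -16 respectively.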
_mm512_maskz_getexp_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_maskz_getexp_round_ph⚠Experimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_maskz_getmant_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_maskz_getmant_round_ph⚠Experimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
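The getexp and getmant entries above split a value into an exponent and a normalized mantissa. The sketch below models that split on `f32` bit patterns rather than `f16`, with hypothetical function names. It only covers normal, finite, non-zero inputs, and the mantissa model fixes the interval to [1, 2) and keeps the source sign; the other interval and sign options are not modelled.

```rust
/// Hypothetical model of getexp: the unbiased exponent, i.e. floor(log2(|x|)).
/// Valid for normal, finite, non-zero `f32` inputs only.
fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 of an f32 hold the exponent with a bias of 127.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}

/// Hypothetical model of getmant for the [1, 2) interval with the source sign.
/// Valid for normal, finite, non-zero `f32` inputs only.
fn getmant_model(x: f32) -> f32 {
    // Keep the sign and fraction bits, force the exponent field to 2^0.
    f32::from_bits((x.to_bits() & 0x807f_ffff) | (127 << 23))
}
```

For x = 10.0 this gives an exponent of 3 and a mantissa of 1.25, and 1.25 * 2^3 recovers 10.0.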
_mm512_maskz_max_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_max_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
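The note about NaN and signed-zero inputs can be made concrete with a scalar sketch. It assumes the usual x86 rule for min/max instructions, where the second operand is returned whenever the comparison is false, which includes the NaN case and the case of two zeros. The function names are hypothetical and `f32` stands in for `f16`.

```rust
/// Hypothetical model of the x86-style maximum: returns `b` unless `a > b`.
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Hypothetical model of the x86-style minimum: returns `b` unless `a < b`.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model `max_model(NaN, 1.0)` is 1.0 but `max_model(1.0, NaN)` is NaN, and `max_model(0.0, -0.0)` is -0.0. Rust's own `f32::max` returns the non-NaN operand in both NaN cases, so the two are not interchangeable.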
_mm512_maskz_mul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_mul_round_pch⚠Experimentalavx512fp16
Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_rcp_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_reduce_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_reduce_round_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_round_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
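Rounding to a number of fraction bits, and the reduced argument that remains after it, can be sketched per element as below. The names `RoundMode`, `roundscale_model` and `reduce_model` are hypothetical, `f32` stands in for `f16`, and the fraction-bit count `m` and the rounding mode are passed directly instead of being packed into imm8. Round-to-nearest-even is omitted to keep the sketch short.

```rust
/// Hypothetical rounding modes for the sketch.
#[derive(Clone, Copy)]
enum RoundMode {
    Down,
    Up,
    TowardZero,
}

/// Hypothetical model of roundscale: keep `m` fraction bits of `x`.
fn roundscale_model(x: f32, m: i32, mode: RoundMode) -> f32 {
    let scale = 2f32.powi(m);
    let scaled = x * scale;
    let rounded = match mode {
        RoundMode::Down => scaled.floor(),
        RoundMode::Up => scaled.ceil(),
        RoundMode::TowardZero => scaled.trunc(),
    };
    rounded / scale
}

/// Hypothetical model of the reduced argument: what rounding removed.
fn reduce_model(x: f32, m: i32, mode: RoundMode) -> f32 {
    x - roundscale_model(x, m, mode)
}
```

With two fraction bits, 3.14159 rounds down to 3.0 and up to 3.25. With one fraction bit, 2.75 rounds down to 2.5 and the reduced argument is 0.25.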
_mm512_maskz_rsqrt_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_scalef_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_scalef_round_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
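The summaries above do not spell out the scaling rule. The sketch below assumes the rule used by the corresponding instruction family, where each element of a is multiplied by 2 raised to floor(b). `scalef_model` is a hypothetical name, `f32` stands in for `f16`, and special values such as NaN, infinities and denormals are not modelled.

```rust
/// Hypothetical scalar model of scalef: a * 2^floor(b).
/// Assumes finite inputs with a small exponent.
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

For example, scaling 3.0 by 2.7 multiplies by 2^2 and gives 12.0, while scaling by -0.5 multiplies by 2^-1 and gives 1.5.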
_mm512_maskz_sqrt_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sqrt_round_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_sub_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sub_round_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_max_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_max_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_min_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_min_round_ph⚠Experimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mul_pch⚠Experimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_mul_round_pch⚠Experimentalavx512fp16
Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_round_ph⚠Experimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_permutex2var_ph⚠Experimentalavx512fp16
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm512_permutexvar_ph⚠Experimentalavx512fp16
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
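The two shuffles above can be modelled on small arrays. This sketch uses eight lanes, so three index bits pick the lane and, in the two-source form, the next bit up acts as the selector between a and b. That bit layout is an assumption scaled down from the 32-lane case. The function names are hypothetical and `f32` stands in for `f16`.

```rust
/// Hypothetical 8-lane model of permutexvar: dst[i] = a[idx[i]].
fn permutexvar_model(idx: [u16; 8], a: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        // Only the low three bits are used as the lane index.
        dst[i] = a[(idx[i] & 0b111) as usize];
    }
    dst
}

/// Hypothetical 8-lane model of permutex2var.
/// Bit 3 of each index selects the source: 0 picks `a`, 1 picks `b`.
fn permutex2var_model(a: [f32; 8], idx: [u16; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let lane = (idx[i] & 0b111) as usize;
        dst[i] = if (idx[i] & 0b1000) == 0 { a[lane] } else { b[lane] };
    }
    dst
}
```

An index vector of 7 down to 0 reverses a, and indices such as 0, 8, 1, 9 interleave the low lanes of a and b.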
_mm512_rcp_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_reduce_add_ph⚠Experimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm512_reduce_max_ph⚠Experimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm512_reduce_min_ph⚠Experimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm512_reduce_mul_ph⚠Experimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
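The four horizontal reductions above each collapse a vector into one scalar. A scalar sketch over a slice, with hypothetical function names and `f32` in place of `f16`, looks like this. The order in which the real intrinsics combine lanes is not stated here, so floating-point results may differ in the last bits from a left-to-right fold, and `f32::max` and `f32::min` treat NaN differently from the packed max/min instructions.

```rust
/// Hypothetical scalar models of the horizontal reductions.
fn reduce_add_model(a: &[f32]) -> f32 {
    a.iter().sum()
}

fn reduce_mul_model(a: &[f32]) -> f32 {
    a.iter().product()
}

fn reduce_max_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

fn reduce_min_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```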
_mm512_reduce_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_reduce_round_ph⚠Experimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_roundscale_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm512_roundscale_round_ph⚠Experimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_rsqrt_ph⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_scalef_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_scalef_round_ph⚠Experimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_set1_ph⚠Experimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm512_set_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm512_setr_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
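The difference between the set and setr orderings can be shown on a four-lane model. It assumes the usual Intel convention that set takes its arguments from the highest element down to element 0, while setr takes them starting at element 0. The function names are hypothetical and `f32` stands in for `f16`.

```rust
/// Hypothetical 4-lane model of set: first argument is the highest lane.
/// The returned array is indexed by lane, so lane 0 comes first.
fn set_model(e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// Hypothetical 4-lane model of setr: first argument is lane 0.
fn setr_model(e0: f32, e1: f32, e2: f32, e3: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}
```

Under this model `set_model(4.0, 3.0, 2.0, 1.0)` and `setr_model(1.0, 2.0, 3.0, 4.0)` produce the same vector.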
_mm512_setzero_ph⚠Experimentalavx512fp16
Return vector of type __m512h with all elements set to zero.
_mm512_sqrt_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm512_sqrt_round_ph⚠Experimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_store_ph⚠Experimentalavx512fp16
Store 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_storeu_ph⚠Experimentalavx512fp16
Store 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm512_sub_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm512_sub_round_ph⚠Experimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_undefined_ph⚠Experimentalavx512fp16
Return vector of type __m512h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm512_zextph128_ph512⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_zextph256_ph512⚠Experimentalavx512fp16
Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
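The zero-extending casts above keep the source lanes in the low part of the result and clear everything above them. A lane-level sketch of the 128-bit to 512-bit case, with a hypothetical function name and `f32` standing in for `f16`:

```rust
/// Hypothetical lane-level model of zero-extending 8 lanes to 32 lanes.
fn zext128_to_512_model(a: [f32; 8]) -> [f32; 32] {
    let mut dst = [0.0f32; 32];
    // The low 8 lanes are copied; the upper 24 stay zero.
    dst[..8].copy_from_slice(&a);
    dst
}
```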
_mm_abs_ph⚠Experimentalavx512fp16,avx512vl
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
_mm_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_add_round_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_add_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
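The scalar (`_sh`) forms operate on lane 0 only and pass the remaining lanes of a through. A lane-level sketch of the addition, with a hypothetical function name and `f32` standing in for `f16`:

```rust
/// Hypothetical lane-level model of the scalar add.
fn add_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Start from a so that the upper 7 lanes are copied unchanged.
    let mut dst = a;
    // Only lane 0 is computed from both inputs.
    dst[0] = a[0] + b[0];
    dst
}
```

The upper lanes of b never influence the result.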
_mm_castpd_ph⚠Experimentalavx512fp16
Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_pd⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_ps⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_si128⚠Experimentalavx512fp16
Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castps_ph⚠Experimentalavx512fp16
Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castsi128_ph⚠Experimentalavx512fp16
Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_cmp_ph_mask⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
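A packed compare produces one mask bit per lane. The sketch below models that for a single predicate, ordered less-than, on eight lanes. The function name is hypothetical, `f32` stands in for `f16`, and the predicate is hard-coded instead of being selected through imm8.

```rust
/// Hypothetical 8-lane model of a packed less-than compare into a mask.
fn cmp_lt_mask_model(a: [f32; 8], b: [f32; 8]) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        // Bit i is set when the predicate holds for lane i.
        // A NaN in either lane makes `<` false, so the bit stays clear.
        if a[i] < b[i] {
            k |= 1 << i;
        }
    }
    k
}
```

The resulting mask can then drive the mask and zeromask variants listed elsewhere in this module.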
_mm_cmp_round_sh_mask⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_cmp_sh_mask⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
_mm_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_comi_round_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_comi_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
_mm_comieq_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
_mm_comige_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
_mm_comigt_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
_mm_comile_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
_mm_comilt_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
_mm_comineq_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
_mm_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cvt_roundi32_sh⚠Experimentalavx512fp16
Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsd_sh⚠Experimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_i32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvt_roundsh_sd⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvt_roundsh_ss⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_u32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvt_roundss_sh⚠Experimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundu32_sh⚠Experimentalavx512fp16
Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtepi16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepi32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepi64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtepu16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepu32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepu64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvti32_sh⚠Experimentalavx512fp16
Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtpd_ph⚠Experimentalavx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm_cvtph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm_cvtph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm_cvtph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm_cvtph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm_cvtph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm_cvtph_pd⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm_cvtsd_sh⚠Experimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtsh_h⚠Experimentalavx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm_cvtsh_i32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvtsh_sd⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvtsh_ss⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvtsh_u32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvtsi16_si128⚠Experimentalavx512fp16
Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
_mm_cvtsi128_si16⚠Experimentalavx512fp16
Copy the lower 16-bit integer in a to dst.
_mm_cvtss_sh⚠Experimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtt_roundsh_i32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
_mm_cvtt_roundsh_u32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvttph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttsh_i32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
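The difference between the truncating (`cvtt`) and non-truncating (`cvt`) scalar conversions can be sketched with a scalar model. This is not the intrinsic itself: `f32` stands in for the half-precision input, the function names are hypothetical, and the non-truncating case assumes the rounding mode is round-to-nearest-even.

```rust
/// Hypothetical scalar model of a truncating conversion (round toward zero).
/// Rust's `as` cast saturates out-of-range inputs, which may differ from the hardware.
fn cvtt_model(x: f32) -> i32 {
    x.trunc() as i32
}

/// Hypothetical scalar model of a non-truncating conversion,
/// assuming round-to-nearest-even is the active rounding mode.
fn cvt_nearest_even_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}
```

Under these assumptions, an input of 3.5 truncates to 3 but rounds to 4, while 2.5 gives 2 in both models.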
_mm_cvttsh_u32⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvtu32_sh⚠Experimentalavx512fp16
Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtxph_ps⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm_cvtxps_ph⚠Experimentalavx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_div_ph⚠Experimentalavx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm_div_round_sh⚠Experimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_div_sh⚠Experimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
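The conjugate multiply-accumulate described above can be modelled per complex element as a * conj(b) + c. The sketch below is a scalar illustration only: `Complex` and `fcmadd_model` are hypothetical names, `f32` stands in for the half-precision lanes, and intermediate rounding is not meant to match the hardware.

```rust
/// A complex number as (real, imaginary), mirroring two adjacent lanes
/// (vec.fp16[0] is the real part, vec.fp16[1] the imaginary part).
type Complex = (f32, f32);

/// Hypothetical scalar model of one complex element: a * conj(b) + c.
fn fcmadd_model(a: Complex, b: Complex, c: Complex) -> Complex {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    let re = ar * br + ai * bi + c.0;
    let im = ai * br - ar * bi + c.1;
    (re, im)
}
```

A 128-bit vector holds four such pairs, so the packed form would apply this model to each pair independently.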
_mm_fcmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm_fmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
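The alternating pattern can be sketched lane by lane. The model below assumes the usual x86 `fmaddsub` convention (even-indexed lanes subtract c, odd-indexed lanes add c), which the summary above does not spell out; `fmaddsub_model` is a hypothetical name and `f32` stands in for half precision.

```rust
/// Hypothetical scalar model of an 8-lane fmaddsub.
fn fmaddsub_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        // Assumed lane convention: even lanes compute a*b - c, odd lanes a*b + c.
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        // mul_add performs the multiply and add with a single rounding.
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```

The `fmsubadd` variants would swap the two cases under the same assumption.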
_mm_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm_fmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm_fmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm_fnmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
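The four fused multiply-add flavours differ only in the signs applied to the product and to c. A per-element sketch, with `f32` standing in for half precision and hypothetical `_model` names:

```rust
/// a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3, c = 1 these models give 7, 5, -5 and -7 respectively.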
_mm_fnmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fpclass_ph_mask⚠Experimentalavx512fp16,avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of:
_mm_fpclass_sh_mask⚠Experimentalavx512fp16
Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm8 can be a combination of:
_mm_getexp_ph⚠Experimentalavx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
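For a normal, finite input, floor(log2(|x|)) is simply the unbiased exponent field of the floating-point encoding. The sketch below shows that on `f32` (8-bit exponent, bias 127) as a stand-in for half precision; `getexp_model` is a hypothetical name, and zero, denormal, infinite and NaN inputs are deliberately left out rather than guessed at.

```rust
/// Hypothetical scalar model of getexp for normal, finite f32 inputs.
/// Returns None for zero, denormals, infinities and NaN, which this sketch does not model.
fn getexp_model(x: f32) -> Option<f32> {
    if !x.is_normal() {
        return None;
    }
    // Bits 23..=30 of an f32 hold the biased exponent; the sign bit is ignored.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    Some((biased - 127) as f32)
}
```

For example, 10.0 is 1.25 * 2^3, so the model returns 3.0, and 0.75 is 1.5 * 2^-1, so it returns -1.0.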
_mm_getexp_round_sh⚠Experimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_getexp_sh⚠Experimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_getmant_ph⚠Experimentalavx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_getmant_round_sh⚠Experimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_getmant_sh⚠Experimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_load_ph⚠Experimentalavx512fp16,avx512vl
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_load_sh⚠Experimentalavx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements.
_mm_loadu_ph⚠Experimentalavx512fp16,avx512vl
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
_mm_mask3_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
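In the `mask3` forms the fallback for unselected lanes is c rather than a separate src operand. A lane-by-lane sketch, with `f32` standing in for half precision and `mask3_fmadd_model` as a hypothetical name:

```rust
/// Hypothetical scalar model of an 8-lane mask3 fmadd:
/// lanes whose mask bit is set receive a*b + c, all other lanes keep c.
fn mask3_fmadd_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // unselected lanes are copied from c
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

Bit i of k controls lane i, so a mask of 0b0000_0101 updates only lanes 0 and 2.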
_mm_mask3_fmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
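Writemask behaviour can be sketched lane by lane: a set mask bit selects the computed value, a clear bit keeps the lane from src. The model uses `f32` in place of half precision, and `mask_add_model` is a hypothetical name whose argument order (src, k, a, b) is chosen for illustration.

```rust
/// Hypothetical scalar model of an 8-lane masked add.
fn mask_add_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src; // lanes with a clear mask bit keep src
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With k = 0b1000_0001, only lanes 0 and 7 receive the sum.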
_mm_mask_add_round_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_add_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_blend_ph⚠Experimentalavx512fp16,avx512vl
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
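A blend selects each lane from one of two inputs based on the control mask. The sketch below assumes the usual convention for x86 mask blends (a set bit picks the lane from b, a clear bit picks it from a), which the summary above does not state explicitly; `mask_blend_model` is a hypothetical name and `f32` stands in for half precision.

```rust
/// Hypothetical scalar model of an 8-lane blend.
/// Assumed convention: mask bit set selects b[i], mask bit clear selects a[i].
fn mask_blend_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { b[i] } else { a[i] })
}
```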
_mm_mask_cmp_ph_mask⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
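For a masked comparison, each result bit is set only when the input mask bit is set and the comparison holds for that lane. In the sketch below the predicate is passed as a plain function instead of an imm8 code, since the imm8 encoding is not listed here; `mask_cmp_model` is a hypothetical name and `f32` stands in for half precision.

```rust
/// Hypothetical scalar model of an 8-lane masked compare.
/// `pred` plays the role of the comparison selected by imm8.
fn mask_cmp_model(k1: u8, a: [f32; 8], b: [f32; 8], pred: fn(f32, f32) -> bool) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        // A result bit is set only if the input mask bit is set and the comparison holds.
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            k |= 1 << i;
        }
    }
    k
}
```

With a less-than predicate, a = [1, 2, ..., 8] and b all 4, lanes 0 to 2 compare true, so a mask of 0b0000_0101 leaves bits 0 and 2 set.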
_mm_mask_cmp_round_sh_mask⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_cmp_sh_mask⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
_mm_mask_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cvt_roundsd_sh ⚠ Experimental avx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundsh_sd ⚠ Experimental avx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvt_roundsh_ss ⚠ Experimental avx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundss_sh ⚠ Experimental avx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtepi16_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepi32_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
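The writemask merge and the zeroed upper half described above can be sketched with a scalar model. `mask_cvtepi32_model` is a hypothetical helper, and `f32` stands in for the 16-bit element type, so the narrower range and precision of half precision are not reproduced. The four 32-bit inputs fill lanes 0..4 of an eight-lane result, and lanes 4..8 model the upper 64 bits.

```rust
/// Scalar model of a masked i32 -> float conversion into an eight-lane result.
/// Hypothetical helper; `f32` stands in for the 16-bit floating-point type.
fn mask_cvtepi32_model(src: [f32; 8], k: u8, a: [i32; 4]) -> [f32; 8] {
    // Lanes 4..8 stay zero: they model the zeroed upper 64 bits of dst.
    let mut dst = [0.0f32; 8];
    for i in 0..4 {
        dst[i] = if (k >> i) & 1 == 1 {
            a[i] as f32 // converted element
        } else {
            src[i] // mask bit clear: keep the element from src
        };
    }
    dst
}
```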
_mm_mask_cvtepi64_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtepu16_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepu32_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_cvtepu64_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtpd_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtph_epi16 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi32 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi64 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu16 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu32 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu64 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_pd ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtsd_sh ⚠ Experimental avx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtsh_sd ⚠ Experimental avx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvtsh_ss ⚠ Experimental avx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvtss_sh ⚠ Experimental avx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvttph_epi16 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
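The difference between the truncating conversions (`cvttph_*`) and the non-truncating ones (`cvtph_*`) can be sketched per element. Both functions are hypothetical helpers, and `f32` stands in for the 16-bit input. The non-truncating model assumes the default round-to-nearest-even mode. Out-of-range inputs are not modeled faithfully, because Rust's `as` cast saturates.

```rust
/// Non-truncating model: round to nearest, ties to even (assumed default mode).
fn cvt_nearest_even(x: f32) -> i16 {
    x.round_ties_even() as i16
}

/// Truncating model: discard the fractional part (round toward zero).
fn cvt_truncate(x: f32) -> i16 {
    x.trunc() as i16
}
```

The two models disagree whenever the fractional part is above one half (2.7 becomes 3 versus 2) and agree on exact integers.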
_mm_mask_cvttph_epi32 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epi64 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu16 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu32 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu64 ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtxph_ps ⚠ Experimental avx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtxps_ph ⚠ Experimental avx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_div_ph ⚠ Experimental avx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_div_round_sh ⚠ Experimental avx512fp16
Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_div_sh ⚠ Experimental avx512fp16
Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fcmadd_pch ⚠ Experimental avx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_round_sch ⚠ Experimental avx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_sch ⚠ Experimental avx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_pch ⚠ Experimental avx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_round_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fmadd_pch ⚠ Experimental avx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmadd_round_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_round_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmadd_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmaddsub_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
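The alternating pattern can be sketched as a scalar model. `mask_fmaddsub_model` is a hypothetical helper with `f32` standing in for the 16-bit element type. It assumes the usual x86 `fmaddsub` convention: even-indexed lanes subtract c and odd-indexed lanes add c.

```rust
/// Scalar model of a masked fmaddsub over eight lanes.
/// Hypothetical helper; `f32` stands in for the 16-bit floating-point type.
fn mask_fmaddsub_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    // Lanes whose mask bit is clear keep the value from `a`.
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumed convention: even lanes compute a*b - c, odd lanes a*b + c.
            let signed_c = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], signed_c);
        }
    }
    dst
}
```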
_mm_mask_fmsub_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmsub_round_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsub_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsubadd_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmul_pch ⚠ Experimental avx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_round_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fnmadd_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmadd_round_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmadd_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmsub_round_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
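The four fused multiply-add families listed here (fmadd, fmsub, fnmadd, fnmsub) differ only in two signs. The sketch below shows the sign conventions as the x86 instruction names are conventionally read, with fnmsub computing -(a*b) - c. The function names are hypothetical, `f32` stands in for the 16-bit element type, and masking is left out.

```rust
/// fmadd: a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// fmsub: a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// fnmadd: -(a*b) + c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// fnmsub: -(a*b) - c
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3 and c = 1, the four models give 7, 5, -5 and -7 respectively.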
_mm_mask_fpclass_ph_mask ⚠ Experimental avx512fp16,avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm_mask_fpclass_sh_mask ⚠ Experimental avx512fp16
Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm_mask_getexp_ph ⚠ Experimental avx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_mask_getexp_round_sh ⚠ Experimental avx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_getexp_sh ⚠ Experimental avx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
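The floor(log2(x)) reading of getexp can be sketched by pulling the unbiased exponent out of the bit pattern. `getexp_model` is a hypothetical helper that uses `f32` (8 exponent bits, bias 127) in place of the 16-bit format, and it assumes a normal, finite, nonzero input. Zero, subnormal, infinite and NaN inputs are not modeled.

```rust
/// Scalar model of getexp for a normal, finite, nonzero `f32`.
/// Returns the unbiased exponent as a float, i.e. floor(log2(|x|)).
fn getexp_model(x: f32) -> f32 {
    // Bits 23..31 hold the biased exponent; the sign bit is masked away.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}
```

For instance, 10.0 is 1.25 * 2^3, so the model returns 3.0. The sign of the input does not affect the result.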
_mm_mask_getmant_ph ⚠ Experimental avx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_mask_getmant_round_sh ⚠ Experimental avx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_getmant_sh ⚠ Experimental avx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
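One case of mantissa normalization can be sketched directly on the bit pattern. `getmant_model` is a hypothetical helper that covers a single configuration only: the interval [1, 2) with the sign taken from the source. The norm and sign parameters are not modeled, `f32` stands in for the 16-bit format, and the input is assumed to be normal, finite and nonzero.

```rust
/// Scalar model of getmant for the [1, 2) interval with the source sign kept.
/// Assumes a normal, finite, nonzero `f32` input.
fn getmant_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep the sign bit and the 23 fraction bits, then force the biased
    // exponent to 127 so the magnitude lands in [1, 2).
    f32::from_bits((bits & 0x807f_ffff) | (127u32 << 23))
}
```

For example, 10.0 = 1.25 * 2^3 normalizes to 1.25, and -6.0 = -1.5 * 2^2 normalizes to -1.5.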
_mm_mask_load_sh ⚠ Experimental avx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
_mm_mask_max_ph ⚠ Experimental avx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_round_sh ⚠ Experimental avx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_sh ⚠ Experimental avx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
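The note about NaN and signed-zero inputs can be made concrete with a one-line model. `max_sh_model` is a hypothetical helper with `f32` standing in for the 16-bit type and masking left out. It assumes the convention of the x86 MAX instruction family, where the result is `a` only when `a > b` holds and the second operand is returned otherwise.

```rust
/// Scalar model of the x86-style maximum: returns `a` only if `a > b`.
/// Any NaN operand, or two zeros of either sign, makes the comparison false,
/// so the second operand `b` is returned in those cases.
fn max_sh_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}
```

The result therefore depends on operand order: `max_sh_model(f32::NAN, 1.0)` is 1.0 while `max_sh_model(1.0, f32::NAN)` is NaN, and `max_sh_model(0.0, -0.0)` is -0.0. Rust's `f32::max(1.0, f32::NAN)` returns 1.0 regardless of order.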
_mm_mask_min_ph ⚠ Experimental avx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_round_sh ⚠ Experimental avx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_sh ⚠ Experimental avx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_move_sh ⚠ Experimental avx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
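The scalar (`_sh`) masking pattern that recurs throughout this listing can be sketched with the move operation, which has no arithmetic of its own. `mask_move_sh_model` is a hypothetical helper over eight lanes, with `f32` standing in for the 16-bit type. Only mask bit 0 is consulted, and lanes 1..8 always come from `a`.

```rust
/// Scalar model of a masked lower-element move over eight lanes.
/// Hypothetical helper; `f32` stands in for the 16-bit floating-point type.
fn mask_move_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Upper seven lanes are copied from `a` unconditionally.
    let mut dst = a;
    // Lane 0 comes from `b` when mask bit 0 is set, otherwise from `src`.
    dst[0] = if k & 1 == 1 { b[0] } else { src[0] };
    dst
}
```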
_mm_mask_mul_pch ⚠ Experimental avx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_ph ⚠ Experimental avx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_mul_round_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_round_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_mul_sch ⚠ Experimental avx512fp16
Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_sh ⚠ Experimental avx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_rcp_ph ⚠ Experimental avx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rcp_sh ⚠ Experimental avx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_reduce_ph ⚠ Experimental avx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_reduce_round_sh ⚠ Experimental avx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_reduce_sh⚠Experimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
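The "reduced argument" in the reduce entries above is what remains of a value after it has been rounded to a given number of fraction bits. A minimal scalar sketch, assuming `f32` as a stand-in for the half-precision type and taking the fraction-bit count and rounding function as separate, hypothetical parameters instead of decoding imm8:

```rust
/// Reduced argument of `x`: the remainder after rounding `x` to `m` fraction bits.
/// `round` is any round-to-integer function, e.g. `f32::floor` or `f32::trunc`.
/// Hypothetical helper for illustration; not part of the module.
fn reduce(x: f32, m: u32, round: fn(f32) -> f32) -> f32 {
    let scale = (1u32 << m) as f32; // 2^m, valid for small m
    x - round(x * scale) / scale
}
```

For example, `reduce(2.75, 1, f32::floor)` rounds 2.75 down to 2.5 (one fraction bit) and returns the leftover 0.25. Special values (NaN, infinities) are not modeled here.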
_mm_mask_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_roundscale_round_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_roundscale_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
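Rounding "to the number of fraction bits" in the roundscale entries above can be read as computing 2^-M * round(2^M * x). The sketch below models that on a single `f32` value (standing in for half precision); the `Rounding` enum and the separate `m` parameter are illustrative assumptions, since the real intrinsics pack both into imm8:

```rust
/// Rounding modes modeled by this sketch (hypothetical type, for illustration only).
#[derive(Clone, Copy)]
enum Rounding {
    NearestEven,
    Down,
    Up,
    Truncate,
}

/// Round `x` to an integer value in the requested mode.
fn round_to_int(x: f32, mode: Rounding) -> f32 {
    match mode {
        Rounding::NearestEven => {
            let r = x.round(); // rounds ties away from zero
            let is_tie = (x - x.trunc()).abs() == 0.5;
            // On a tie that landed on an odd integer, step back to the even neighbor.
            if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r }
        }
        Rounding::Down => x.floor(),
        Rounding::Up => x.ceil(),
        Rounding::Truncate => x.trunc(),
    }
}

/// Round `x` so that it keeps `m` fraction bits: 2^-m * round(2^m * x).
fn roundscale(x: f32, m: u32, mode: Rounding) -> f32 {
    let scale = (1u32 << m) as f32; // 2^m, valid for small m
    round_to_int(x * scale, mode) / scale
}
```

With `m = 0` this is ordinary rounding to an integer; with `m = 2` results are multiples of 0.25. NaN and infinity handling is left out of the sketch.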
_mm_mask_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rsqrt_sh⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_scalef_round_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_scalef_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
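As a rough guide to what "scale" means in the scalef entries above: the operation multiplies a by two raised to the floor of b. A scalar sketch under that assumption, using `f32` in place of half precision and ignoring the special cases (NaN, infinities, denormals) that the hardware handles:

```rust
/// Scalar model of scalef: a * 2^floor(b).
/// Hypothetical helper for illustration; special values are not modeled.
fn scalef(a: f32, b: f32) -> f32 {
    a * 2.0f32.powi(b.floor() as i32)
}
```

So `scalef(3.0, 2.9)` gives 12.0, because the fractional part of b is discarded before the power of two is applied.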
_mm_mask_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sqrt_round_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sqrt_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_store_sh⚠Experimentalavx512fp16
Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k.
_mm_mask_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sub_round_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sub_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_maskz_add_ph⚠Experimentalavx512fp16,avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
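The `_mask_` and `_maskz_` forms in this listing differ only in what an inactive lane receives: the value from src (writemask) or zero (zeromask). A lane-by-lane sketch of that rule for an 8-lane vector, with `f32` arrays standing in for the half-precision vector type and hypothetical function names:

```rust
/// Writemask: lane i gets op(a[i], b[i]) when bit i of k is set, else src[i].
fn mask_op(
    src: [f32; 8],
    k: u8,
    a: [f32; 8],
    b: [f32; 8],
    op: impl Fn(f32, f32) -> f32,
) -> [f32; 8] {
    let mut dst = src;
    for i in 0..8 {
        if ((k >> i) & 1) == 1 {
            dst[i] = op(a[i], b[i]);
        }
    }
    dst
}

/// Zeromask: same as the writemask form with an all-zero src.
fn maskz_op(k: u8, a: [f32; 8], b: [f32; 8], op: impl Fn(f32, f32) -> f32) -> [f32; 8] {
    mask_op([0.0; 8], k, a, b, op)
}
```

Calling `maskz_op(0b0000_0101, a, b, |x, y| x + y)` models a masked add in which only lanes 0 and 2 are computed and all other lanes are zero.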
_mm_maskz_add_round_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_add_sh⚠Experimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
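The scalar `_sh` forms follow one pattern: only lane 0 is computed, only mask bit 0 matters, and lanes 1 to 7 come from a. A sketch of the add case, assuming `f32` arrays in place of the half-precision vector and using hypothetical function names:

```rust
/// Writemask scalar add: lane 0 is a[0] + b[0] if bit 0 of k is set, else src[0];
/// lanes 1..=7 are copied from a.
fn mask_add_sh(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = if (k & 1) == 1 { a[0] + b[0] } else { src[0] };
    dst
}

/// Zeromask scalar add: lane 0 is zeroed when bit 0 of k is clear.
fn maskz_add_sh(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    mask_add_sh([0.0; 8], k, a, b)
}
```

Note that the upper lanes of src and b never reach the result in this model; only a supplies them.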
_mm_maskz_cmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_conj_pch⚠Experimentalavx512fp16,avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
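The complex entries treat each pair of adjacent elements as (real, imaginary). A small sketch of the three operations named in this listing (conjugate, plain complex multiply, and multiply by the conjugate), written on `(f32, f32)` tuples as an assumed stand-in for a pair of half-precision lanes:

```rust
/// A complex number as (real, imaginary), i.e. (vec.fp16[0], vec.fp16[1]).
type Complex = (f32, f32);

/// conjugate = re - i * im
fn conj(z: Complex) -> Complex {
    (z.0, -z.1)
}

/// Plain complex product a * b (the "fmul" family).
fn fmul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Product of a with the conjugate of b (the "cmul" / "fcmul" family).
fn cmul(a: Complex, b: Complex) -> Complex {
    fmul(a, conj(b))
}
```

For a = 1 + 2i and b = 3 + 4i, `fmul` yields -5 + 10i while `cmul` yields 11 + 2i. The sketch multiplies and adds in separate steps, so it does not reproduce any fused rounding the hardware may perform.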
_mm_maskz_cvt_roundsd_sh⚠Experimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundsh_sd⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvt_roundsh_ss⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundss_sh⚠Experimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtepi16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepi32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepi64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtepu16_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepu32_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepu64_ph⚠Experimentalavx512fp16,avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtpd_ph⚠Experimentalavx512fp16,avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_pd⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtsd_sh⚠Experimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtsh_sd⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvtsh_ss⚠Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvtss_sh⚠Experimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvttph_epi16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu16⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu32⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu64⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
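The `cvtph` and `cvttph` families differ in how the fractional part is handled: the `cvtt` forms truncate toward zero, while the plain forms round. The sketch below assumes the default round-to-nearest-even mode for the plain form (the hardware uses whatever rounding mode is currently active) and uses `f32` in place of half precision. Out-of-range inputs are not modeled; Rust's `as` cast saturates, which is not claimed to match the instruction.

```rust
/// Round to the nearest integer, ties to even (assumed default rounding mode).
fn round_ties_even(x: f32) -> f32 {
    let r = x.round(); // rounds ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r }
}

/// Model of the rounding conversion (cvtph_epi16 style) for in-range values.
fn cvt_i16(x: f32) -> i16 {
    round_ties_even(x) as i16
}

/// Model of the truncating conversion (cvttph_epi16 style) for in-range values.
fn cvtt_i16(x: f32) -> i16 {
    x.trunc() as i16
}
```

For 2.7 the two models give 3 and 2 respectively; for -2.7 they give -3 and -2.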
_mm_maskz_cvtxph_ps⚠Experimentalavx512fp16,avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtxps_ph⚠Experimentalavx512fp16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_div_ph⚠Experimentalavx512fp16,avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_div_round_sh⚠Experimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_div_sh⚠Experimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_fcmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fmadd_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmadd_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmadd_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmaddsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsubadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
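The fmaddsub and fmsubadd entries alternate the sign of c by lane parity. In the sketch below, which assumes the usual x86 convention that fmaddsub subtracts in even lanes and adds in odd lanes (and fmsubadd the reverse), `f32` arrays stand in for the half-precision vector and masking is left out:

```rust
/// fmaddsub model: even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| {
        if i % 2 == 0 {
            a[i].mul_add(b[i], -c[i])
        } else {
            a[i].mul_add(b[i], c[i])
        }
    })
}

/// fmsubadd model: even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| {
        if i % 2 == 0 {
            a[i].mul_add(b[i], c[i])
        } else {
            a[i].mul_add(b[i], -c[i])
        }
    })
}
```

With a = 2, b = 3, c = 1 in every lane, `fmaddsub` produces 5, 7, 5, 7, ... and `fmsubadd` produces 7, 5, 7, 5, ....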
_mm_maskz_fmul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fnmadd_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmadd_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmadd_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmsub_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
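The four fused multiply-add families differ only in the signs applied to the product and to c. A scalar sketch of the conventional x86 meaning of each name, using `f32::mul_add` as a stand-in for the half-precision fused operation (function names here mirror the intrinsic families but are illustrative helpers):

```rust
/// fmadd:   (a * b) + c
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// fmsub:   (a * b) - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// fnmadd: -(a * b) + c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// fnmsub: -(a * b) - c
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3, c = 4 the four results are 10, 2, -2 and -10.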
_mm_maskz_getexp_ph⚠Experimentalavx512fp16,avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_maskz_getexp_round_sh⚠Experimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_maskz_getexp_sh⚠Experimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_maskz_getmant_ph⚠Experimentalavx512fp16,avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_maskz_getmant_round_sh⚠Experimentalavx512fp16
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_maskz_getmant_sh⚠Experimentalavx512fp16
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
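The getexp and getmant entries above split a value into an exponent and a normalized significand. The sketch below shows that split on the bit pattern of an `f32` (standing in for half precision), assuming a normal, finite, nonzero input, a normalization interval of [1, 2), and a result sign taken from the source; other norm and sign settings are not modeled.

```rust
/// Unbiased exponent of a normal f32 as a float, i.e. floor(log2(|x|)).
fn getexp(x: f32) -> f32 {
    // Bits 23..=30 hold the exponent, stored with a bias of 127.
    (((x.to_bits() >> 23) & 0xff) as i32 - 127) as f32
}

/// Significand of a normal f32 scaled into [1, 2), keeping the sign of x.
fn getmant(x: f32) -> f32 {
    // Keep sign and fraction bits, then force the exponent field to the bias (2^0).
    f32::from_bits((x.to_bits() & 0x807f_ffff) | (127 << 23))
}
```

For x = 10.0 this gives an exponent of 3 and a mantissa of 1.25, and 1.25 * 2^3 recovers 10.0.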
_mm_maskz_load_sh⚠Experimentalavx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
_mm_maskz_max_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_round_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_min_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_round_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_move_sh⚠Experimentalavx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
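Many of the _sh intrinsics in this list share the same shape: the lower element is computed (or zeroed by the mask) and the upper 7 elements come from a. A minimal scalar sketch of that pattern for the masked move, using f32 as a stand-in for f16 and a hypothetical function name:

```rust
/// Scalar model of a zero-masked scalar move. Index 0 is the lower element.
/// f32 stands in for the half-precision element type (assumption).
fn maskz_move_sh_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // upper 7 elements are copied from `a`
    // Only mask bit 0 matters for scalar (_sh) operations.
    dst[0] = if k & 1 == 1 { b[0] } else { 0.0 };
    dst
}
```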
_mm_maskz_mul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
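The zeromask behaviour for packed operations can be sketched in scalar code. This is only a model: f32 stands in for f16, and maskz_mul_model is a hypothetical name, not part of the module.

```rust
/// Scalar model of a zero-masked packed multiply over 8 lanes.
/// Lane i is computed when bit i of `k` is set, otherwise it is zeroed.
fn maskz_mul_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] * b[i];
        }
    }
    dst
}
```

The same masking rule applies to the other maskz packed intrinsics listed here; only the per-lane operation changes.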
_mm_maskz_mul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_mul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_rcp_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rcp_sh⚠Experimentalavx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_reduce_ph⚠Experimentalavx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_reduce_round_sh⚠Experimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_reduce_sh⚠Experimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_roundscale_round_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rsqrt_sh⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_scalef_round_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_scalef_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sqrt_round_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_sqrt_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sub_round_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_sub_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_max_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
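The note about NaN and signed-zero inputs can be made concrete with a scalar sketch. It assumes the half-precision maximum follows the same selection rule as the older x86 scalar and packed max instructions, where the second operand is returned whenever the comparison a > b is false; f32 stands in for f16 and x86_style_max is a hypothetical name.

```rust
/// Scalar model of the x86-style maximum (assumed rule: return `b` unless
/// `a > b` holds). NaN in either operand and equal zeros both make the
/// comparison false, so the second operand wins in those cases.
fn x86_style_max(a: f32, b: f32) -> f32 {
    if a > b {
        a
    } else {
        b
    }
}
```

Under this model the result is not symmetric: x86_style_max(NaN, 1.0) gives 1.0 while x86_style_max(1.0, NaN) gives NaN, and x86_style_max(0.0, -0.0) gives -0.0. By contrast, Rust's f32::max returns the non-NaN operand regardless of argument order.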
_mm_max_round_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_max_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_min_ph⚠Experimentalavx512fp16,avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_round_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_sh⚠Experimentalavx512fp16,avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_move_sh⚠Experimentalavx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mul_pch⚠Experimentalavx512fp16,avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
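The complex layout described above (real part in the even element, imaginary part in the following odd element) can be sketched as a scalar model. f32 stands in for f16 and mul_pch_model is a hypothetical name; the real intrinsic may round intermediate results differently.

```rust
/// Scalar model of a packed complex multiply on 4 complex numbers stored
/// as interleaved (re, im) pairs in an 8-element vector.
fn mul_pch_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for c in 0..4 {
        let (re_a, im_a) = (a[2 * c], a[2 * c + 1]);
        let (re_b, im_b) = (b[2 * c], b[2 * c + 1]);
        // (re_a + i*im_a) * (re_b + i*im_b)
        dst[2 * c] = re_a * re_b - im_a * im_b;
        dst[2 * c + 1] = re_a * im_b + im_a * re_b;
    }
    dst
}
```

For instance, (1 + 2i) * (3 + 4i) = -5 + 10i, so the pair (1, 2) times the pair (3, 4) yields (-5, 10).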
_mm_mul_ph⚠Experimentalavx512fp16,avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_mul_round_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_round_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_mul_sch⚠Experimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_sh⚠Experimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_permutex2var_ph⚠Experimentalavx512fp16,avx512vl
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm_permutexvar_ph⚠Experimentalavx512fp16,avx512vl
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
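The two shuffles above can be modelled on plain arrays. The sketch assumes that, for 8-element vectors, the low 3 bits of each idx element select the lane and, in the two-source form, bit 3 selects between a (0) and b (1). Elements are carried as raw u16 bit patterns, and both function names are hypothetical.

```rust
/// Scalar model of a single-source shuffle: dst[i] = a[idx[i] & 7].
fn permutexvar_model(idx: [u16; 8], a: [u16; 8]) -> [u16; 8] {
    let mut dst = [0u16; 8];
    for i in 0..8 {
        dst[i] = a[(idx[i] & 0x7) as usize];
    }
    dst
}

/// Scalar model of a two-source shuffle: bit 3 of idx[i] picks the source
/// (assumed: 0 selects `a`, 1 selects `b`), bits 2:0 pick the lane.
fn permutex2var_model(a: [u16; 8], idx: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut dst = [0u16; 8];
    for i in 0..8 {
        let lane = (idx[i] & 0x7) as usize;
        dst[i] = if idx[i] & 0x8 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```

An idx of [7, 6, 5, 4, 3, 2, 1, 0] reverses a in the single-source model, and [0, 8, 1, 9, 2, 10, 3, 11] interleaves the low halves of a and b in the two-source model.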
_mm_rcp_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rcp_sh⚠Experimentalavx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
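The stated error bound of 1.5*2^-12 (about 3.66e-4) is a relative bound. A small helper, hypothetical and written in f64 for clarity, shows how a candidate approximation could be checked against it:

```rust
/// Returns true when `approx` is within the documented relative error
/// bound (1.5 * 2^-12) of the exact reciprocal 1/x.
fn within_rcp_bound(x: f64, approx: f64) -> bool {
    let exact = 1.0 / x;
    let rel_err = ((approx - exact) / exact).abs();
    rel_err < 1.5 * 2f64.powi(-12)
}
```

For x = 3.0, an approximation of 0.3333 satisfies the bound, while 0.334 does not.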
_mm_reduce_add_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm_reduce_max_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm_reduce_min_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm_reduce_mul_ph⚠Experimentalavx512fp16,avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
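The four horizontal reductions above can be sketched as folds over the lanes. This is a model only: f32 stands in for f16, the function names are hypothetical, and the order in which the real intrinsics combine lanes is not stated here, so rounding (and NaN handling for max and min) may differ.

```rust
/// Sum of all 8 lanes.
fn reduce_add_model(a: [f32; 8]) -> f32 {
    a.iter().sum()
}

/// Product of all 8 lanes.
fn reduce_mul_model(a: [f32; 8]) -> f32 {
    a.iter().product()
}

/// Maximum of all 8 lanes (uses f32::max, which ignores NaN operands).
fn reduce_max_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

/// Minimum of all 8 lanes (uses f32::min, which ignores NaN operands).
fn reduce_min_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```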
_mm_reduce_ph⚠Experimentalavx512fp16,avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
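The "reduced argument" is the part of x left over after rounding x to a given number of fraction bits. The sketch below assumes round-to-nearest-even and takes the number of fraction bits m directly rather than decoding imm8; f32 stands in for f16 and both helper names are hypothetical.

```rust
/// Round to the nearest integer, with ties going to the even integer.
fn nearest_even(y: f32) -> f32 {
    let f = y.floor();
    let d = y - f;
    if d < 0.5 {
        f
    } else if d > 0.5 {
        f + 1.0
    } else if f % 2.0 == 0.0 {
        f
    } else {
        f + 1.0
    }
}

/// Scalar model of the reduced argument: x - round(x * 2^m) / 2^m,
/// where `m` is the number of fraction bits kept (assumed nearest-even).
fn reduce_model(x: f32, m: i32) -> f32 {
    let scale = 2f32.powi(m);
    x - nearest_even(x * scale) / scale
}
```

For example, with m = 0 the value 1.75 rounds to 2.0, leaving a reduced argument of -0.25.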
_mm_reduce_round_sh⚠Experimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_reduce_sh⚠Experimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_roundscale_ph⚠Experimentalavx512fp16,avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
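Rounding to a number of fraction bits amounts to scaling by 2^M, rounding to an integer, and scaling back. The sketch below assumes an imm8 layout in which bits 7:4 hold the number of fraction bits M and bits 1:0 select the rounding mode (0 nearest-even, 1 down, 2 up, 3 truncate); the remaining bits are ignored. That layout, the f32 stand-in for f16, and the function names are all assumptions of this example.

```rust
/// Round to the nearest integer, with ties going to the even integer.
fn ties_even(y: f32) -> f32 {
    let f = y.floor();
    let d = y - f;
    if d < 0.5 {
        f
    } else if d > 0.5 {
        f + 1.0
    } else if f % 2.0 == 0.0 {
        f
    } else {
        f + 1.0
    }
}

/// Scalar model of roundscale: round(x * 2^M) / 2^M.
fn roundscale_model(x: f32, imm8: u8) -> f32 {
    let m = (imm8 >> 4) as i32; // assumed: imm8[7:4] = fraction bits to keep
    let scale = 2f32.powi(m);
    let y = x * scale;
    let rounded = match imm8 & 0b11 {
        // assumed: imm8[1:0] = rounding mode
        0 => ties_even(y),
        1 => y.floor(),
        2 => y.ceil(),
        _ => y.trunc(),
    };
    rounded / scale
}
```

With imm8 = 0x10 (one fraction bit, nearest-even), 1.3 rounds to 1.5.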
_mm_roundscale_round_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_roundscale_sh⚠Experimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_rsqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rsqrt_sh⚠Experimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_scalef_ph⚠Experimentalavx512fp16,avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
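Scaling here means multiplying each element of a by two raised to the floor of the corresponding element of b. A scalar sketch for finite inputs, with f32 standing in for f16 and a hypothetical function name; special cases such as NaN, infinities and overflow are not modelled:

```rust
/// Scalar model of scalef for one lane: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    // powi with an integer exponent keeps the power of two exact
    // for the small exponents used in this sketch.
    a * 2f32.powi(b.floor() as i32)
}
```

For example, scaling 3.0 by 2.5 uses floor(2.5) = 2, giving 3.0 * 4 = 12.0.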
_mm_scalef_round_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_scalef_sh⚠Experimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_set1_ph⚠Experimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm_set_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm_set_sh⚠Experimentalavx512fp16
Copy the half-precision (16-bit) floating-point value a to the lower element of dst and zero the upper 7 elements.
_mm_setr_ph⚠Experimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
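The difference between the two argument orders can be shown with a scalar sketch. It follows the usual convention of the x86 set and setr intrinsics, where set takes the highest element first and setr takes the lowest element first; index 0 of the array is the lowest element. f32 stands in for f16 and the function names are hypothetical.

```rust
/// Model of the "set" ordering: arguments run from the highest element
/// (e7) down to the lowest (e0).
#[allow(clippy::too_many_arguments)]
fn set_ph_model(e7: f32, e6: f32, e5: f32, e4: f32, e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}

/// Model of the "setr" ordering: arguments run from the lowest element
/// (e0) up to the highest (e7).
#[allow(clippy::too_many_arguments)]
fn setr_ph_model(e0: f32, e1: f32, e2: f32, e3: f32, e4: f32, e5: f32, e6: f32, e7: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}
```

Calling set_ph_model(7.0, 6.0, ..., 0.0) and setr_ph_model(0.0, 1.0, ..., 7.0) therefore produce the same vector.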
_mm_setzero_ph⚠Experimentalavx512fp16,avx512vl
Return vector of type __m128h with all elements set to zero.
_mm_sqrt_ph⚠Experimentalavx512fp16,avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm_sqrt_round_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_sqrt_sh⚠Experimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_store_ph⚠Experimentalavx512fp16,avx512vl
Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_store_sh⚠Experimentalavx512fp16
Store the lower half-precision (16-bit) floating-point element from a into memory.
_mm_storeu_ph⚠Experimentalavx512fp16,avx512vl
Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
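The aligned and unaligned stores differ only in the requirement on the destination address. A minimal sketch of how a caller might obtain and check a 16-byte-aligned buffer, using u16 as a stand-in for the 16-bit element type; the Aligned16 wrapper and is_aligned_to_16 helper are hypothetical names.

```rust
/// A buffer of 8 sixteen-bit elements forced to 16-byte alignment.
#[repr(align(16))]
struct Aligned16([u16; 8]);

/// Returns true when the pointer is a multiple of 16 bytes.
fn is_aligned_to_16<T>(p: *const T) -> bool {
    (p as usize) % 16 == 0
}
```

A pointer to the start of an Aligned16 buffer passes the check, while a pointer to its second element (2 bytes further on) does not and would only be suitable for the unaligned store.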
_mm_sub_ph⚠Experimentalavx512fp16,avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm_sub_round_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_sub_sh⚠Experimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_ucomieq_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomige_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomigt_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomile_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomilt_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomineq_sh⚠Experimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_undefined_ph⚠Experimentalavx512fp16,avx512vl
Return vector of type __m128h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.