Available on x86 or x86-64 only.
Macros
- cmp_asm 🔒
- fpclass_asm 🔒
Functions
- vaddph 🔒 ⚠
- vaddsh 🔒 ⚠
- vcmpsh 🔒 ⚠
- vcomish 🔒 ⚠
- vcvtdq2ph_128 🔒 ⚠
- vcvtdq2ph_256 🔒 ⚠
- vcvtdq2ph_512 🔒 ⚠
- vcvtpd2ph_128 🔒 ⚠
- vcvtpd2ph_256 🔒 ⚠
- vcvtpd2ph_512 🔒 ⚠
- vcvtph2dq_128 🔒 ⚠
- vcvtph2dq_256 🔒 ⚠
- vcvtph2dq_512 🔒 ⚠
- vcvtph2pd_128 🔒 ⚠
- vcvtph2pd_256 🔒 ⚠
- vcvtph2pd_512 🔒 ⚠
- vcvtph2psx_128 🔒 ⚠
- vcvtph2psx_256 🔒 ⚠
- vcvtph2psx_512 🔒 ⚠
- vcvtph2qq_128 🔒 ⚠
- vcvtph2qq_256 🔒 ⚠
- vcvtph2qq_512 🔒 ⚠
- vcvtph2udq_128 🔒 ⚠
- vcvtph2udq_256 🔒 ⚠
- vcvtph2udq_512 🔒 ⚠
- vcvtph2uqq_128 🔒 ⚠
- vcvtph2uqq_256 🔒 ⚠
- vcvtph2uqq_512 🔒 ⚠
- vcvtph2uw_128 🔒 ⚠
- vcvtph2uw_256 🔒 ⚠
- vcvtph2uw_512 🔒 ⚠
- vcvtph2w_128 🔒 ⚠
- vcvtph2w_256 🔒 ⚠
- vcvtph2w_512 🔒 ⚠
- vcvtps2phx_128 🔒 ⚠
- vcvtps2phx_256 🔒 ⚠
- vcvtps2phx_512 🔒 ⚠
- vcvtqq2ph_128 🔒 ⚠
- vcvtqq2ph_256 🔒 ⚠
- vcvtqq2ph_512 🔒 ⚠
- vcvtsd2sh 🔒 ⚠
- vcvtsh2sd 🔒 ⚠
- vcvtsh2si32 🔒 ⚠
- vcvtsh2ss 🔒 ⚠
- vcvtsh2usi32 🔒 ⚠
- vcvtsi2sh 🔒 ⚠
- vcvtss2sh 🔒 ⚠
- vcvttph2dq_128 🔒 ⚠
- vcvttph2dq_256 🔒 ⚠
- vcvttph2dq_512 🔒 ⚠
- vcvttph2qq_128 🔒 ⚠
- vcvttph2qq_256 🔒 ⚠
- vcvttph2qq_512 🔒 ⚠
- vcvttph2udq_128 🔒 ⚠
- vcvttph2udq_256 🔒 ⚠
- vcvttph2udq_512 🔒 ⚠
- vcvttph2uqq_128 🔒 ⚠
- vcvttph2uqq_256 🔒 ⚠
- vcvttph2uqq_512 🔒 ⚠
- vcvttph2uw_128 🔒 ⚠
- vcvttph2uw_256 🔒 ⚠
- vcvttph2uw_512 🔒 ⚠
- vcvttph2w_128 🔒 ⚠
- vcvttph2w_256 🔒 ⚠
- vcvttph2w_512 🔒 ⚠
- vcvttsh2si32 🔒 ⚠
- vcvttsh2usi32 🔒 ⚠
- vcvtudq2ph_128 🔒 ⚠
- vcvtudq2ph_256 🔒 ⚠
- vcvtudq2ph_512 🔒 ⚠
- vcvtuqq2ph_128 🔒 ⚠
- vcvtuqq2ph_256 🔒 ⚠
- vcvtuqq2ph_512 🔒 ⚠
- vcvtusi2sh 🔒 ⚠
- vcvtuw2ph_128 🔒 ⚠
- vcvtuw2ph_256 🔒 ⚠
- vcvtuw2ph_512 🔒 ⚠
- vcvtw2ph_128 🔒 ⚠
- vcvtw2ph_256 🔒 ⚠
- vcvtw2ph_512 🔒 ⚠
- vdivph 🔒 ⚠
- vdivsh 🔒 ⚠
- vfcmaddcph_mask3_128 🔒 ⚠
- vfcmaddcph_mask3_256 🔒 ⚠
- vfcmaddcph_mask3_512 🔒 ⚠
- vfcmaddcph_maskz_128 🔒 ⚠
- vfcmaddcph_maskz_256 🔒 ⚠
- vfcmaddcph_maskz_512 🔒 ⚠
- vfcmaddcsh_mask 🔒 ⚠
- vfcmaddcsh_maskz 🔒 ⚠
- vfcmulcph_128 🔒 ⚠
- vfcmulcph_256 🔒 ⚠
- vfcmulcph_512 🔒 ⚠
- vfcmulcsh 🔒 ⚠
- vfmaddcph_mask3_128 🔒 ⚠
- vfmaddcph_mask3_256 🔒 ⚠
- vfmaddcph_mask3_512 🔒 ⚠
- vfmaddcph_maskz_128 🔒 ⚠
- vfmaddcph_maskz_256 🔒 ⚠
- vfmaddcph_maskz_512 🔒 ⚠
- vfmaddcsh_mask 🔒 ⚠
- vfmaddcsh_maskz 🔒 ⚠
- vfmaddph_512 🔒 ⚠
- vfmaddsh 🔒 ⚠
- vfmaddsubph_128 🔒 ⚠
- vfmaddsubph_256 🔒 ⚠
- vfmaddsubph_512 🔒 ⚠
- vfmulcph_128 🔒 ⚠
- vfmulcph_256 🔒 ⚠
- vfmulcph_512 🔒 ⚠
- vfmulcsh 🔒 ⚠
- vfpclasssh 🔒 ⚠
- vgetexpph_128 🔒 ⚠
- vgetexpph_256 🔒 ⚠
- vgetexpph_512 🔒 ⚠
- vgetexpsh 🔒 ⚠
- vgetmantph_128 🔒 ⚠
- vgetmantph_256 🔒 ⚠
- vgetmantph_512 🔒 ⚠
- vgetmantsh 🔒 ⚠
- vmaxph_128 🔒 ⚠
- vmaxph_256 🔒 ⚠
- vmaxph_512 🔒 ⚠
- vmaxsh 🔒 ⚠
- vminph_128 🔒 ⚠
- vminph_256 🔒 ⚠
- vminph_512 🔒 ⚠
- vminsh 🔒 ⚠
- vmulph 🔒 ⚠
- vmulsh 🔒 ⚠
- vrcpph_128 🔒 ⚠
- vrcpph_256 🔒 ⚠
- vrcpph_512 🔒 ⚠
- vrcpsh 🔒 ⚠
- vreduceph_128 🔒 ⚠
- vreduceph_256 🔒 ⚠
- vreduceph_512 🔒 ⚠
- vreducesh 🔒 ⚠
- vrndscaleph_128 🔒 ⚠
- vrndscaleph_256 🔒 ⚠
- vrndscaleph_512 🔒 ⚠
- vrndscalesh 🔒 ⚠
- vrsqrtph_128 🔒 ⚠
- vrsqrtph_256 🔒 ⚠
- vrsqrtph_512 🔒 ⚠
- vrsqrtsh 🔒 ⚠
- vscalefph_128 🔒 ⚠
- vscalefph_256 🔒 ⚠
- vscalefph_512 🔒 ⚠
- vscalefsh 🔒 ⚠
- vsqrtph_512 🔒 ⚠
- vsqrtsh 🔒 ⚠
- vsubph 🔒 ⚠
- vsubsh 🔒 ⚠
- _mm256_abs_ph ⚠ Experimental avx512fp16,avx512vl
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
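The sign of an IEEE 754 half-precision value is stored in bit 15. The absolute value of one lane can therefore be modeled by clearing that bit. The sketch below is a scalar model, not the intrinsic. It works on raw u16 bit patterns so that it does not depend on Rust's f16 type, and the helper names are hypothetical.

```rust
/// Hypothetical scalar model of one `_mm256_abs_ph` lane: clear bit 15,
/// the sign bit of an IEEE 754 binary16 bit pattern.
fn abs_f16_bits(bits: u16) -> u16 {
    bits & 0x7FFF
}

/// Apply the per-lane model to all 16 lanes of a 256-bit vector of halves.
fn abs_ph_model(v: [u16; 16]) -> [u16; 16] {
    v.map(abs_f16_bits)
}
```

For example, 0xBC00 (-1.0 in binary16) becomes 0x3C00 (1.0), and 0x8000 (-0.0) becomes 0x0000 (+0.0).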
- _mm256_add_ph ⚠ Experimental avx512fp16,avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_castpd_ph ⚠ Experimental avx512fp16
- Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph128_ph256 ⚠ Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm256_castph256_ph128 ⚠ Experimental avx512fp16
- Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_pd ⚠ Experimental avx512fp16
- Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ps ⚠ Experimental avx512fp16
- Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_si256 ⚠ Experimental avx512fp16
- Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castps_ph ⚠ Experimental avx512fp16
- Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castsi256_ph ⚠ Experimental avx512fp16
- Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_cmp_ph_mask ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_conj_pch ⚠ Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_cvtepi16_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi32_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtepu16_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu32_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtpd_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtph_epi16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm256_cvtph_epi32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm256_cvtph_epi64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm256_cvtph_epu16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm256_cvtph_epu32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_epu64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_pd ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtsh_h ⚠ Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm256_cvttph_epi16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epi32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epi64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epu16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epu32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvttph_epu64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
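The cvtph_* and cvttph_* entries above differ only in how they handle the fractional part. The cvtt forms truncate toward zero. The cvt forms round, by default to nearest with ties to even.

The sketch below is a minimal scalar model of that difference. It assumes f32 values as stand-ins for half-precision lanes, and the helper names are hypothetical. It also does not model out-of-range or NaN inputs: Rust's `as` saturates in those cases, while the hardware behaves differently.

```rust
/// Model of one lane of a `cvtph_epi32`-style conversion under the default
/// rounding mode: round to nearest, ties to even.
/// Out-of-range and NaN inputs are not modeled.
fn cvt_nearest_even(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Model of one lane of a `cvttph_epi32`-style conversion: truncate toward zero.
fn cvt_truncate(x: f32) -> i32 {
    x.trunc() as i32
}
```

With this model, 3.5 converts to 4 but truncates to 3, and -1.7 converts to -2 but truncates to -1.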
- _mm256_cvtxph_ps ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtxps_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_div_ph ⚠ Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm256_fcmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fcmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
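The complex (pch) entries above treat each adjacent pair of halves as one number re + i*im. The sketch below spells out that algebra for a single lane. It assumes f32 pairs in place of half-precision elements and uses a hypothetical Cplx type. The arithmetic is ordinary (non-fused), so the model shows the math rather than the exact rounding the hardware performs.

```rust
/// Hypothetical stand-in for one complex lane (vec.fp16[0], vec.fp16[1]),
/// using f32 instead of half precision.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cplx {
    re: f32,
    im: f32,
}

/// `fmul_pch`-style product: a * b.
fn mul_model(a: Cplx, b: Cplx) -> Cplx {
    Cplx {
        re: a.re * b.re - a.im * b.im,
        im: a.re * b.im + a.im * b.re,
    }
}

/// `conj_pch`-style conjugate: negate the imaginary part.
fn conj_model(a: Cplx) -> Cplx {
    Cplx { re: a.re, im: -a.im }
}

/// `cmul_pch` / `fcmul_pch`-style product: a * conj(b).
fn conj_mul_model(a: Cplx, b: Cplx) -> Cplx {
    mul_model(a, conj_model(b))
}

/// `fmadd_pch`-style accumulate: a * b + c.
fn mul_add_model(a: Cplx, b: Cplx, c: Cplx) -> Cplx {
    let p = mul_model(a, b);
    Cplx { re: p.re + c.re, im: p.im + c.im }
}

/// `fcmadd_pch`-style accumulate: a * conj(b) + c.
fn conj_mul_add_model(a: Cplx, b: Cplx, c: Cplx) -> Cplx {
    mul_add_model(a, conj_model(b), c)
}
```

For example, with a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i and the conjugate product is 11 + 2i.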
- _mm256_fmaddsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm256_fmsubadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
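In fmaddsub and fmsubadd, whether c is added or subtracted depends on the element index. Following Intel's pseudocode for these operations, fmaddsub subtracts in even-indexed elements and adds in odd-indexed ones, and fmsubadd does the reverse.

The sketch below is a minimal model on 16 f32 lanes standing in for halves, and the helper names are hypothetical. It uses f32::mul_add to get the single rounding of a fused operation.

```rust
/// Model of `fmaddsub_ph` on 16 lanes (f32 standing in for halves):
/// even-indexed lanes compute a*b - c, odd-indexed lanes a*b + c.
fn fmaddsub_model(a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|j| {
        if j % 2 == 0 {
            a[j].mul_add(b[j], -c[j])
        } else {
            a[j].mul_add(b[j], c[j])
        }
    })
}

/// Model of `fmsubadd_ph`: even-indexed lanes a*b + c, odd-indexed lanes a*b - c.
fn fmsubadd_model(a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|j| {
        if j % 2 == 0 {
            a[j].mul_add(b[j], c[j])
        } else {
            a[j].mul_add(b[j], -c[j])
        }
    })
}
```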
- _mm256_fmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fnmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm256_fnmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
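The four fused multiply forms differ only in their signs:

- fmadd computes a*b + c.
- fmsub computes a*b - c.
- fnmadd computes -(a*b) + c.
- fnmsub computes -(a*b) - c.

The per-lane sketch below assumes f32 in place of a half-precision lane, and the function names are hypothetical.

```rust
// Per-lane models of the four fused forms, with f32 standing in for a
// half-precision lane. `mul_add` rounds once, like a fused instruction.

fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c) // a*b + c
}

fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c) // a*b - c
}

fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c) // -(a*b) + c
}

fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c) // -(a*b) - c
}
```

With a = 2, b = 3 and c = 1, the four forms give 7, 5, -5 and -7 respectively.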
- _mm256_fpclass_ph_mask ⚠ Experimental avx512fp16,avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_getexp_ph ⚠ Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(|x|)) for each element.
- _mm256_getmant_ph ⚠ Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_load_ph ⚠ Experimental avx512fp16,avx512vl
- Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
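The getexp and getmant entries above split a normal (finite, non-zero, non-denormal) input into two parts. getexp yields the unbiased exponent. getmant, with the [1, 2) interval, yields the significand. Together they satisfy x = getmant(x) * 2^getexp(x).

The sketch below reads both parts from f32 bit patterns instead of binary16, which has a 5-bit exponent with bias 15 rather than 8 bits with bias 127. It has these limits:

- It keeps the source sign for the mantissa, which corresponds to only one of the sign settings.
- It does not handle zeros, denormals, infinities, or NaN.
- The helper names are hypothetical.

```rust
/// Model of one `getexp` lane for a normal f32 input: floor(log2(|x|)),
/// read from the biased exponent field (8 bits, bias 127 for f32;
/// binary16 would use 5 bits with bias 15).
fn getexp_model(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xFF) as i32;
    (biased - 127) as f32
}

/// Model of one `getmant` lane for a normal f32 input, normalized to the
/// interval [1, 2) with the sign taken from the source: keep the sign and
/// fraction bits and force the exponent field to the bias.
fn getmant_model(x: f32) -> f32 {
    f32::from_bits((x.to_bits() & 0x807F_FFFF) | (127u32 << 23))
}
```

For example, 12.0 = 1.5 * 2^3, so the model returns 3.0 for the exponent and 1.5 for the mantissa. For -0.375 it returns -2.0 and -1.5.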
- _mm256_loadu_ph ⚠ Experimental avx512fp16,avx512vl
- Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
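The two load entries above differ only in their alignment requirement. One way to guarantee 32-byte alignment for the aligned form is a #[repr(align(32))] wrapper around 16 raw half-precision bit patterns. The sketch below only checks addresses and does not call either intrinsic. The type and helper names are hypothetical.

```rust
/// Hypothetical buffer: 16 binary16 bit patterns (256 bits) forced to a
/// 32-byte boundary, as an aligned 256-bit load requires.
#[repr(C, align(32))]
struct AlignedHalves([u16; 16]);

/// True if `ptr` meets the 32-byte alignment that `_mm256_load_ph` requires.
fn is_32_byte_aligned<T>(ptr: *const T) -> bool {
    (ptr as usize) % 32 == 0
}
```

A pointer to the start of such a buffer passes the check. A pointer offset by even one element does not, and would need the unaligned form.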
- _mm256_mask3_fcmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask3_fmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask3_fmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmaddsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsubadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask_add_ph ⚠ Experimental avx512fp16,avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_blend_ph ⚠ Experimental avx512fp16,avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst (each element is taken from b when the corresponding mask bit is set, and from a otherwise).
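The masked entries differ in where an unselected element comes from:

- _mm256_mask_add_ph copies it from src.
- The mask3_ FMA forms copy it from c.
- _mm256_mask_blend_ph takes b where the mask bit is set and a elsewhere.

Each of these reduces to the same per-lane select on bit j of k. The sketch below models that select on f32 lanes standing in for halves. The helper names are hypothetical, and the argument order is assumed to follow Intel's intrinsic signatures.

```rust
/// Per-lane select shared by the writemask forms: lane j comes from
/// `computed` when bit j of `k` is set, and from `fallback` otherwise.
fn select_by_mask(k: u16, computed: [f32; 16], fallback: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|j| if ((k >> j) & 1) == 1 { computed[j] } else { fallback[j] })
}

/// `mask_add_ph`-style: add, falling back to `src`.
fn mask_add_model(src: [f32; 16], k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let sum: [f32; 16] = std::array::from_fn(|j| a[j] + b[j]);
    select_by_mask(k, sum, src)
}

/// `mask3_fmadd_ph`-style: fused a*b + c, falling back to `c`.
fn mask3_fmadd_model(a: [f32; 16], b: [f32; 16], c: [f32; 16], k: u16) -> [f32; 16] {
    let r: [f32; 16] = std::array::from_fn(|j| a[j].mul_add(b[j], c[j]));
    select_by_mask(k, r, c)
}

/// `mask_blend_ph`-style: b where the bit is set, a elsewhere.
fn mask_blend_model(k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    select_by_mask(k, b, a)
}
```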
- _mm256_mask_cmp_ph_mask ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
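A comparison returns one bit per element rather than a vector of halves. Bit j of the result is set when the predicate holds for a[j] and b[j]. The zeromasked form also clears every result bit whose corresponding zeromask bit is clear, which amounts to a bitwise AND of the two masks.

The sketch below passes the predicate as a closure instead of an imm8 comparison code, which is a simplification. It uses f32 lanes in place of halves, and its names are hypothetical.

```rust
/// `cmp_ph_mask`-style: bit j of the result is set when `pred(a[j], b[j])` holds.
/// A closure stands in for the imm8 comparison predicate.
fn cmp_mask_model(a: [f32; 16], b: [f32; 16], pred: impl Fn(f32, f32) -> bool) -> u16 {
    let mut k = 0u16;
    for j in 0..16 {
        if pred(a[j], b[j]) {
            k |= 1 << j;
        }
    }
    k
}

/// Zeromasked form: result bits whose bit in `k1` is clear are forced to 0.
fn mask_cmp_mask_model(
    k1: u16,
    a: [f32; 16],
    b: [f32; 16],
    pred: impl Fn(f32, f32) -> bool,
) -> u16 {
    k1 & cmp_mask_model(a, b, pred)
}
```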
- _mm256_mask_cmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_conj_pch ⚠ Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_cvtepi16_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtepu16_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu32_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu64_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtpd_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtph_epi16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_pd ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu16 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu32 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu64 ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtxph_ps ⚠ Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtxps_ph ⚠ Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_div_ph ⚠ Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_fcmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fcmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fmadd_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmaddsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsubadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fnmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fpclass_ph_mask ⚠ Experimental avx512fp16,avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm256_mask_getexp_ph ⚠ Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
- _mm256_mask_getmant_ph ⚠ Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_mask_max_ph ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
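For normal inputs, getexp and getmant can be read straight off the binary16 encoding: the unbiased 5-bit exponent is floor(log2(|x|)), and 1.fraction is the mantissa normalized into [1, 2). The sketch below is a hypothetical model on raw u16 bit patterns. It covers only the [1, 2) interval with the source's sign (assumed to correspond to _MM_MANT_NORM_1_2 with _MM_MANT_SIGN_SRC) and returns None for zeros, subnormals, infinities, and NaNs, which the real instructions handle specially.

```rust
/// Hypothetical model of getexp for one normal binary16 value (raw bits).
pub fn getexp_fp16(h: u16) -> Option<f32> {
    let exp = (h >> 10) & 0x1F;
    if exp == 0 || exp == 0x1F {
        return None; // zero/subnormal or Inf/NaN: not modeled here
    }
    // Remove the exponent bias of 15: equals floor(log2(|x|)) for normal x.
    Some(exp as f32 - 15.0)
}

/// Hypothetical model of getmant for one normal binary16 value, interval [1, 2),
/// keeping the sign of the source.
pub fn getmant_fp16_1_2_src(h: u16) -> Option<f32> {
    let exp = (h >> 10) & 0x1F;
    if exp == 0 || exp == 0x1F {
        return None;
    }
    let frac = (h & 0x03FF) as f32 / 1024.0; // 10 stored fraction bits
    let m = 1.0 + frac; // implicit leading 1
    Some(if (h & 0x8000) != 0 { -m } else { m })
}
```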
- _mm256_mask_min_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_mask_mul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_mul_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
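The complex entries treat lanes 2j and 2j + 1 as the real and imaginary parts of one complex number. A minimal sketch of _mm256_mask_mul_pch under stated assumptions: f32 stands in for fp16, the product uses ordinary (not fused) f32 arithmetic, and the mask is assumed to be a __mmask8 with one bit per complex number. The function name is hypothetical.

```rust
/// Hypothetical model of `_mm256_mask_mul_pch`: 16 lanes = 8 complex numbers
/// stored as (re, im) pairs; one mask bit per complex number (assumed).
pub fn mask_mul_pch_model(src: [f32; 16], k: u8, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    // Writemask: unselected complex numbers are copied from src.
    let mut dst = src;
    for j in 0..8usize {
        if (k >> j) & 1 == 1 {
            let (ar, ai) = (a[2 * j], a[2 * j + 1]);
            let (br, bi) = (b[2 * j], b[2 * j + 1]);
            // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
            dst[2 * j] = ar * br - ai * bi;
            dst[2 * j + 1] = ar * bi + ai * br;
        }
    }
    dst
}
```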
- _mm256_mask_rcp_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_reduce_ph Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_roundscale_ph Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rsqrt_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_scalef_ph Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_sqrt_ph Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_sub_ph Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_maskz_add_ph Experimental avx512fp16,avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_conj_pch Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_cvtepi16_ph Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
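Multiplying by the complex conjugate only flips the sign of b's imaginary part before the ordinary complex product. A hypothetical zeromasked model of _mm256_maskz_cmul_pch, assuming f32 as a stand-in for fp16, (re, im) pairs in adjacent lanes, and one mask bit per complex number; the function name is illustrative only.

```rust
/// Hypothetical model of `_mm256_maskz_cmul_pch`: dst = a * conj(b) for each of
/// the 8 complex numbers, zeromasked with one bit per complex number (assumed).
pub fn maskz_cmul_pch_model(k: u8, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    // Zeromask: unselected complex numbers become 0 + 0i.
    let mut dst = [0.0f32; 16];
    for j in 0..8usize {
        if (k >> j) & 1 == 1 {
            let (ar, ai) = (a[2 * j], a[2 * j + 1]);
            // conj(b) = br - i*bi, so negate the imaginary part of b.
            let (br, bi) = (b[2 * j], -b[2 * j + 1]);
            dst[2 * j] = ar * br - ai * bi;
            dst[2 * j + 1] = ar * bi + ai * br;
        }
    }
    dst
}
```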
- _mm256_maskz_cvtepi32_ph Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_ph Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtepu16_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu32_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu64_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtpd_ph Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_pd Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtxph_ps Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtxps_ph Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_div_ph Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fcmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_fcmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_fmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_fmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmaddsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsubadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
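The "alternately" in the fmaddsub and fmsubadd entries refers to lane parity. Assuming Intel's pseudocode convention (fmaddsub subtracts c in even-indexed lanes and adds it in odd-indexed lanes; fmsubadd does the reverse), a hypothetical zeromasked model with f32 lanes standing in for fp16 looks like this. The function name and the subadd flag are illustrative, not part of core::arch.

```rust
/// Hypothetical model of `_mm256_maskz_fmaddsub_ph` (subadd = false) and
/// `_mm256_maskz_fmsubadd_ph` (subadd = true), 16 f32 lanes standing in for fp16.
pub fn maskz_fmaddsub_model(
    k: u16,
    a: [f32; 16],
    b: [f32; 16],
    c: [f32; 16],
    subadd: bool,
) -> [f32; 16] {
    let mut dst = [0.0f32; 16]; // zeromask: unselected lanes become 0.0
    for i in 0..16usize {
        if (k >> i) & 1 == 1 {
            let even = i % 2 == 0;
            // fmaddsub: even lanes a*b - c, odd lanes a*b + c; fmsubadd flips this.
            let subtract = even != subadd;
            let addend = if subtract { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend); // fused multiply-add
        }
    }
    dst
}
```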
- _mm256_maskz_fmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_fnmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fnmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_getexp_ph Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
- _mm256_maskz_getmant_ph Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_maskz_max_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_maskz_min_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_maskz_mul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_mul_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rcp_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_reduce_ph Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_roundscale_ph Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rsqrt_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_scalef_ph Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sqrt_ph Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_ph Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_max_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_min_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
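The non-IEEE behavior noted for the max and min entries is the usual x86 rule: the result is a if the comparison a > b (or a < b) holds, otherwise b. Assuming the fp16 forms follow the same rule Intel documents for MAXPS and MINPS, the second operand is returned whenever either input is NaN or both inputs are zeros of any sign. This scalar sketch uses f32 and hypothetical names; note that Rust's own f32::max and f32::min behave differently with NaN.

```rust
/// Hypothetical scalar model of the x86-style maximum (second operand wins ties/NaN).
pub fn x86_max(a: f32, b: f32) -> f32 {
    // A NaN in either operand, or +0 vs -0, makes `a > b` false, so b is returned.
    if a > b { a } else { b }
}

/// Hypothetical scalar model of the x86-style minimum.
pub fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Because the operand order matters, swapping a and b can change the result when NaNs or signed zeros are involved.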
- _mm256_mul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mul_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_permutex2var_ph Experimental avx512fp16,avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutexvar_ph Experimental avx512fp16,avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
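The two shuffle entries differ in how each 16-bit index lane is read. A hypothetical model on raw u16 lanes (permutes only move bit patterns, so no floating-point type is needed), assuming Intel's encoding where bits 3:0 of each idx lane select the element and, for permutex2var, bit 4 selects b over a:

```rust
/// Hypothetical model of `_mm256_permutexvar_ph`: lane i takes a[idx[i] & 15].
pub fn permutexvar_model(idx: [u16; 16], a: [u16; 16]) -> [u16; 16] {
    let mut dst = [0u16; 16];
    for i in 0..16usize {
        dst[i] = a[(idx[i] & 0x0F) as usize]; // only the low 4 index bits are used
    }
    dst
}

/// Hypothetical model of `_mm256_permutex2var_ph`: bits 3:0 pick the element,
/// bit 4 picks the source (0 -> a, 1 -> b).
pub fn permutex2var_model(a: [u16; 16], idx: [u16; 16], b: [u16; 16]) -> [u16; 16] {
    let mut dst = [0u16; 16];
    for i in 0..16usize {
        let off = (idx[i] & 0x0F) as usize;
        dst[i] = if (idx[i] & 0x10) != 0 { b[off] } else { a[off] };
    }
    dst
}
```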
- _mm256_rcp_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_reduce_add_ph Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_max_ph Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_min_ph Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_mul_ph Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
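The reduce_ entries state what is computed but not the combining order, which matters in half precision because rounding happens after every step. This sketch shows one plausible order, repeatedly folding the upper half of the vector onto the lower half. The order is an assumption, not a description of the generated code; f32 stands in for fp16, the function name is hypothetical, and the NaN handling of the max/min reductions is not modeled.

```rust
/// Hypothetical model of a horizontal reduction over 16 lanes by repeated halving.
pub fn reduce_tree(mut v: [f32; 16], op: fn(f32, f32) -> f32) -> f32 {
    let mut width = 16usize;
    while width > 1 {
        width /= 2;
        for i in 0..width {
            v[i] = op(v[i], v[i + width]); // combine lower half with upper half
        }
    }
    v[0]
}
```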
- _mm256_reduce_ph Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm256_roundscale_ph Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
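Rounding to M fraction bits means scaling by 2^M, rounding to an integer, and scaling back; the reduced argument from the reduce_ph entry is what that rounding removes. A hypothetical one-lane model in f32, assuming round-to-nearest-even and taking M as a plain parameter (Intel's pseudocode reads M from imm8[7:4] and the rounding control from the low bits of imm8). Requires Rust 1.77 or later for f32::round_ties_even.

```rust
/// Hypothetical model of roundscale for one lane: keep `m` fraction bits,
/// rounding to nearest with ties to even.
pub fn roundscale_model(x: f32, m: u32) -> f32 {
    let scale = (1u32 << m) as f32; // 2^M
    (x * scale).round_ties_even() / scale
}

/// Hypothetical model of reduce for one lane: the part removed by roundscale.
pub fn reduce_model(x: f32, m: u32) -> f32 {
    x - roundscale_model(x, m)
}
```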
- _mm256_rsqrt_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_scalef_ph Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
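For finite inputs, scalef multiplies each element of a by 2 raised to floor(b), where b is the corresponding element of the second operand. A hypothetical one-lane sketch in f32; it does not model the special cases for NaN, infinity, or denormal inputs, nor fp16 overflow.

```rust
/// Hypothetical one-lane model of scalef for finite inputs: a * 2^floor(b).
pub fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```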
- _mm256_set1_ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm256_set_ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm256_setr_ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
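The two constructors differ only in argument order. Following Intel's convention, assumed here, _mm256_set_ph takes its arguments from the highest element (e15) down to element 0, while _mm256_setr_ph takes them in memory order. A hypothetical model that returns lanes in memory order (lane 0 first), with f32 standing in for fp16:

```rust
/// Hypothetical model of `_mm256_set_ph`: arguments run from e15 down to e0.
pub fn set_ph_model(args_e15_to_e0: [f32; 16]) -> [f32; 16] {
    let mut lanes = args_e15_to_e0;
    lanes.reverse(); // the last argument lands in lane 0
    lanes
}

/// Hypothetical model of `_mm256_setr_ph`: lane i is simply argument i.
pub fn setr_ph_model(args_e0_to_e15: [f32; 16]) -> [f32; 16] {
    args_e0_to_e15
}
```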
- _mm256_setzero_ph Experimental avx512fp16,avx512vl
- Return vector of type __m256h with all elements set to zero.
- _mm256_sqrt_ph Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm256_store_ph Experimental avx512fp16,avx512vl
- Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_storeu_ph Experimental avx512fp16,avx512vl
- Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm256_sub_ph Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm256_undefined_ph Experimental avx512fp16,avx512vl
- Return vector of type __m256h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_zextph128_ph256 Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_abs_ph Experimental avx512fp16
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm512_add_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_round_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_castpd_ph Experimental avx512fp16
- Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph128_ph512 Experimental avx512fp16
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph256_ph512 Experimental avx512fp16
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph512_ph128 Experimental avx512fp16
- Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph512_ph256 Experimental avx512fp16
- Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_pd Experimental avx512fp16
- Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_ps Experimental avx512fp16
- Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_si512 Experimental avx512fp16
- Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ph Experimental avx512fp16
- Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ph Experimental avx512fp16
- Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_cmp_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
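The result mask packs one comparison per lane: bit i of the __mmask32 is set when the predicate holds for element i. This hypothetical model uses f32 lanes in place of fp16 and passes the predicate as a function instead of an imm8 code; the closure |x, y| x < y behaves like an ordered, non-signaling less-than (the _CMP_LT_OQ predicate), which is false whenever either input is NaN.

```rust
/// Hypothetical model of how `_mm512_cmp_ph_mask` packs 32 lane comparisons
/// into a 32-bit mask: bit i is set when `pred(a[i], b[i])` holds.
pub fn cmp_mask_model(a: [f32; 32], b: [f32; 32], pred: fn(f32, f32) -> bool) -> u32 {
    let mut k = 0u32;
    for i in 0..32usize {
        if pred(a[i], b[i]) {
            k |= 1u32 << i;
        }
    }
    k
}
```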
- _mm512_cmp_round_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_conj_pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cvt_roundepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvt_roundph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvtph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtsh_h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm512_cvtt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
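The practical difference between the rounding conversions (cvtph_*, cvt_roundph_*) and the truncating ones (cvttph_*, cvtt_roundph_*) can be shown with a small scalar model. This is a sketch, not the intrinsics themselves: it assumes f32 as a stand-in for f16 (Rust's f16 type is still unstable), finite inputs that fit in an i16, and the default round-to-nearest-even mode for the non-truncating form. The function names are hypothetical.

```rust
/// Hypothetical scalar model of a non-truncating conversion such as
/// `_mm512_cvtph_epi16`, assuming the default round-to-nearest-even mode.
/// Only meaningful for finite inputs that fit in an i16.
fn convert_nearest_even(x: f32) -> i16 {
    x.round_ties_even() as i16
}

/// Hypothetical scalar model of a truncating conversion such as
/// `_mm512_cvttph_epi16`: the fractional part is discarded (round toward zero).
fn convert_truncate(x: f32) -> i16 {
    x.trunc() as i16
}
```

Under these assumptions, 2.5 and 3.5 convert to 2 and 4 with round-to-nearest-even but to 2 and 3 with truncation, and -2.7 becomes -3 versus -2.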
- _mm512_cvtx_roundph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtx_roundps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_div_ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm512_div_round_ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is controlled by the rounding parameter, which is built from the _MM_FROUND_* constants.
- _mm512_fcmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
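The complex (_pch) entries in this list treat each pair of adjacent elements as one complex number, real part first. The scalar sketch below models the arithmetic of _mm512_fmul_pch, _mm512_fcmul_pch, and _mm512_fcmadd_pch on slices. It assumes f32 in place of f16 and ignores the rounding behaviour of the real instructions; the helper names are hypothetical.

```rust
/// Hypothetical model of `fmul_pch`: pairwise complex product.
/// Each complex number is stored as [re, im] in two adjacent elements.
fn complex_mul(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    assert!(a.len() % 2 == 0, "complex vectors need an even element count");
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .flat_map(|(x, y)| {
            let (ar, ai, br, bi) = (x[0], x[1], y[0], y[1]);
            // (ar + i*ai) * (br + i*bi)
            [ar * br - ai * bi, ar * bi + ai * br]
        })
        .collect()
}

/// Hypothetical model of `fcmul_pch`: multiply each complex number in `a`
/// by the conjugate of the corresponding number in `b`.
fn complex_conj_mul(a: &[f32], b: &[f32]) -> Vec<f32> {
    // conj(br + i*bi) = br - i*bi: negate the imaginary parts of b.
    let b_conj: Vec<f32> = b.chunks_exact(2).flat_map(|y| [y[0], -y[1]]).collect();
    complex_mul(a, &b_conj)
}

/// Hypothetical model of `fcmadd_pch`: a * conj(b) + c on each complex pair.
fn complex_conj_madd(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), c.len());
    complex_conj_mul(a, b).iter().zip(c).map(|(p, q)| p + q).collect()
}
```

For example, with a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i, while the conjugate product a * conj(b) is 11 + 2i.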
- _mm512_fmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmaddsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
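The fmaddsub and fmsubadd entries differ only in which lanes add c and which subtract it. The sketch below models that alternation on f32 slices, assuming (as in Intel's published pseudocode) that lane 0 counts as even, that fmaddsub subtracts c in even lanes and adds it in odd lanes, and that fmsubadd does the opposite. The function names are hypothetical; `f32::mul_add` provides the single-rounding fused multiply-add.

```rust
/// Hypothetical f32 model of `fmaddsub_ph`:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    assert!(a.len() == b.len() && b.len() == c.len());
    (0..a.len())
        .map(|i| if i % 2 == 0 { a[i].mul_add(b[i], -c[i]) } else { a[i].mul_add(b[i], c[i]) })
        .collect()
}

/// Hypothetical f32 model of `fmsubadd_ph`:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    assert!(a.len() == b.len() && b.len() == c.len());
    (0..a.len())
        .map(|i| if i % 2 == 0 { a[i].mul_add(b[i], c[i]) } else { a[i].mul_add(b[i], -c[i]) })
        .collect()
}
```

With a = 1, b = 2, and c = 1 in every lane, the first model yields 1, 3, 1, 3 and the second yields 3, 1, 3, 1.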
- _mm512_fmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is controlled by the rounding parameter, which is built from the _MM_FROUND_* constants.
- _mm512_fnmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
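The fmadd, fmsub, fnmadd, and fnmsub entries differ only in where the signs go. As a scalar sketch (f32 standing in for f16, hypothetical function names), each can be written with `f32::mul_add`, which, like the hardware instructions, rounds only once.

```rust
/// Hypothetical scalar models of the four fused multiply-add sign conventions.
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c) // (a*b) + c
}

fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c) // (a*b) - c
}

fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c) // -(a*b) + c
}

fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c) // -(a*b) - c
}
```

With a = 2, b = 3, and c = 1 these give 7, 5, -5, and -7 respectively.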
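- _mm512_fpclass_ph_mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a bitwise combination of category flags (for example QNaN, SNaN, positive or negative zero, positive or negative infinity, denormal, and negative).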
- _mm512_getexp_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(|x|)) for each element.
- _mm512_getexp_round_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(|x|)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getmant_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_getmant_round_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_load_ph Experimental avx512fp16
- Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
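Because _mm512_load_ph expects a 64-byte-aligned address, one way to obtain such storage in Rust is a #[repr(align(64))] wrapper. The sketch below is an assumption-laden illustration rather than a usage of the intrinsic: it keeps the 32 half-precision values as raw u16 bit patterns (since f16 is unstable), only checks alignment, and uses hypothetical names.

```rust
/// Hypothetical 64-byte-aligned storage for one 512-bit vector:
/// 32 half-precision values kept as raw 16-bit patterns.
#[repr(C, align(64))]
struct AlignedHalfs([u16; 32]);

/// Returns true if `ptr` meets the 64-byte alignment an aligned 512-bit load expects.
fn is_aligned_64<T>(ptr: *const T) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer into such a buffer that is offset by even one element is no longer 64-byte aligned, so loads from it should use _mm512_loadu_ph instead.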
- _mm512_loadu_ph Experimental avx512fp16
- Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
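The getexp and getmant entries earlier in this list split a value into an exponent and a normalized significand. The bit-level sketch below illustrates that decomposition on f32 (standing in for f16). It assumes normal (non-zero, finite, non-subnormal) inputs, the [1, 2) normalization interval, and a result that keeps the source sign; the real intrinsics accept other interval and sign controls and handle special values, which this model does not.

```rust
const EXP_MASK: u32 = 0x7f80_0000; // f32 exponent field
const EXP_BIAS: i32 = 127;

/// Hypothetical model of getexp: floor(log2(|x|)) returned as a float.
fn getexp_model(x: f32) -> f32 {
    assert!(x.is_normal(), "model only covers normal inputs");
    let biased = ((x.to_bits() & EXP_MASK) >> 23) as i32;
    (biased - EXP_BIAS) as f32
}

/// Hypothetical model of getmant with the [1, 2) interval and the source sign.
fn getmant_model(x: f32) -> f32 {
    assert!(x.is_normal(), "model only covers normal inputs");
    // Keep the sign and fraction bits; set the exponent field to the bias (2^0).
    f32::from_bits((x.to_bits() & !EXP_MASK) | ((EXP_BIAS as u32) << 23))
}
```

For example, 12.0 = 1.5 * 2^3 decomposes into an exponent of 3.0 and a significand of 1.5, and -0.375 into -2.0 and -1.5.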
- _mm512_mask3_fcmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_fcmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_fmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_fmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
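In the mask3_ entries the fallback value for a lane whose mask bit is clear is taken from c, the accumulator, rather than from a separate src operand. A minimal scalar sketch of _mm512_mask3_fmadd_ph under that reading, with f32 in place of f16, bit i of k controlling lane i, and a hypothetical function name:

```rust
/// Hypothetical f32 model of `mask3_fmadd_ph(a, b, c, k)`:
/// lane i gets a[i]*b[i] + c[i] when bit i of k is set, and keeps c[i] otherwise.
fn mask3_fmadd_model(a: &[f32], b: &[f32], c: &[f32], k: u32) -> Vec<f32> {
    assert!(a.len() == b.len() && b.len() == c.len() && c.len() <= 32);
    (0..c.len())
        .map(|i| if (k >> i) & 1 == 1 { a[i].mul_add(b[i], c[i]) } else { c[i] })
        .collect()
}
```

With k = 0b0101, only lanes 0 and 2 are updated; lanes 1 and 3 pass c through unchanged.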
- _mm512_mask3_fmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_fmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmaddsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmaddsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmsubadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fmsubadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fnmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fnmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fnmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_fnmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask_add_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
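The writemask convention used by the mask_ entries (lanes whose bit in k is clear are copied from src), and the per-bit selection of the mask_blend_ph entry that follows, can both be sketched as lane-by-lane choices. This is a scalar model with f32 in place of f16, bit i of k controlling lane i, and hypothetical function names; the argument order mirrors the intrinsic signatures (src, k, a, b) and (k, a, b).

```rust
/// Hypothetical model of `mask_add_ph(src, k, a, b)`:
/// lane i gets a[i] + b[i] when bit i of k is set, and src[i] otherwise.
fn mask_add_model(src: &[f32], k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(src.len() == a.len() && a.len() == b.len() && a.len() <= 32);
    (0..a.len())
        .map(|i| if (k >> i) & 1 == 1 { a[i] + b[i] } else { src[i] })
        .collect()
}

/// Hypothetical model of `mask_blend_ph(k, a, b)`:
/// lane i takes b[i] when bit i of k is set, and a[i] otherwise.
fn mask_blend_model(k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(a.len() == b.len() && a.len() <= 32);
    (0..a.len()).map(|i| if (k >> i) & 1 == 1 { b[i] } else { a[i] }).collect()
}
```

With k = 0b1010, only lanes 1 and 3 receive the sum; lanes 0 and 2 keep their src values.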
- _mm512_mask_add_round_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is controlled by the rounding parameter, which is built from the _MM_FROUND_* constants.
- _mm512_mask_blend_ph Experimental avx512fp16
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_cmp_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
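For the masked comparisons, the zeromask acts as a filter on the result bits: a bit can be set only where the input mask bit is set and the comparison holds. The sketch below models _mm512_mask_cmp_ph_mask with an ordered less-than predicate standing in for whichever comparison the operand selects; it assumes f32 in place of f16 and uses a hypothetical function name.

```rust
/// Hypothetical model of `mask_cmp_ph_mask` with a less-than predicate.
/// Bit i of the result is set only when bit i of k1 is set and a[i] < b[i];
/// comparisons involving NaN are false.
fn mask_cmp_lt_model(k1: u32, a: &[f32], b: &[f32]) -> u32 {
    assert!(a.len() == b.len() && a.len() <= 32);
    a.iter().zip(b).enumerate().fold(0u32, |mask, (i, (x, y))| {
        if (k1 >> i) & 1 == 1 && x < y { mask | (1 << i) } else { mask }
    })
}
```

Comparing [1, 5, 2, 8] against 3 in every lane gives 0b0101 with an all-ones k1, but only 0b0001 when k1 = 0b0001.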
- _mm512_mask_cmp_round_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_cmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_cmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_conj_pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number as complex = vec.fp16[0] + i * vec.fp16[1], and its complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_cvt_roundepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvt_roundph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvttph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_cvtx_roundph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtx_roundps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtxph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_cvtxps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_div_ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_div_round_ph Experimental avx512fp16
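- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.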
- _mm512_mask_fcmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_mask_fcmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_mask_fcmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_mask_fcmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_mask_fmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_mask_fmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
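The pch entries above treat each pair of adjacent fp16 lanes as one complex number. The real part sits in the even lane and the imaginary part in the odd lane. Each entry computes one of these:

- fmul_pch computes a * b.
- fcmul_pch computes a * conj(b).
- fmadd_pch and fcmadd_pch add c to those products.

This scalar sketch implements those formulas over interleaved [re, im, re, im, ...] slices. It makes three simplifying assumptions: f32 stands in for f16, masking is ignored, and the hardware's rounding behaviour is not modeled. The function names are hypothetical.

```rust
/// Multiply interleaved complex numbers pair by pair.
/// If `conj_b` is true, each `b` element is conjugated first (the `fcmul` rule).
fn complex_mul(a: &[f32], b: &[f32], conj_b: bool) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    assert!(a.len() % 2 == 0, "each complex number occupies two lanes");
    let mut out = Vec::with_capacity(a.len());
    for (x, y) in a.chunks_exact(2).zip(b.chunks_exact(2)) {
        let (ar, ai) = (x[0], x[1]);
        // Conjugation negates the imaginary part of b.
        let (br, bi) = (y[0], if conj_b { -y[1] } else { y[1] });
        out.push(ar * br - ai * bi); // real part
        out.push(ar * bi + ai * br); // imaginary part
    }
    out
}

/// `fmadd_pch` / `fcmadd_pch` rule: a * b (or a * conj(b)) + c, lane by lane.
fn complex_mul_add(a: &[f32], b: &[f32], c: &[f32], conj_b: bool) -> Vec<f32> {
    complex_mul(a, b, conj_b)
        .iter()
        .zip(c)
        .map(|(p, q)| p + q)
        .collect()
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i, while (1 + 2i) * conj(3 + 4i) = 11 + 2i. The conj_pch entries correspond to negating every odd (imaginary) lane.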
- _mm512_mask_fmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_mask_fmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmaddsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmaddsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmsubadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fmsubadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
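In the fmaddsub and fmsubadd entries above, "alternately" is decided by lane parity:

- fmaddsub: even-indexed lanes compute a*b - c and odd-indexed lanes compute a*b + c.
- fmsubadd: the two operations are swapped.

This scalar sketch assumes f32 as a stand-in for f16 and ignores both masking and fused single rounding. The function names are hypothetical.

```rust
/// `fmaddsub` rule: even lanes a*b - c, odd lanes a*b + c.
fn fmaddsub(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|j| if j % 2 == 0 { a[j] * b[j] - c[j] } else { a[j] * b[j] + c[j] })
        .collect()
}

/// `fmsubadd` rule: even lanes a*b + c, odd lanes a*b - c.
fn fmsubadd(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|j| if j % 2 == 0 { a[j] * b[j] + c[j] } else { a[j] * b[j] - c[j] })
        .collect()
}
```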
- _mm512_mask_fmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_mask_fmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
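is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`. Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.
- _mm512_mask_fnmadd_ph Experimental avx512fp16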
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fnmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fnmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_fnmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
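The four fused multiply-add families in this list differ only in their signs:

- fmadd computes a*b + c.
- fmsub computes a*b - c.
- fnmadd computes -(a*b) + c.
- fnmsub computes -(a*b) - c.

In their mask_ forms, masked-off lanes keep the value from a rather than from a separate src.

This scalar sketch assumes f32 as a stand-in for f16. `f32::mul_add` gives the single rounding of a fused operation, but at f32 rather than f16 precision. The `Fma` enum and `mask_fma` are hypothetical.

```rust
/// Which fused multiply-add variant to model.
#[derive(Clone, Copy)]
enum Fma {
    Madd,  // a*b + c
    Msub,  // a*b - c
    Nmadd, // -(a*b) + c
    Nmsub, // -(a*b) - c
}

/// Masked FMA in the style of the `_mm512_mask_f*_ph` entries:
/// lanes whose mask bit is clear keep the value from `a`.
fn mask_fma(kind: Fma, a: &[f32], k: u32, b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|j| {
            if (k >> j) & 1 == 0 {
                return a[j]; // masked off: copied from a
            }
            match kind {
                Fma::Madd => a[j].mul_add(b[j], c[j]),
                Fma::Msub => a[j].mul_add(b[j], -c[j]),
                Fma::Nmadd => (-a[j]).mul_add(b[j], c[j]),
                Fma::Nmsub => (-a[j]).mul_add(b[j], -c[j]),
            }
        })
        .collect()
}
```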
- _mm512_mask_fpclass_ph_mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative finite), 0x80 (SNaN).
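The fpclass categories can be checked directly on the raw binary16 bit pattern. That pattern has 1 sign bit, 5 exponent bits and 10 fraction bits, and a NaN is quiet when the top fraction bit is set.

This sketch of the per-lane test assumes each lane is given as its u16 bit pattern. It uses the imm8 bits 0x01 QNaN, 0x02 +0, 0x04 -0, 0x08 +inf, 0x10 -inf, 0x20 denormal, 0x40 negative finite and 0x80 SNaN. Here "negative finite" excludes -0.0, which has its own bit; this follows our reading of Intel's VFPCLASS pseudocode. The function names are hypothetical.

```rust
/// Per-lane model of `fpclass_ph`: true if the binary16 value with bit
/// pattern `h` falls into any category selected by `imm8`.
fn fpclass_f16(h: u16, imm8: u8) -> bool {
    let neg = h & 0x8000 != 0;
    let exp = (h >> 10) & 0x1f;
    let frac = h & 0x03ff;
    let exp_ones = exp == 0x1f;
    let exp_zero = exp == 0;
    let frac_zero = frac == 0;
    let quiet_bit = frac & 0x0200 != 0;

    let categories = [
        (0x01, exp_ones && !frac_zero && quiet_bit),  // QNaN
        (0x02, !neg && exp_zero && frac_zero),        // +0
        (0x04, neg && exp_zero && frac_zero),         // -0
        (0x08, !neg && exp_ones && frac_zero),        // +inf
        (0x10, neg && exp_ones && frac_zero),         // -inf
        (0x20, exp_zero && !frac_zero),               // denormal
        (0x40, neg && !exp_ones && !(exp_zero && frac_zero)), // negative finite, nonzero
        (0x80, exp_ones && !frac_zero && !quiet_bit), // SNaN
    ];
    categories.iter().any(|&(bit, hit)| imm8 & bit != 0 && hit)
}

/// `mask_fpclass_ph_mask` style: bit j of the result is set only if
/// bit j of `k1` is set and lane j matches `imm8`.
fn mask_fpclass(k1: u32, lanes: &[u16], imm8: u8) -> u32 {
    lanes
        .iter()
        .enumerate()
        .filter(|&(j, &h)| (k1 >> j) & 1 == 1 && fpclass_f16(h, imm8))
        .fold(0u32, |acc, (j, _)| acc | (1u32 << j))
}
```

For example, testing for any NaN uses imm8 = 0x81 (QNaN | SNaN).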
- _mm512_mask_getexp_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
`floor(log2(x))` for each element.
- _mm512_mask_getexp_round_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
`floor(log2(x))` for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_getmant_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates
`±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_mask_getmant_round_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates
`±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_max_ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
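For normal inputs, the getexp and getmant entries above split x into |x| = mantissa * 2^exp. Here exp = floor(log2(|x|)), and for the [1, 2) normalization interval the mantissa lies in [1, 2).

This sketch reads both fields directly from the IEEE 754 bits. It makes three simplifying assumptions:

- f64 stands in for f16.
- getmant uses the [1, 2) interval with the sign cleared.
- Only normal, finite, nonzero inputs are handled. Zeros, denormals, infinities and NaNs follow special rules that are not modeled.

The names are hypothetical.

```rust
/// Model of `getexp` for a normal f64: the unbiased exponent,
/// i.e. floor(log2(|x|)), returned as a float.
fn getexp(x: f64) -> f64 {
    assert!(x.is_normal(), "only normal inputs are modeled");
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64;
    (biased - 1023) as f64
}

/// Model of `getmant` with the [1, 2) interval and a cleared sign:
/// keep the fraction bits and force the exponent field to the bias (2^0).
fn getmant_1_2(x: f64) -> f64 {
    assert!(x.is_normal(), "only normal inputs are modeled");
    let frac = x.to_bits() & ((1u64 << 52) - 1);
    f64::from_bits((1023u64 << 52) | frac)
}
```

For example, 10.0 splits into getexp = 3.0 and getmant = 1.25, and 1.25 * 2^3 = 10.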
- _mm512_mask_max_round_ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_mask_min_ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mask_min_round_ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
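The "does not follow IEEE 754" caveat on the max and min entries comes from the x86 definitions max(a, b) = (a > b) ? a : b and min(a, b) = (a < b) ? a : b. Whenever the comparison is false, the second operand b is returned. That covers any NaN input and the +0.0/-0.0 pair.

This scalar sketch of the rule assumes f32 as a stand-in for f16. It deliberately differs from Rust's `f32::max`, which prefers the non-NaN operand. The function names are hypothetical.

```rust
/// x86 `max` rule: `a` only if `a > b` is true, otherwise `b`.
fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// x86 `min` rule: `a` only if `a < b` is true, otherwise `b`.
fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this rule max(NaN, 1.0) yields 1.0 but max(1.0, NaN) yields NaN, so operand order matters.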
- _mm512_mask_mul_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_mask_mul_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_mul_round_pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_mask_mul_round_ph Experimental avx512fp16
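- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.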
- _mm512_mask_rcp_ph Experimental avx512fp16
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than `1.5*2^-12`.
- _mm512_mask_reduce_ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_reduce_round_ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_roundscale_ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
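roundscale and reduce read their imm8 argument the same way:

- imm8[7:4] gives the number M of fraction bits to keep.
- imm8[1:0] selects the rounding: 0 nearest-even, 1 down, 2 up, 3 toward zero.

roundscale returns round(x * 2^M) / 2^M, and reduce returns what that rounding discards, x - roundscale(x).

This scalar sketch assumes f64 as a stand-in for f16. It ignores imm8[2] (use the MXCSR rounding mode instead) and imm8[3] (suppress the precision exception). The function names are hypothetical.

```rust
/// Round `v` to an integer using the 2-bit rounding control in imm8[1:0].
fn round_rc(v: f64, rc: u8) -> f64 {
    match rc & 0b11 {
        0 => {
            // Nearest, ties to even.
            let t = v.trunc();
            if (v - t).abs() == 0.5 { 2.0 * (v / 2.0).round() } else { v.round() }
        }
        1 => v.floor(),
        2 => v.ceil(),
        _ => v.trunc(),
    }
}

/// `roundscale` model: keep M = imm8[7:4] fraction bits.
fn roundscale(x: f64, imm8: u8) -> f64 {
    let scale = f64::powi(2.0, i32::from(imm8 >> 4));
    round_rc(x * scale, imm8) / scale
}

/// `reduce` model: the part of `x` that `roundscale` discards.
fn reduce(x: f64, imm8: u8) -> f64 {
    x - roundscale(x, imm8)
}
```

For example, with imm8 = 0x20 (keep 2 fraction bits, round to nearest), 1.375 rounds to 1.5 and the reduced argument is -0.125.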
- _mm512_mask_roundscale_round_ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_rsqrt_ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than `1.5*2^-12`.
- _mm512_mask_scalef_ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_scalef_round_ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
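"Scale using values from b" in the scalef entries above means multiplying each element of a by 2 raised to the floor of the matching element of b, i.e. a * 2^floor(b). This scalar sketch handles finite inputs only. It assumes f64 as a stand-in for f16, and does not model NaN, infinity, denormal or f16-overflow handling. The name is hypothetical.

```rust
/// `scalef` model: a * 2^floor(b), for finite inputs only.
fn scalef(a: f64, b: f64) -> f64 {
    a * f64::powi(2.0, b.floor() as i32)
}
```

Because b is floored, scalef(3.0, 2.5) scales by 2^2 and scalef(3.0, -1.5) scales by 2^-2.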
- _mm512_mask_sqrt_ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
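The rcp and rsqrt entries above bound the relative error of their approximations below 1.5*2^-12 (about 3.7e-4). When testing code built on those intrinsics, a check along the following lines can compare an approximation against the exact value. These are hypothetical test helpers computed in f64; they do not model the hardware approximation itself.

```rust
/// Relative-error bound quoted for the fp16 rcp/rsqrt approximations.
const APPROX_REL_ERR: f64 = 1.5 / 4096.0; // 1.5 * 2^-12

/// True if `approx` is within the bound of the exact reciprocal of `x`.
fn rcp_ok(x: f64, approx: f64) -> bool {
    let exact = 1.0 / x;
    ((approx - exact) / exact).abs() < APPROX_REL_ERR
}

/// True if `approx` is within the bound of the exact 1/sqrt(x).
fn rsqrt_ok(x: f64, approx: f64) -> bool {
    let exact = 1.0 / x.sqrt();
    ((approx - exact) / exact).abs() < APPROX_REL_ERR
}
```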
- _mm512_mask_sqrt_round_ph Experimental avx512fp16
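- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.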
- _mm512_mask_sub_ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_sub_round_ph Experimental avx512fp16
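- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.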
- _mm512_maskz_add_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
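The maskz_ entries use a zeromask instead of a writemask: lanes whose mask bit is clear become zero rather than being copied from src. The following minimal scalar sketch shows that rule. It assumes lanes are modeled as a slice and the mask as a u32. `zeromask` is a hypothetical helper, not a core::arch API.

```rust
/// Scalar model of an AVX-512 zeromask: lane `j` is `computed[j]` if bit `j`
/// of `k` is set, otherwise the type's zero value (`Default`).
fn zeromask<T: Copy + Default>(k: u32, computed: &[T]) -> Vec<T> {
    assert!(computed.len() <= 32, "a u32 mask covers at most 32 lanes");
    computed
        .iter()
        .enumerate()
        .map(|(j, &c)| if (k >> j) & 1 == 1 { c } else { T::default() })
        .collect()
}
```

A maskz_add_ph-style result would then be the lane-wise sum passed through `zeromask(k, &sum)`.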
- _mm512_maskz_add_round_ph Experimental avx512fp16
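- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.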
- _mm512_maskz_cmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_maskz_cmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_maskz_conj_pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
`complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_maskz_cvt_roundepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvt_roundph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
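The integer-to-fp16 conversions in this list are exact only while the magnitude fits in the 11-bit binary16 significand (up to 2048). Larger values are rounded. Under round-to-nearest, anything at or above 65520 overflows to infinity, since the largest finite fp16 value is 65504.

This scalar sketch computes the value produced by the default round-to-nearest-even rounding. It is written for signed inputs that fit in i64, and returns an f64 so the result is easy to inspect. The function name is hypothetical.

```rust
/// Value that round-to-nearest-even conversion of `n` to binary16 produces,
/// returned as f64 (infinity on overflow). Models only the default rounding.
fn int_to_f16_value(n: i64) -> f64 {
    const SIG_BITS: u32 = 11; // 10 stored fraction bits + implicit leading 1
    const F16_MAX: f64 = 65504.0;
    let neg = n < 0;
    let mag = n.unsigned_abs();
    let len = 64 - mag.leading_zeros(); // bit length of |n|
    let rounded = if len <= SIG_BITS {
        mag // fits in the significand: exact
    } else {
        let shift = len - SIG_BITS;
        let mut q = mag >> shift;
        let rem = mag & ((1u64 << shift) - 1);
        let half = 1u64 << (shift - 1);
        if rem > half || (rem == half && q & 1 == 1) {
            q += 1; // round to nearest, ties to even
        }
        q << shift
    };
    let v = rounded as f64;
    let v = if v > F16_MAX { f64::INFINITY } else { v };
    if neg { -v } else { v }
}
```

For example, 2049 lies exactly halfway between 2048 and 2050 and rounds to the even significand (2048), while 2051 rounds up to 2052.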
- _mm512_maskz_cvtepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epi32 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epi64 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epu16 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epu32 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtt_roundph_epu64 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epi16 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epi32 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epi64 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epu16 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epu32 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvttph_epu64 (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
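The zero-masked conversions above differ mainly in lane width, signedness and rounding. As a rough illustration, the scalar Rust sketch below models the behaviour of _mm512_maskz_cvtph_epi32 and _mm512_maskz_cvttph_epi32 (16 lanes, one mask bit per lane). It is not the intrinsic itself: the function name is hypothetical, f32 stands in for f16, and the non-truncating form is assumed to use the default round-half-to-even mode.

```rust
/// Hypothetical scalar model of a zero-masked FP16 -> i32 conversion over 16 lanes.
/// f32 stands in for f16 (every f16 value is exactly representable in f32).
/// `truncate = true` models the cvttph (truncating) form; otherwise
/// round-half-to-even, the default MXCSR rounding mode, is assumed.
/// Out-of-range and NaN inputs are NOT modeled: Rust's `as` saturates,
/// whereas the hardware returns an "integer indefinite" value.
pub fn maskz_cvtph_epi32_model(k: u16, a: &[f32; 16], truncate: bool) -> [i32; 16] {
    let mut dst = [0i32; 16]; // lanes whose mask bit is clear stay zero
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            let r = if truncate { a[i].trunc() } else { a[i].round_ties_even() };
            dst[i] = r as i32;
        }
    }
    dst
}
```

With the same input, masked-off lanes are 0 in both forms; only the handling of values such as 2.5 or -1.7 differs.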
- _mm512_maskz_cvtx_roundph_ps (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtx_roundps_ph (Experimental, avx512fp16)
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtxph_ps (Experimental, avx512fp16)
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_cvtxps_ph (Experimental, avx512fp16)
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_div_ph (Experimental, avx512fp16)
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_div_round_ph (Experimental, avx512fp16)
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_maskz_fcmadd_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_fcmadd_round_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_fcmul_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_fcmul_round_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_fmadd_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_fmadd_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
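The complex-number entries above (fmadd_pch, fcmadd_pch, fcmul_pch and their variants) treat each pair of adjacent FP16 lanes as one complex value and apply the mask per complex element. A minimal scalar sketch of that arithmetic follows; the type and function names are hypothetical, f32 stands in for f16, and FP16 rounding of intermediate results is not modeled.

```rust
/// One complex number = two adjacent FP16 lanes: re = vec.fp16[0], im = vec.fp16[1].
/// f32 stands in for f16, so FP16 rounding of intermediates is not modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex {
    pub re: f32,
    pub im: f32,
}

/// a * b, the product used by the fmul_pch / fmadd_pch entries.
pub fn cmul(a: Complex, b: Complex) -> Complex {
    Complex { re: a.re * b.re - a.im * b.im, im: a.re * b.im + a.im * b.re }
}

/// a * conj(b), the product used by the fcmul_pch / fcmadd_pch entries.
pub fn cmul_conj(a: Complex, b: Complex) -> Complex {
    cmul(a, Complex { re: b.re, im: -b.im })
}

/// Zero-masked multiply-accumulate with one mask bit per complex number
/// (16 complex numbers fit in a 512-bit vector). `conj` selects a*conj(b) + c over a*b + c.
pub fn maskz_madd_pch_model(k: u16, a: &[Complex], b: &[Complex], c: &[Complex], conj: bool) -> Vec<Complex> {
    assert!(a.len() <= 16 && a.len() == b.len() && b.len() == c.len());
    (0..a.len())
        .map(|j| {
            if (k >> j) & 1 == 0 {
                return Complex { re: 0.0, im: 0.0 }; // zeromask: clear this complex element
            }
            let p = if conj { cmul_conj(a[j], b[j]) } else { cmul(a[j], b[j]) };
            Complex { re: p.re + c[j].re, im: p.im + c[j].im }
        })
        .collect()
}
```

For example, (1 + 2i) * (3 + 4i) = -5 + 10i, while (1 + 2i) * conj(3 + 4i) = 11 + 2i.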
- _mm512_maskz_fmadd_round_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_fmadd_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmaddsub_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmaddsub_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmsub_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmsub_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmsubadd_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmsubadd_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fmul_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_fmul_round_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_maskz_fnmadd_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fnmadd_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fnmsub_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_fnmsub_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
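The four fused multiply-add entries above differ only in where the signs go: fmadd is (a*b) + c, fmsub is (a*b) - c, fnmadd is -(a*b) + c, and fnmsub is -(a*b) - c. The following scalar sketch spells this out with a per-lane zeromask. The enum and function names are hypothetical, and f32 stands in for f16.

```rust
/// Which of the four FMA sign variants to model (hypothetical enum).
#[derive(Clone, Copy)]
pub enum Fma {
    Madd,
    Msub,
    Nmadd,
    Nmsub,
}

/// Hypothetical scalar model with zeromask k applied per lane.
/// f32 stands in for f16; `mul_add` rounds once, like a fused operation.
pub fn maskz_fma_model(k: u32, a: &[f32], b: &[f32], c: &[f32], op: Fma) -> Vec<f32> {
    assert!(a.len() <= 32, "one mask bit per lane");
    (0..a.len())
        .map(|i| {
            if (k >> i) & 1 == 0 {
                return 0.0; // zeromask: clear this lane
            }
            match op {
                Fma::Madd => a[i].mul_add(b[i], c[i]),       //  (a*b) + c
                Fma::Msub => a[i].mul_add(b[i], -c[i]),      //  (a*b) - c
                Fma::Nmadd => (-a[i]).mul_add(b[i], c[i]),   // -(a*b) + c
                Fma::Nmsub => (-a[i]).mul_add(b[i], -c[i]),  // -(a*b) - c
            }
        })
        .collect()
}
```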
- _mm512_maskz_getexp_ph (Experimental, avx512fp16)
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_getexp_round_ph (Experimental, avx512fp16)
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_getmant_ph (Experimental, avx512fp16)
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_maskz_getmant_round_ph (Experimental, avx512fp16)
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_max_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_max_round_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_min_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_maskz_min_round_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
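The "does not follow IEEE 754" caveat on the max/min entries refers to the x86 MAX/MIN rule documented by Intel: the second operand b is returned whenever the comparison is false, which covers a NaN in either input and a pair of zeros of either sign. A scalar sketch of that rule, with hypothetical function names and f32 in place of f16:

```rust
/// Hypothetical scalar model of the x86 MAX rule: b is returned unless a > b holds.
pub fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Hypothetical scalar model of the x86 MIN rule: b is returned unless a < b holds.
pub fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

The operand order therefore matters: x86_max(NaN, 1.0) yields 1.0, but x86_max(1.0, NaN) yields NaN, whereas Rust's f32::max returns the non-NaN operand in both cases.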
- _mm512_maskz_mul_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_mul_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_mul_round_pch (Experimental, avx512fp16)
- Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_mul_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_maskz_rcp_ph (Experimental, avx512fp16)
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_reduce_ph (Experimental, avx512fp16)
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_reduce_round_ph (Experimental, avx512fp16)
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_roundscale_ph (Experimental, avx512fp16)
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_roundscale_round_ph (Experimental, avx512fp16)
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
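The roundscale and reduce entries are two halves of the same operation. Going by Intel's instruction descriptions, roundscale keeps M = imm8[7:4] fraction bits, computing round(x * 2^M) * 2^-M, and reduce returns the remainder x - roundscale(x). The scalar sketch below assumes imm8[2:0] = 0 (round to nearest even, no MXCSR override). The function names are hypothetical, f32 stands in for f16, and masking is omitted.

```rust
/// Hypothetical scalar model of roundscale, assuming imm8[2:0] = 0
/// (round to nearest even). Only M = imm8[7:4], the number of fraction
/// bits kept, is modeled. f32 stands in for f16.
pub fn roundscale_model(x: f32, imm8: u8) -> f32 {
    let m = (imm8 >> 4) as i32;
    let scale = 2f32.powi(m);
    (x * scale).round_ties_even() / scale
}

/// Hypothetical scalar model of reduce: the part of x that roundscale discards.
pub fn reduce_model(x: f32, imm8: u8) -> f32 {
    x - roundscale_model(x, imm8)
}
```

For x = 2.625 (binary 10.101), keeping one fraction bit (imm8 = 0x10) gives 2.5 and a reduced argument of 0.125.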
- _mm512_maskz_rsqrt_ph (Experimental, avx512fp16)
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_scalef_ph (Experimental, avx512fp16)
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_scalef_round_ph (Experimental, avx512fp16)
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
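"Scale using values from b" means, according to Intel's description of VSCALEF, multiplying each element of a by 2 raised to floor(b). A minimal scalar sketch with a hypothetical function name (f32 in place of f16; NaN, infinity and overflow cases are not modeled):

```rust
/// Hypothetical scalar model of scalef: a * 2^floor(b).
/// f32 stands in for f16; special cases are not modeled.
pub fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

Because of the floor, scalef_model(3.0, 2.9) gives 12.0, and scalef_model(3.0, -1.5) gives 0.75.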
- _mm512_maskz_sqrt_ph (Experimental, avx512fp16)
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_sqrt_round_ph (Experimental, avx512fp16)
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_maskz_sub_ph (Experimental, avx512fp16)
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_sub_round_ph (Experimental, avx512fp16)
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_max_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_max_round_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_min_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_min_round_ph (Experimental, avx512fp16)
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mul_pch (Experimental, avx512fp16)
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mul_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_round_pch (Experimental, avx512fp16)
- Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mul_round_ph (Experimental, avx512fp16)
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_permutex2var_ph (Experimental, avx512fp16)
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutexvar_ph (Experimental, avx512fp16)
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
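For the two shuffles above, each 16-bit index selects one of 32 lanes with bits 4:0. In the two-source form, bit 5 additionally chooses between a (0) and b (1). The scalar sketch below assumes that index layout (taken from the word-permute instructions these map to). The function names are hypothetical, and u16 bit patterns stand in for the f16 values, since shuffles only move bits.

```rust
/// Hypothetical model of permutexvar over 32 lanes: dst[i] = a[idx[i] & 31].
pub fn permutexvar_model(idx: &[u16; 32], a: &[u16; 32]) -> [u16; 32] {
    core::array::from_fn(|i| a[(idx[i] & 31) as usize])
}

/// Hypothetical model of permutex2var: bits 4:0 of idx[i] pick the element,
/// bit 5 picks the source (0 = a, 1 = b).
pub fn permutex2var_model(a: &[u16; 32], idx: &[u16; 32], b: &[u16; 32]) -> [u16; 32] {
    core::array::from_fn(|i| {
        let j = (idx[i] & 31) as usize;
        if idx[i] & 32 != 0 { b[j] } else { a[j] }
    })
}
```

Passing the indices 31, 30, ..., 0 to permutexvar_model reverses the vector.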
- _mm512_rcp_ph (Experimental, avx512fp16)
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_reduce_add_ph (Experimental, avx512fp16)
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_max_ph (Experimental, avx512fp16)
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_min_ph (Experimental, avx512fp16)
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_mul_ph (Experimental, avx512fp16)
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
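The horizontal reductions above collapse all 32 lanes into a single value. The sketch below uses a pairwise (tree) order, which is an assumption: the entries above do not state the evaluation order. With real FP16 rounding the order can change the result of an addition or multiplication, so treat this as a model of the idea rather than of the exact bits. The function name is hypothetical, f32 stands in for f16, and NaN handling in max/min follows Rust's f32::max/f32::min, not the x86 rule.

```rust
/// Hypothetical tree-order reduction over 32 lanes (f32 in place of f16).
/// Each pass combines lane i with lane i + half, halving the active width.
pub fn reduce_tree(v: &[f32; 32], op: fn(f32, f32) -> f32) -> f32 {
    let mut buf = v.to_vec();
    while buf.len() > 1 {
        let half = buf.len() / 2;
        for i in 0..half {
            buf[i] = op(buf[i], buf[i + half]);
        }
        buf.truncate(half);
    }
    buf[0]
}
```

For the lanes 1.0 through 32.0, every order gives the same sum (528.0) because all partial sums are exact, so such inputs make a convenient sanity check.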
- _mm512_reduce_ph (Experimental, avx512fp16)
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_reduce_round_ph (Experimental, avx512fp16)
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_roundscale_ph (Experimental, avx512fp16)
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm512_roundscale_round_ph (Experimental, avx512fp16)
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_rsqrt_ph (Experimental, avx512fp16)
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_scalef_ph (Experimental, avx512fp16)
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_round_ph (Experimental, avx512fp16)
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_set1_ph (Experimental, avx512fp16)
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm512_set_ph (Experimental, avx512fp16)
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm512_setr_ph (Experimental, avx512fp16)
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
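The only difference between set_ph and setr_ph is argument order. Following the usual Intel convention, set_ph lists elements from the highest lane down, so its last argument lands in lane 0, while setr_ph lists them in lane order. The hypothetical helpers below show just that mapping on an array of arguments, with u16 bit patterns standing in for f16 values.

```rust
/// Hypothetical helper: lane layout produced by a set_ph-style call,
/// whose arguments run from lane 31 down to lane 0.
pub fn set_order(args: [u16; 32]) -> [u16; 32] {
    let mut lanes = args;
    lanes.reverse(); // the last argument becomes lane 0
    lanes
}

/// Hypothetical helper: lane layout produced by a setr_ph-style call,
/// whose arguments are already in lane order.
pub fn setr_order(args: [u16; 32]) -> [u16; 32] {
    args // the first argument is lane 0
}
```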
- _mm512_setzero_ph (Experimental, avx512fp16)
- Return vector of type __m512h with all elements set to zero.
- _mm512_sqrt_ph (Experimental, avx512fp16)
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_round_ph (Experimental, avx512fp16)
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_store_ph (Experimental, avx512fp16)
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_storeu_ph (Experimental, avx512fp16)
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
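The aligned store requires a 64-byte-aligned address, while the unaligned variant does not. One common way to get suitable storage in Rust is a wrapper type with #[repr(align(64))]. The sketch below shows that pattern plus an alignment check. The type and function names are hypothetical, and u16 stands in for the FP16 element type.

```rust
/// Hypothetical 64-byte-aligned buffer holding 32 FP16 bit patterns (64 bytes),
/// the size and alignment an aligned 512-bit store expects.
#[repr(C, align(64))]
pub struct Aligned64(pub [u16; 32]);

/// Returns true if `p` satisfies the 64-byte alignment of the aligned store.
pub fn is_64_aligned<T>(p: *const T) -> bool {
    (p as usize) % 64 == 0
}
```

A plain [u16; 32] on the stack is only guaranteed 2-byte alignment, so the unaligned store is the safe choice when the destination's alignment is not under your control.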
- _mm512_sub_ph (Experimental, avx512fp16)
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm512_sub_round_ph (Experimental, avx512fp16)
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm512_undefined_ph (Experimental, avx512fp16)
- Return vector of type __m512h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_zextph128_ph512 (Experimental, avx512fp16)
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_zextph256_ph512 (Experimental, avx512fp16)
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm_abs_ph (Experimental, avx512fp16, avx512vl)
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
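Absolute value on IEEE 754 binary16 only needs the sign bit (bit 15) cleared, so it is exact and never rounds. The bit-level sketch below models a single lane under that observation. It is not a claim about how the intrinsic is implemented, and the function name is hypothetical.

```rust
/// Hypothetical bit-level model of abs on one FP16 lane:
/// clear the sign bit (bit 15) of the binary16 pattern.
pub fn abs_f16_bits(h: u16) -> u16 {
    h & 0x7FFF
}
```

For example, the pattern 0xBC00 (-1.0) becomes 0x3C00 (1.0), and 0x8000 (-0.0) becomes 0x0000 (+0.0).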
- _mm_add_ph (Experimental, avx512fp16, avx512vl)
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_add_round_sh (Experimental, avx512fp16)
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, an _MM_FROUND_* rounding-mode constant.
- _mm_add_sh (Experimental, avx512fp16)
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
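The scalar (_sh) entries compute only lane 0 and pass lanes 1 through 7 of a through unchanged. A minimal scalar sketch of that lane behaviour for add_sh, with a hypothetical function name and f32 standing in for f16:

```rust
/// Hypothetical model of add_sh on the 8 FP16 lanes of a 128-bit vector.
/// Only lane 0 is computed; lanes 1..8 are copied from a.
pub fn add_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;      // upper 7 lanes come from a
    dst[0] = a[0] + b[0]; // lower lane: a[0] + b[0]
    dst
}
```

The upper lanes of b are ignored entirely, which is why b's values other than b[0] never appear in the result.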
- _mm_castpd_ph (Experimental, avx512fp16)
- Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_pd (Experimental, avx512fp16)
- Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ps (Experimental, avx512fp16)
- Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_si128 (Experimental, avx512fp16)
- Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castps_ph (Experimental, avx512fp16)
- Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castsi128_ph (Experimental, avx512fp16)
- Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_cmp_ph_mask (Experimental, avx512fp16, avx512vl)
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_round_sh_mask (Experimental, avx512fp16)
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_sh_mask (Experimental, avx512fp16)
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmul_pch (Experimental, avx512fp16, avx512vl)
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
cmul_ βround_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, - _mm_
cmul_ βsch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, - _mm_
comi_ βround_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
comi_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
- _mm_
comieq_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
- _mm_
comige_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
- _mm_
comigt_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
- _mm_
comile_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
- _mm_
comilt_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
- _mm_
comineq_ βsh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
- _mm_conj_pch Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_cvt_roundi32_sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_roundsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_roundsh_i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvt_roundsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvt_roundsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_roundsh_u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvt_roundss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_roundu32_sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtepi16_ph Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepi32_ph Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepi64_ph Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtepu16_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu32_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepu64_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
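The "upper bits are zeroed out" remarks follow from lane counts. Four 32-bit integers become four 16-bit halves, which fill only the lower 64 bits of the 8-lane, 128-bit `__m128h`; two 64-bit integers fill only the lower 32 bits, leaving 96 zeroed. The sketch below models that lane placement on plain arrays. It is not the real intrinsic: `f32` stands in for half precision (so f16 rounding and overflow to infinity are not modelled), and the function names are hypothetical.

```rust
/// Hypothetical model of `_mm_cvtepi32_ph` lane placement.
/// Each output lane stands for one 16-bit half; f32 is only a stand-in,
/// so half-precision rounding and overflow are NOT modelled.
fn cvtepi32_ph_model(a: [i32; 4]) -> [f32; 8] {
    let mut dst = [0.0f32; 8]; // lanes 4..8 (the upper 64 bits) stay zero
    for (i, &x) in a.iter().enumerate() {
        dst[i] = x as f32; // lane i of dst receives converted lane i of a
    }
    dst
}

/// Hypothetical model of `_mm_cvtepi64_ph`: two inputs fill lanes 0..2,
/// leaving lanes 2..8 (the upper 96 bits) zeroed.
fn cvtepi64_ph_model(a: [i64; 2]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for (i, &x) in a.iter().enumerate() {
        dst[i] = x as f32;
    }
    dst
}
```

The unsigned `cvtepu32_ph` and `cvtepu64_ph` forms would place lanes the same way.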
- _mm_cvti32_sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtpd_ph Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm_cvtph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm_cvtph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm_cvtph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm_cvtph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm_cvtph_pd Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtsh_h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from `a` to `dst`.
- _mm_cvtsh_i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtsh_u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvtsi16_si128 Experimental avx512fp16
- Copy 16-bit integer a to the lower element of dst, and zero the upper elements of dst.
- _mm_cvtsi128_si16 Experimental avx512fp16
- Copy the lower 16-bit integer in a to dst.
- _mm_cvtss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtt_roundsh_i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvtt_roundsh_u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvttph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_cvttph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm_cvttph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttsh_i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsh_u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
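The truncating `cvtt*` entries differ from their `cvt*` counterparts only in how the fractional part is dropped. Truncation always rounds toward zero. The non-truncating, non-`_round_` forms use the current rounding mode, which is round-to-nearest-even by default. The sketch below contrasts the two on `f32` values standing in for halves. It assumes the default MXCSR rounding mode and ignores out-of-range inputs, and both function names are hypothetical.

```rust
/// Hypothetical model of `_mm_cvtsh_i32` under the default rounding mode
/// (round to nearest, ties to even). Out-of-range inputs are not handled.
fn cvtsh_i32_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Hypothetical model of `_mm_cvttsh_i32`: always truncate toward zero.
fn cvttsh_i32_model(x: f32) -> i32 {
    x.trunc() as i32
}
```

For example, 2.5 converts to 2 (a tie rounded to even) but 3.5 converts to 4. By contrast, -2.7 converts to -3 with rounding and to -2 with truncation.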
- _mm_cvtu32_sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtxph_ps Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtxps_ph Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_div_ph Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm_div_round_sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_div_sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
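Most scalar `_sh` entries share one lane pattern. Only lane 0 is computed, lanes 1 to 7 are copied unchanged from a, and the upper lanes of b are ignored. The sketch below illustrates this with an array model of `_mm_div_sh`. Here `f32` stands in for half precision, and `div_sh_model` is a hypothetical name, not the real intrinsic.

```rust
/// Hypothetical model of `_mm_div_sh`: lane 0 = a[0] / b[0],
/// lanes 1..8 are copied from `a`; the upper lanes of `b` are ignored.
fn div_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;      // start from a so the upper 7 lanes are copied
    dst[0] = a[0] / b[0]; // only the lowest lane is actually divided
    dst
}
```

Conversions such as `cvtss_sh` or `cvtsd_sh` follow the same layout, with a different lane-0 operation.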
- _mm_fcmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fcmadd_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fcmadd_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fcmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fcmul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fcmul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_fmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
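The complex entries above (`fmul_pch`, `fcmul_pch`, `fmadd_pch`, `fcmadd_pch`, and `conj_pch` earlier in the list) all treat each pair of adjacent lanes as `re + i*im`. The sketch below writes out the arithmetic they describe on `(f32, f32)` pairs. It is a plain reference model: it does not reproduce the hardware's half-precision rounding, and all names are hypothetical.

```rust
/// A complex number stored as two adjacent lanes: (re, im) = (fp16[0], fp16[1]).
type Cplx = (f32, f32);

/// Model of `fmul_pch` on one lane pair: a * b.
fn cmul(a: Cplx, b: Cplx) -> Cplx {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Model of `conj_pch` on one lane pair: negate the imaginary lane.
fn conj(a: Cplx) -> Cplx {
    (a.0, -a.1)
}

/// Model of `fcmul_pch` / `cmul_pch` on one lane pair: a * conj(b).
fn cmul_conj(a: Cplx, b: Cplx) -> Cplx {
    cmul(a, conj(b))
}

/// Model of `fmadd_pch`: a * b + c.
fn cmadd(a: Cplx, b: Cplx, c: Cplx) -> Cplx {
    let p = cmul(a, b);
    (p.0 + c.0, p.1 + c.1)
}

/// Model of `fcmadd_pch`: a * conj(b) + c.
fn cmadd_conj(a: Cplx, b: Cplx, c: Cplx) -> Cplx {
    let p = cmul_conj(a, b);
    (p.0 + c.0, p.1 + c.1)
}

/// Apply the conjugate multiply across 8 lanes (4 complex numbers),
/// mirroring the 128-bit `_mm_fcmul_pch` layout.
fn packed_cmul_conj(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..4 {
        let r = cmul_conj((a[2 * i], a[2 * i + 1]), (b[2 * i], b[2 * i + 1]));
        dst[2 * i] = r.0;
        dst[2 * i + 1] = r.1;
    }
    dst
}
```

For example, `(1 + 2i) * conj(3 + 4i) = 11 + 2i`. Multiplying any `z` by its own conjugate leaves only the real value `|z|^2`.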
- _mm_fmadd_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmadd_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmaddsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_fmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm_fmsub_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsub_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsubadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_fmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fmul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fmul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_fnmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm_fnmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm_fnmsub_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
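The fused multiply-add entries differ only in where the signs go. Per lane they compute:

- fmadd = a*b + c
- fmsub = a*b - c
- fnmadd = -(a*b) + c
- fnmsub = -(a*b) - c

`fmaddsub` subtracts c in even lanes and adds it in odd lanes, and `fmsubadd` does the opposite. The sketch below writes these out on `f32` values standing in for halves. `mul_add` rounds once, like a fused instruction, but in f32 rather than f16. The function names are hypothetical.

```rust
/// Per-lane models of the FMA family; `mul_add` fuses multiply and add
/// with a single rounding (in f32 here, not f16).
fn fmadd(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, c) }      // a*b + c
fn fmsub(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, -c) }     // a*b - c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, c) }  // -(a*b) + c
fn fnmsub(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, -c) } // -(a*b) - c

/// Model of `fmaddsub_ph`: even lanes a*b - c, odd lanes a*b + c.
fn fmaddsub(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for j in 0..8 {
        dst[j] = if j % 2 == 0 { fmsub(a[j], b[j], c[j]) } else { fmadd(a[j], b[j], c[j]) };
    }
    dst
}

/// Model of `fmsubadd_ph`: even lanes a*b + c, odd lanes a*b - c.
fn fmsubadd(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for j in 0..8 {
        dst[j] = if j % 2 == 0 { fmadd(a[j], b[j], c[j]) } else { fmsub(a[j], b[j], c[j]) };
    }
    dst
}
```

With a = 2, b = 3, c = 1, the four scalar forms give 7, 5, -5 and -7 respectively.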
- _mm_fpclass_ph_mask Experimental avx512fp16,avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of:
- _mm_fpclass_sh_mask Experimental avx512fp16
- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm8 can be a combination of:
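The fpclass entries select categories through bits of imm8. This sketch assumes the bit assignments of Intel's VFPCLASS encoding, which this index does not list:

| Bit | Category |
| --- | --- |
| 0x01 | quiet NaN |
| 0x02 | +0 |
| 0x04 | -0 |
| 0x08 | +inf |
| 0x10 | -inf |
| 0x20 | denormal |
| 0x40 | negative finite |
| 0x80 | signaling NaN |

An element's mask bit is set when it falls into any selected category. The model below works on raw IEEE 754 binary16 bit patterns (`u16`) so it runs on stable Rust without an `f16` type. `fpclass_model` is a hypothetical name.

```rust
/// Hypothetical model of the per-element test in `fpclass_ph_mask` /
/// `fpclass_sh_mask`, on a raw binary16 bit pattern.
/// Category bit assignments are ASSUMED to follow Intel's VFPCLASS encoding.
fn fpclass_model(bits: u16, imm8: u8) -> bool {
    let sign = (bits & 0x8000) != 0;
    let exp = (bits >> 10) & 0x1F; // 5-bit exponent field
    let mant = bits & 0x03FF;      // 10-bit mantissa field
    let exp_all_ones = exp == 0x1F;
    let exp_all_zeros = exp == 0;
    let is_nan = exp_all_ones && mant != 0;
    let quiet = (mant & 0x0200) != 0; // top mantissa bit marks a quiet NaN
    let zero = exp_all_zeros && mant == 0;
    let inf = exp_all_ones && mant == 0;

    // Index i of this array corresponds to imm8 bit (1 << i).
    let categories = [
        is_nan && quiet,                // 0x01 quiet NaN
        zero && !sign,                  // 0x02 +0
        zero && sign,                   // 0x04 -0
        inf && !sign,                   // 0x08 +inf
        inf && sign,                    // 0x10 -inf
        exp_all_zeros && mant != 0,     // 0x20 denormal (either sign)
        sign && !exp_all_ones && !zero, // 0x40 negative finite, nonzero
        is_nan && !quiet,               // 0x80 signaling NaN
    ];
    categories
        .iter()
        .enumerate()
        .any(|(bit, &hit)| hit && (imm8 & (1u8 << bit)) != 0)
}
```

Combining bits tests several categories at once. For example, 0x81 matches a NaN of either kind.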
- _mm_getexp_ph Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates `floor(log2(x))` for each element.
- _mm_getexp_round_sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `floor(log2(x))` for the lower element. Exceptions can be suppressed by passing `_MM_FROUND_NO_EXC` in the sae parameter.
- _mm_getexp_sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `floor(log2(x))` for the lower element.
- _mm_getmant_ph Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_getmant_round_sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing `_MM_FROUND_NO_EXC` in the sae parameter.
- _mm_getmant_sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_load_ph Experimental avx512fp16,avx512vl
- Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
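The getexp entries above return the unbiased exponent as a floating-point value, described as `floor(log2(x))`. The sketch below reads the exponent field of a raw binary16 bit pattern directly rather than calling a library `log2`. It assumes the usual VGETEXP conventions, which this index does not state:

- the sign is ignored, so only |x| matters;
- zero gives -inf;
- infinity gives +inf;
- NaN stays NaN;
- denormals are normalized rather than flushed to zero.

`getexp_model` is a hypothetical name.

```rust
/// Hypothetical model of `getexp_sh` on one binary16 value given as raw bits:
/// returns floor(log2(|x|)) as f32, under the assumptions stated above.
fn getexp_model(bits: u16) -> f32 {
    let exp = ((bits >> 10) & 0x1F) as i32; // biased 5-bit exponent
    let mant = (bits & 0x03FF) as u32;      // 10-bit mantissa
    match (exp, mant) {
        (0x1F, 0) => f32::INFINITY,  // +/-inf -> +inf
        (0x1F, _) => f32::NAN,       // NaN stays NaN
        (0, 0) => f32::NEG_INFINITY, // +/-0 -> -inf
        // Denormal: value = mant * 2^-24, so the exponent is the index of
        // the highest set mantissa bit minus 24.
        (0, m) => (31 - m.leading_zeros() as i32 - 24) as f32,
        (e, _) => (e - 15) as f32,   // normal: remove the bias of 15
    }
}
```

For instance, 8.0 (bits `0x4800`) and -8.0 (bits `0xC800`) both give 3. The smallest denormal (bits `0x0001`, i.e. 2^-24) gives -24.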
- _mm_load_sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements.
- _mm_loadu_ph Experimental avx512fp16,avx512vl
- Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
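`_mm_load_ph` requires a 16-byte-aligned address, whereas `_mm_loadu_ph` accepts any address. One way to guarantee that alignment for a buffer of eight halves in Rust is a `#[repr(align(16))]` wrapper. The sketch below stores raw binary16 bit patterns as `u16` to stay on stable Rust and only checks the alignment; it calls no intrinsic. `AlignedHalves` and `is_16_byte_aligned` are hypothetical names.

```rust
/// Eight binary16 values (stored as raw u16 bits) with 16-byte alignment,
/// the alignment an aligned 128-bit load such as `_mm_load_ph` requires.
#[repr(C, align(16))]
struct AlignedHalves([u16; 8]);

/// Returns true if `ptr` meets the 16-byte alignment an aligned load needs.
fn is_16_byte_aligned<T>(ptr: *const T) -> bool {
    (ptr as usize) % 16 == 0
}
```

A pointer from such a buffer would still need a cast to the element pointer type the intrinsic expects. For slices of unknown alignment, the unaligned `loadu` form avoids the general-protection risk altogether.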
- _mm_mask3_fcmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_mask3_fcmadd_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_mask3_fcmadd_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_mask3_fmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_mask3_fmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
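The `mask3_` entries write a computed lane only where the corresponding bit of k is set. Everywhere else they keep the lane from c, the accumulator, rather than from a. The sketch below models `_mm_mask3_fmadd_ph` on 8-lane arrays, so k is 8 bits wide. `f32` stands in for half precision, and the function name is hypothetical.

```rust
/// Hypothetical model of `_mm_mask3_fmadd_ph`: lane j = a*b + c if bit j of k
/// is set, otherwise lane j is copied from c.
fn mask3_fmadd_ph_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // unselected lanes keep c's value
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            dst[j] = a[j].mul_add(b[j], c[j]); // fused a*b + c for selected lanes
        }
    }
    dst
}
```

The scalar `mask3_*_sh` forms apply the same rule to lane 0 only, using mask bit 0.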
- _mm_
mask3_ βfmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from c when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mask3_ βfmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ βfmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from c when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mask3_ βfmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ βfmaddsub_ ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ βfmsub_ ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ βfmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ βfmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ βfmsubadd_ ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_fnmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_fnmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_fnmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_fnmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_fnmsub_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_fnmsub_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
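The scalar mask3 forms of fnmadd and fnmsub both negate the product a*b. They differ only in whether c is added or subtracted. The sketch below is a hypothetical reference model of lane 0, not a std::arch API. The function names are illustrative, and f32 lanes stand in for the half-precision lanes.

```rust
/// Hypothetical scalar models of `_mm_mask3_fnmadd_sh` and `_mm_mask3_fnmsub_sh`
/// (not std::arch APIs); f32 lanes stand in for the half-precision lanes.
/// Only lane 0 is computed; lanes 1..=7 always come from c.
fn mask3_fnmadd_sh_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // mask3 form: upper lanes and the unselected lane come from c
    if k & 1 == 1 {
        // -(a * b) + c
        dst[0] = (-a[0]).mul_add(b[0], c[0]);
    }
    dst
}

fn mask3_fnmsub_sh_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c;
    if k & 1 == 1 {
        // -(a * b) - c
        dst[0] = (-a[0]).mul_add(b[0], -c[0]);
    }
    dst
}
```

With a = 2, b = 3 and c = 1 in lane 0, fnmadd gives -5 and fnmsub gives -7.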
- _mm_mask_add_ph Experimental avx512fp16,avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_add_round_sh Experimental avx512fp16
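- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.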
- _mm_mask_add_sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
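The packed and scalar writemask forms of add differ in where unselected lanes come from. The hypothetical sketch below models both. The names are illustrative, not std::arch functions, f32 lanes stand in for half precision, and the parameter order mirrors the (src, k, a, b) order of the intrinsics.

```rust
/// Hypothetical models (not std::arch APIs) of the writemask behavior of
/// `_mm_mask_add_ph` and `_mm_mask_add_sh`; f32 lanes stand in for half precision.
fn mask_add_ph_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Each lane is either the sum or the matching lane of src.
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { a[i] + b[i] } else { src[i] })
}

fn mask_add_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Upper 7 lanes always come from a; only lane 0 is governed by the mask.
    let mut dst = a;
    dst[0] = if k & 1 == 1 { a[0] + b[0] } else { src[0] };
    dst
}
```

In the scalar form, src can only ever contribute lane 0.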
- _mm_mask_blend_ph Experimental avx512fp16,avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst (an element is taken from b when the corresponding mask bit is set, and from a otherwise).
- _mm_mask_cmp_ph_mask Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (a result bit is zeroed out when the corresponding bit of k1 is not set).
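A zeromask on a comparison only clears result bits. It never sets them. In the hypothetical sketch below, a closure stands in for the predicate that the immediate selects. `mask_cmp_model` is an illustrative name, not a std::arch API, and f32 lanes stand in for half precision.

```rust
/// Hypothetical model of `_mm_mask_cmp_ph_mask` (not a std::arch API).
/// The predicate closure stands in for the comparison selected by the immediate;
/// f32 lanes stand in for half-precision lanes.
fn mask_cmp_model(k1: u8, a: [f32; 8], b: [f32; 8], pred: impl Fn(f32, f32) -> bool) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        // A result bit can only be set if the same bit of the zeromask k1 is set.
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            k |= 1u8 << i;
        }
    }
    k
}
```

With a less-than predicate and k1 = 0x0f, only lanes 0 to 3 can report true, whatever lanes 4 to 7 contain.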
- _mm_mask_cmp_round_sh_mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_cmp_sh_mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
- _mm_mask_cmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_cmul_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_cmul_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_conj_pch Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_cvt_roundsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
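The complex-number entries above (_mm_mask_cmul_pch, _mm_mask_conj_pch and their scalar forms) treat each pair of adjacent fp16 lanes as one complex value [re, im]. The hypothetical sketch below models that arithmetic. It assumes one writemask bit per complex number (pair of lanes), and it uses f32 in place of half precision. None of these names are std::arch functions.

```rust
/// Hypothetical scalar models (not std::arch APIs) of the complex lanes used by
/// `_mm_mask_cmul_pch` and `_mm_mask_conj_pch`. A complex number is stored as
/// two adjacent lanes [re, im]; f32 stands in for half precision.
type Complex = [f32; 2];

/// conj(z) = re - i * im
fn conj(z: Complex) -> Complex {
    [z[0], -z[1]]
}

/// a * conj(b) = (ar + i*ai)(br - i*bi)
///             = (ar*br + ai*bi) + i*(ai*br - ar*bi)
fn cmul(a: Complex, b: Complex) -> Complex {
    [a[0] * b[0] + a[1] * b[1], a[1] * b[0] - a[0] * b[1]]
}

/// Masked packed form for a 128-bit vector (4 complex numbers), assuming one
/// mask bit per complex number; unselected numbers are copied from src.
fn mask_cmul_model(src: [Complex; 4], k: u8, a: [Complex; 4], b: [Complex; 4]) -> [Complex; 4] {
    std::array::from_fn(|j| if (k >> j) & 1 == 1 { cmul(a[j], b[j]) } else { src[j] })
}
```

For example, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 - 4i) = 11 + 2i.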
- _mm_mask_cvt_roundsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_cvt_roundsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_cvt_roundss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_cvtepi16_ph Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_cvtepi32_ph Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_cvtepi64_ph Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_cvtepu16_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_cvtepu32_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_cvtepu64_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_cvtpd_ph Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
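The "upper 64 bits" and "upper 96 bits" in the narrowing conversions above follow from lane counts. A 128-bit source holds four 32-bit or two 64-bit elements, and each element produces one 16-bit result. The hypothetical helper below does only that arithmetic. It is not a std::arch function.

```rust
/// Hypothetical helper: for a 128-bit source vector with `src_lane_bits`-wide
/// elements converted to 16-bit half-precision results, returns
/// (number of results, number of zeroed upper bits in the 128-bit dst).
fn narrowing_layout(src_lane_bits: u32) -> (u32, u32) {
    let lanes = 128 / src_lane_bits;
    let used_bits = lanes * 16; // each result is one 16-bit half
    (lanes, 128 - used_bits)
}
```

By the same count, only the low 4 (32-bit sources) or low 2 (64-bit sources) bits of the writemask k correspond to result elements in the 128-bit forms.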
- _mm_mask_cvtph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvtph_pd Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_cvtsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_cvtsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_cvtsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_cvtss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_cvttph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvttph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvttph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvttph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvttph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_cvttph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
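The cvtph_* and cvttph_* entries differ only in rounding. The cvtt forms always truncate toward zero. The cvt forms use the current rounding mode, which this sketch assumes is the default round-to-nearest-even. The helpers are hypothetical, not std::arch functions. They use f32 in place of half precision and do not model out-of-range or NaN inputs.

```rust
/// Hypothetical models (not std::arch APIs) contrasting a cvtph-style conversion
/// (current rounding mode, assumed here to be round-to-nearest-even) with a
/// cvttph-style conversion (always truncates toward zero).
/// Out-of-range and NaN inputs are not modeled: `as i32` saturates, whereas the
/// hardware returns the integer indefinite value.
fn cvt_nearest_even(x: f32) -> i32 {
    x.round_ties_even() as i32
}

fn cvtt_truncate(x: f32) -> i32 {
    x.trunc() as i32
}
```

For -1.7, the rounding form gives -2 while the truncating form gives -1. Ties such as 2.5 round to the even neighbour, 2.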
- _mm_mask_cvtxph_ps Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_cvtxps_ph Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_div_ph Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_div_round_sh Experimental avx512fp16
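- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.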
- _mm_mask_div_sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fcmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fcmadd_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fcmadd_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fcmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fcmul_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fcmul_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_fmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
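The complex fused multiply-add entries above (_mm_mask_fmadd_pch and _mm_mask_fcmadd_pch) differ only in whether b is conjugated. Unlike the cmul/fcmul forms, they copy unselected complex numbers from a rather than from src. The sketch below is a hypothetical model under stated assumptions: one mask bit per complex number, f32 in place of half precision, and no modeling of the hardware's intermediate rounding. The names are illustrative, not std::arch functions.

```rust
/// Hypothetical one-lane models (not std::arch APIs) of the complex FMA entries
/// `_mm_mask_fmadd_pch` (a * b + c) and `_mm_mask_fcmadd_pch` (a * conj(b) + c).
/// A complex number is [re, im]; f32 stands in for half precision, and the
/// intermediate rounding of the real instructions is not modeled.
type Complex = [f32; 2];

fn fmadd_c(a: Complex, b: Complex, c: Complex) -> Complex {
    // (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
    [a[0] * b[0] - a[1] * b[1] + c[0], a[0] * b[1] + a[1] * b[0] + c[1]]
}

fn fcmadd_c(a: Complex, b: Complex, c: Complex) -> Complex {
    // (ar + i*ai)(br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    [a[0] * b[0] + a[1] * b[1] + c[0], a[1] * b[0] - a[0] * b[1] + c[1]]
}

/// Masked packed form (4 complex numbers, one mask bit each, an assumption):
/// unselected complex numbers are copied from a, not from c.
fn mask_fmadd_pch_model(a: [Complex; 4], k: u8, b: [Complex; 4], c: [Complex; 4]) -> [Complex; 4] {
    std::array::from_fn(|j| if (k >> j) & 1 == 1 { fmadd_c(a[j], b[j], c[j]) } else { a[j] })
}
```

For a = 1 + 2i, b = 3 + 4i and c = 1 + i, fmadd gives -4 + 11i and fcmadd gives 12 + 3i.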
- _mm_mask_fmadd_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fmadd_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, store the result in the lower elements of dst using writemask k (the elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fmaddsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result (even-indexed elements compute a*b - c, odd-indexed elements compute a*b + c), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_fmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_fmsub_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fmsub_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fmsubadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result (even-indexed elements compute a*b + c, odd-indexed elements compute a*b - c), and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_fmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fmul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fmul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_fnmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_fnmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fnmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fnmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_fnmsub_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fnmsub_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_fpclass_ph_mask Experimental avx512fp16,avx512vl
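- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (a result bit is zeroed out when the corresponding bit of k1 is not set). imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative finite), 0x80 (SNaN).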
- _mm_mask_fpclass_sh_mask Experimental avx512fp16
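- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k1 (the result bit is zeroed out when mask bit 0 of k1 is not set). imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative finite), 0x80 (SNaN).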
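The fpclass categories can be read directly off the bit fields of a half-precision value: 1 sign bit, 5 exponent bits and 10 fraction bits, where the top fraction bit of a NaN marks it as quiet. The hypothetical sketch below works on raw u16 bit patterns, so it needs no f16 type. The function names are illustrative, not std::arch APIs, and the sketch ignores MXCSR.DAZ.

```rust
/// Hypothetical model (not a std::arch API) of the per-element test done by the
/// fpclass entries, operating on the raw bits of one half-precision value.
/// MXCSR.DAZ is ignored.
fn fpclass_model(h: u16, imm8: u8) -> bool {
    let negative = h >> 15 == 1;
    let exp = (h >> 10) & 0x1f; // 5-bit exponent field
    let mant = h & 0x3ff; // 10-bit fraction field
    let exp_ones = exp == 0x1f;
    let exp_zero = exp == 0;
    let is_zero = exp_zero && mant == 0;
    let is_nan = exp_ones && mant != 0;
    let quiet = (mant >> 9) & 1 == 1; // top fraction bit marks a quiet NaN
    let categories: [(u8, bool); 8] = [
        (0x01, is_nan && quiet),                    // QNaN
        (0x02, !negative && is_zero),               // positive zero
        (0x04, negative && is_zero),                // negative zero
        (0x08, !negative && exp_ones && mant == 0), // positive infinity
        (0x10, negative && exp_ones && mant == 0),  // negative infinity
        (0x20, exp_zero && mant != 0),              // denormal
        (0x40, negative && !exp_ones && !is_zero),  // negative finite
        (0x80, is_nan && !quiet),                   // SNaN
    ];
    // The element matches if it falls into any category selected by imm8.
    categories.iter().any(|&(bit, hit)| imm8 & bit != 0 && hit)
}

/// Packed form with zeromask k1: result bit i is set only if bit i of k1 is set
/// and lane i matches.
fn mask_fpclass_ph_model(k1: u8, lanes: [u16; 8], imm8: u8) -> u8 {
    let mut k = 0u8;
    for (i, &h) in lanes.iter().enumerate() {
        if (k1 >> i) & 1 == 1 && fpclass_model(h, imm8) {
            k |= 1u8 << i;
        }
    }
    k
}
```

For example, testing for any infinity uses imm8 = 0x08 | 0x10. Negative zero (0x8000) matches 0x04 but not 0x40.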
- _mm_mask_getexp_ph Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_mask_getexp_round_sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_getexp_sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_getmant_ph Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_getmant_round_sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_getmant_sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_load_sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
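The getexp and getmant entries above split a value into its exponent and significand. For a normal half-precision x, |x| = 2^getexp(x) * getmant(x) when getmant normalizes to the interval [1, 2) and drops the sign. The hypothetical sketch below reads both parts from the raw u16 bits. The names are illustrative, not std::arch APIs. It covers only normal inputs, because the hardware treats zeros, subnormals, infinities and NaNs specially.

```rust
/// Hypothetical models (not std::arch APIs) of `getexp` and `getmant` for a
/// normal (finite, non-zero, non-subnormal) half-precision value given as raw bits.
/// Subnormals, zeros, infinities and NaNs are not modeled here.
const HALF_EXP_BIAS: i32 = 15;

/// floor(log2(|x|)) for a normal half: the unbiased exponent field.
fn getexp_model(h: u16) -> i32 {
    ((h >> 10) & 0x1f) as i32 - HALF_EXP_BIAS
}

/// Significand normalized to the interval [1, 2), with the sign dropped
/// (one of the norm/sign combinations getmant offers).
fn getmant_1_2_model(h: u16) -> f32 {
    1.0 + (h & 0x3ff) as f32 / 1024.0
}
```

For example, 0x4a00 encodes 12.0 = 1.5 * 2^3, so the sketch reports an exponent of 3 and a significand of 1.5.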
- _mm_mask_max_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
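The "does not follow IEEE 754" caveat refers to the x86 MAX/MIN rule: the result is `a > b ? a : b` (or `a < b ? a : b` for min). Whenever the comparison is false, including when either input is NaN or both inputs are zeros of any sign, the second operand b is returned. This is a minimal scalar sketch, with f32 standing in for half precision because stable Rust has no f16 type. The function names are hypothetical.

```rust
/// x86-style max: returns b whenever `a > b` is false (NaN, +0/-0 pairs).
fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// x86-style min: returns b whenever `a < b` is false.
fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Writemasked packed max over 8 lanes: lanes whose bit in `k` is clear keep src.
fn mask_max_lanes(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = x86_max(a[i], b[i]);
        }
    }
    dst
}
```

Because the operation is not symmetric, swapping a and b changes the result when a NaN or a signed zero is involved.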
- _mm_mask_max_round_sh Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_max_sh Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_min_ph Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_min_round_sh Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_min_sh Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_move_sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
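Every masked scalar (`_sh`) entry follows the same lane pattern as _mm_mask_move_sh. Only lane 0 is computed and masked, and lanes 1 to 7 come from a. The following is a hypothetical model over eight lanes, with f32 standing in for f16 and the closure `op` standing in for the per-entry operation (move, add, mul, and so on).

```rust
/// Lane-level model of a writemasked scalar (_sh) operation.
/// Lane 0: op(a[0], b[0]) if mask bit 0 is set, otherwise src[0].
/// Lanes 1..8: copied from a; the mask does not affect them.
fn mask_scalar_op(
    src: [f32; 8],
    k: u8,
    a: [f32; 8],
    b: [f32; 8],
    op: impl Fn(f32, f32) -> f32,
) -> [f32; 8] {
    let mut dst = a;
    dst[0] = if k & 1 == 1 { op(a[0], b[0]) } else { src[0] };
    dst
}

/// Model of _mm_mask_move_sh: the operation simply takes b's lower element.
fn mask_move_sh(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    mask_scalar_op(src, k, a, b, |_, y| y)
}
```

Only bit 0 of k is consulted, so a mask such as 0b10 still selects src for lane 0.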
- _mm_mask_mul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_mul_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
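The `pch` entries consume lanes in pairs. Lane 2j holds the real part and lane 2j+1 the imaginary part of complex number j, and the product follows (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br). The following is a hypothetical model of _mm_mask_mul_pch over a 128-bit vector holding four complex numbers, with f32 standing in for f16. It assumes that mask bit j covers both lanes of complex number j.

```rust
/// Writemasked complex multiply over 4 complex numbers stored as (re, im) lane pairs.
fn mask_mul_pch(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src; // unselected complex numbers keep both src lanes
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            let (ar, ai) = (a[2 * j], a[2 * j + 1]);
            let (br, bi) = (b[2 * j], b[2 * j + 1]);
            // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
            dst[2 * j] = ar * br - ai * bi;
            dst[2 * j + 1] = ar * bi + ai * br;
        }
    }
    dst
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i, and (1 + i)(1 - i) = 2.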
- _mm_mask_mul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_mul_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_mul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_mul_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_rcp_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a
and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_rcp_sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst using writemask k (the element is copied from src when
mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_reduce_ph Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_reduce_round_sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_reduce_sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_roundscale_ph Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_roundscale_round_sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_roundscale_sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
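The roundscale and reduce entries above are related. Roundscale keeps M fraction bits, computing 2^-M * round(2^M * x), and reduce returns what was rounded away, x - roundscale(x). This sketch assumes M is taken from imm8 bits 7:4 and models only the round-to-nearest-even mode. The other imm8 fields and special values are ignored. As before, f32 stands in for f16, and the function names are hypothetical. The sketch uses `f32::round_ties_even`, which requires Rust 1.77 or later.

```rust
/// Round x to M fraction bits (M = imm8[7:4]), ties to even.
fn roundscale(x: f32, imm8: u8) -> f32 {
    let m = (imm8 >> 4) as i32;
    let scale = 2f32.powi(m);
    (x * scale).round_ties_even() / scale
}

/// The "reduced argument": the part of x that roundscale discards.
fn reduce(x: f32, imm8: u8) -> f32 {
    x - roundscale(x, imm8)
}
```

For example, with imm8 = 0x20 (M = 2), 1.3125 rounds to 1.25 and the reduced argument is 0.0625.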
- _mm_mask_rsqrt_ph Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_rsqrt_sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst using writemask k (the element is copied from src
when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_scalef_ph Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
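Scaling here means multiplying by a power of two taken from b, so each result is a * 2^floor(b). The following is a minimal scalar sketch for finite inputs, with f32 in place of f16. NaN, infinity, and overflow handling are not modeled. It also includes a writemasked lane loop in the shape of _mm_mask_scalef_ph. The names are hypothetical.

```rust
/// a * 2^floor(b) for finite inputs.
fn scalef(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}

/// Writemasked packed form: lanes with a clear mask bit keep src.
fn mask_scalef_lanes(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = scalef(a[i], b[i]);
        }
    }
    dst
}
```

Note that the fractional part of b is discarded: scalef(3.0, 2.7) is 3 * 2^2 = 12, and scalef(3.0, -1.5) is 3 * 2^-2 = 0.75.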
- _mm_mask_scalef_round_sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_scalef_sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_sqrt_ph Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
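The rcp and rsqrt entries above guarantee only a relative error below 1.5*2^-12, which is about 3.7e-4. The sketch below, written in f64 with hypothetical names, checks a value against that bound. It also shows one Newton-Raphson step, a standard refinement technique that is not part of these intrinsics and roughly squares the relative error.

```rust
/// 1.5 * 2^-12, the documented bound for the rcp/rsqrt approximations.
const MAX_REL_ERR: f64 = 1.5 / 4096.0;

/// True if `approx` is within the documented relative error of `exact`.
fn within_bound(approx: f64, exact: f64) -> bool {
    ((approx - exact) / exact).abs() < MAX_REL_ERR
}

/// One Newton-Raphson step for 1/x: r' = r * (2 - x * r).
fn refine_rcp(x: f64, r: f64) -> f64 {
    r * (2.0 - x * r)
}

/// One Newton-Raphson step for 1/sqrt(x): r' = r * (1.5 - 0.5 * x * r * r).
fn refine_rsqrt(x: f64, r: f64) -> f64 {
    r * (1.5 - 0.5 * x * r * r)
}
```

Starting from a relative error e, one rcp step leaves an error of about e^2, so an input error of 1e-4 drops to roughly 1e-8.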
- _mm_mask_sqrt_round_sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_sqrt_sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_store_sh Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k (nothing is stored when mask bit 0 is not set).
- _mm_mask_sub_ph Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_sub_round_sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_sub_sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_add_ph Experimental avx512fp16,avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
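The maskz entries differ from the mask entries only in how they treat unselected lanes, which become zero instead of being copied from src. The following is a hypothetical lane model of _mm_maskz_add_ph over eight lanes, with f32 standing in for f16.

```rust
/// Zeromasked packed add: lane i gets a[i] + b[i] if bit i of k is set, else 0.
fn maskz_add_lanes(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8]; // unselected lanes stay zero; there is no src operand
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```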
- _mm_maskz_add_round_sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_add_sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_cmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_cmul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_cmul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_conj_pch Experimental avx512fp16,avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_cvt_roundsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_cvt_roundsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_cvt_roundsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_cvt_roundss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_cvtepi16_ph Experimental avx512fp16,avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtepi32_ph Experimental avx512fp16,avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_cvtepi64_ph Experimental avx512fp16,avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
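The "upper 64 bits" and "upper 96 bits" notes follow from lane counts. A 128-bit source holds four 32-bit or two 64-bit integers, and each becomes one 16-bit half-precision lane. Only 4*16 = 64 or 2*16 = 32 bits of the 128-bit result therefore carry data. The following is a hypothetical model, with f32 standing in for f16 and inputs assumed small enough to convert exactly.

```rust
/// Number of result bits that are always zero when `n_src_lanes`
/// integers are narrowed into 16-bit lanes of a 128-bit vector.
fn zeroed_upper_bits(n_src_lanes: u32) -> u32 {
    128 - 16 * n_src_lanes
}

/// Zeromasked narrowing conversion: result lane i comes from src[i] when
/// bit i of k is set; every other lane, including the unused upper lanes, is 0.
fn maskz_cvt_narrow(k: u8, src: &[i64]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for (i, &v) in src.iter().enumerate().take(8) {
        if (k >> i) & 1 == 1 {
            dst[i] = v as f32;
        }
    }
    dst
}
```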
- _mm_maskz_cvtepu16_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtepu32_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_cvtepu64_ph Experimental avx512fp16,avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_cvtpd_ph Experimental avx512fp16,avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_cvtph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtph_pd Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtsd_sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_cvtsh_sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_cvtsh_ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_cvtss_sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_cvttph_epi16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvttph_epi32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
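The cvtph entries round according to the current rounding mode, while the cvttph entries truncate toward zero. This scalar sketch assumes the default MXCSR mode (round to nearest, ties to even) and in-range, non-NaN inputs. Rust's saturating `as` cast does not reproduce the x86 out-of-range result. As elsewhere, f32 stands in for f16, and the names are hypothetical.

```rust
/// cvtph-style conversion under the default round-to-nearest-even mode.
fn cvt_nearest_even(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// cvttph-style conversion: truncate toward zero.
fn cvt_truncate(x: f32) -> i32 {
    x.trunc() as i32
}
```

The two differ on values such as 2.7 (3 versus 2) and on ties such as 3.5 (4 versus 3).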
- _mm_maskz_cvttph_epi64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvttph_epu16 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvttph_epu32 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvttph_epu64 Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtxph_ps Experimental avx512fp16,avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_cvtxps_ph Experimental avx512fp16,avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_div_ph Experimental avx512fp16,avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_div_round_sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_div_sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fcmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fcmadd_round_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fcmadd_sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fcmul_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fcmul_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fcmul_sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1], or its complex conjugate, conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_fmadd_pch Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fmadd_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
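The fmadd_pch and fcmadd_pch entries above differ only in whether b is conjugated before the complex multiply-accumulate: a*b + c versus a*conj(b) + c, where conj(br + i*bi) = br - i*bi. The following is a hypothetical zeromasked model over four complex numbers, with f32 standing in for f16. It assumes that mask bit j zeroes or keeps complex number j as a whole, and it uses ordinary f32 arithmetic, so it does not reproduce the instruction's exact rounding.

```rust
/// A complex number as (real, imaginary).
type Complex = (f32, f32);

/// (xr + i*xi)(yr + i*yi) = (xr*yr - xi*yi) + i*(xr*yi + xi*yr)
fn cmul(x: Complex, y: Complex) -> Complex {
    (x.0 * y.0 - x.1 * y.1, x.0 * y.1 + x.1 * y.0)
}

/// Complex conjugate: negate the imaginary part.
fn conj(y: Complex) -> Complex {
    (y.0, -y.1)
}

/// Zeromasked complex multiply-accumulate over 4 (re, im) lane pairs.
/// `conjugate_b == false` models fmadd_pch (a*b + c);
/// `conjugate_b == true` models fcmadd_pch (a*conj(b) + c).
fn maskz_complex_fma(k: u8, a: [f32; 8], b: [f32; 8], c: [f32; 8], conjugate_b: bool) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            let x = (a[2 * j], a[2 * j + 1]);
            let y = (b[2 * j], b[2 * j + 1]);
            let y = if conjugate_b { conj(y) } else { y };
            let p = cmul(x, y);
            dst[2 * j] = p.0 + c[2 * j];
            dst[2 * j + 1] = p.1 + c[2 * j + 1];
        }
    }
    dst
}
```

For example, with a = 1 + 2i, b = 3 + 4i and c = 1 + i, the plain form gives -4 + 11i and the conjugate form gives 12 + 3i.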
- _mm_maskz_fmadd_round_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fmadd_round_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fmadd_sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fmadd_sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fmaddsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsub_ph Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsub_round_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fmsub_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fmsubadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fmul_round_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fmul_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_fnmadd_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmadd_round_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmadd_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmsub_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmsub_round_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmsub_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_getexp_ph ⚠ Experimental avx512fp16,avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
- _mm_maskz_getexp_round_sh ⚠ Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(|x|)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getexp_sh ⚠ Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(|x|)) for the lower element.
- _mm_maskz_getmant_ph ⚠ Experimental avx512fp16,avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_getmant_round_sh ⚠ Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getmant_sh ⚠ Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_load_sh ⚠ Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
- _mm_maskz_max_ph ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_max_round_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_max_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_min_ph ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_min_round_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_min_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_move_sh ⚠ Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_mul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_mul_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_round_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_mul_round_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.
- _mm_maskz_mul_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_mul_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_rcp_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_rcp_sh ⚠ Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_reduce_ph ⚠ Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_reduce_round_sh ⚠ Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_reduce_sh ⚠ Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_roundscale_ph ⚠ Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_roundscale_round_sh ⚠ Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_roundscale_sh ⚠ Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_rsqrt_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_rsqrt_sh ⚠ Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_scalef_ph ⚠ Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_scalef_round_sh ⚠ Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_scalef_sh ⚠ Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_sqrt_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_sqrt_round_sh ⚠ Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.
- _mm_maskz_sqrt_sh ⚠ Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_sub_ph ⚠ Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_sub_round_sh ⚠ Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.
- _mm_maskz_sub_sh ⚠ Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_max_ph ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_round_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_min_ph ⚠ Experimental avx512fp16,avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_round_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_sh ⚠ Experimental avx512fp16,avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_move_sh ⚠ Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mul_pch ⚠ Experimental avx512fp16,avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ph ⚠ Experimental avx512fp16,avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_mul_round_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_round_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.
- _mm_mul_sch ⚠ Experimental avx512fp16
- Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which together define the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_sh ⚠ Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_permutex2var_ph ⚠ Experimental avx512fp16,avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutexvar_ph ⚠ Experimental avx512fp16,avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm_rcp_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rcp_sh ⚠ Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_reduce_add_ph ⚠ Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm_reduce_max_ph ⚠ Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_min_ph ⚠ Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_mul_ph ⚠ Experimental avx512fp16,avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ph ⚠ Experimental avx512fp16,avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm_reduce_round_sh ⚠ Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_reduce_sh ⚠ Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_ph ⚠ Experimental avx512fp16,avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm_roundscale_round_sh ⚠ Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_sh ⚠ Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_rsqrt_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rsqrt_sh ⚠ Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_scalef_ph ⚠ Experimental avx512fp16,avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_round_sh ⚠ Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_scalef_sh ⚠ Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_set1_ph ⚠ Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm_set_ph ⚠ Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm_set_sh ⚠ Experimental avx512fp16
- Copy the half-precision (16-bit) floating-point value a to the lower element of dst, and zero the upper 7 elements.
- _mm_setr_ph ⚠ Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm_setzero_ph ⚠ Experimental avx512fp16,avx512vl
- Return a vector of type __m128h with all elements set to zero.
- _mm_sqrt_ph ⚠ Experimental avx512fp16,avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm_sqrt_round_sh ⚠ Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.
- _mm_sqrt_sh ⚠ Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_store_ph ⚠ Experimental avx512fp16,avx512vl
- Store 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_store_sh ⚠ Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory.
- _mm_storeu_ph ⚠ Experimental avx512fp16,avx512vl
- Store 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm_sub_ph ⚠ Experimental avx512fp16,avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
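- _mm_sub_round_sh ⚠ Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO (each combined with _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.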
- _mm_sub_sh ⚠ Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_ucomieq_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomige_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomigt_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomile_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomilt_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomineq_sh ⚠ Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_undefined_ph ⚠ Experimental avx512fp16,avx512vl
- Return a vector of type __m128h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
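Every _mm_maskz_* entry in this list applies the same zeromask rule. The following is a minimal plain-Rust sketch of that rule, not the intrinsics themselves: it uses f32 lanes as a stand-in for the unstable f16 type, and the helpers maskz_packed and maskz_scalar are hypothetical names for illustration. It assumes lane 0 is the lowest element and that bit i of the mask k governs lane i.

```rust
/// Number of half-precision lanes in a 128-bit __m128h vector.
const LANES: usize = 8;

/// Models a zero-masked packed operation such as _mm_maskz_mul_ph:
/// lane i receives op(a[i], b[i]) when bit i of k is set, and 0.0 otherwise.
fn maskz_packed(k: u8, a: [f32; LANES], b: [f32; LANES], op: impl Fn(f32, f32) -> f32) -> [f32; LANES] {
    let mut dst = [0.0f32; LANES];
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            dst[i] = op(a[i], b[i]);
        }
    }
    dst
}

/// Models a zero-masked scalar (_sh) operation such as _mm_maskz_mul_sh:
/// lane 0 receives op(a[0], b[0]) when bit 0 of k is set and 0.0 otherwise,
/// while lanes 1..8 are always copied from a (the mask never touches them).
fn maskz_scalar(k: u8, a: [f32; LANES], b: [f32; LANES], op: impl Fn(f32, f32) -> f32) -> [f32; LANES] {
    let mut dst = a;
    dst[0] = if k & 1 == 1 { op(a[0], b[0]) } else { 0.0 };
    dst
}
```

For example, maskz_packed(0b0000_0101, a, b, |x, y| x * y) keeps only the products in lanes 0 and 2 and zeroes the rest, whereas the scalar form only ever changes lane 0.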
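The *_pch and *_sch entries treat each pair of adjacent f16 lanes as one complex number, with the even lane holding the real part and the odd lane the imaginary part. A minimal scalar sketch of the unmasked packed product (the _mm_mul_pch case) follows; f32 again stands in for f16, and mul_pch_model is a hypothetical name, not a core::arch function.

```rust
/// Models _mm_mul_pch on a 128-bit vector: 8 lanes = 4 complex numbers,
/// lane 2j is the real part and lane 2j + 1 the imaginary part.
fn mul_pch_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for j in 0..4 {
        let (ar, ai) = (a[2 * j], a[2 * j + 1]);
        let (br, bi) = (b[2 * j], b[2 * j + 1]);
        // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        dst[2 * j] = ar * br - ai * bi;
        dst[2 * j + 1] = ar * bi + ai * br;
    }
    dst
}
```

For instance, (1 + 2i)(3 + 4i) gives -5 + 10i in the first pair. The hardware may round the intermediate products differently (for example by fusing them), so real results can differ from this model in the last bit.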
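The getexp, getmant and scalef entries fit together: for a normal input x, getexp yields floor(log2(|x|)), getmant over the [1, 2) interval yields the signed significand, and scalef multiplies by 2^floor(b). The sketch below models the three on f32 values. It is restricted to zero and normal inputs (no subnormals, infinities or NaN), it assumes the [1, 2) interval with the source sign kept (which should correspond to _MM_MANT_NORM_1_2 with the source-sign option in the real intrinsics), and all helper names are hypothetical.

```rust
/// Models getexp for zero or a normal finite input: floor(log2(|x|)),
/// read directly from the f32 biased exponent field (bias 127).
/// Zero maps to negative infinity; other special inputs are not modeled.
fn getexp_model(x: f32) -> f32 {
    if x == 0.0 {
        return f32::NEG_INFINITY;
    }
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}

/// Models getmant over the [1, 2) interval for a nonzero normal input,
/// keeping the sign of x: the significand left after dividing out 2^getexp(x).
fn getmant_1_2_model(x: f32) -> f32 {
    x / 2f32.powi(getexp_model(x) as i32)
}

/// Models scalef: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

Recombining the pieces, scalef_model(getmant_1_2_model(x), getexp_model(x)) returns x again for normal x; for example 10.0 splits into a significand of 1.25 and an exponent of 3.0.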
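The roundscale and reduce entries are two halves of one operation: roundscale keeps a fixed number of fraction bits, and reduce returns the part that was discarded. The sketch below assumes, as in Intel's encoding, that the number of fraction bits M comes from imm8[7:4], and it fixes the rounding mode to round-to-nearest-even (the real imm8 can select other modes). The helper names are hypothetical, and f32::round_ties_even needs Rust 1.77 or later.

```rust
/// Models roundscale with round-to-nearest-even: keep m fraction bits,
/// i.e. compute 2^-m * round(2^m * x). m would be imm8[7:4], so at most 15.
fn roundscale_model(x: f32, m: u32) -> f32 {
    debug_assert!(m <= 15, "imm8[7:4] can only encode 0..=15");
    let scale = (1u32 << m) as f32;
    (x * scale).round_ties_even() / scale
}

/// Models reduce: the reduced argument is what roundscale throws away.
fn reduce_model(x: f32, m: u32) -> f32 {
    x - roundscale_model(x, m)
}
```

For example, with M = 1 the value 2.375 rounds to 2.5, so its reduced argument is -0.125; with M = 0 a tie such as 2.5 rounds to the even integer 2.0.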
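The max and min entries note that they do not follow IEEE 754 for NaN and signed-zero inputs. A common description of the x86 rule, assumed here, is that max returns the first operand only when it is strictly greater and otherwise returns the second operand, with min as the mirror image. Because every comparison involving NaN is false, this one-line model covers both special cases. The sketch below (hypothetical helper names, f32 in place of f16, a assumed to map to the first source operand) contrasts it with f32::max, which returns the non-NaN operand.

```rust
/// Models the x86 MAX rule: a only if a > b, otherwise b.
/// A NaN in either operand, or a pair of zeros of any sign, yields b.
fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Models the x86 MIN rule: a only if a < b, otherwise b.
fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

So x86_max(1.0, f32::NAN) is NaN while f32::max(1.0, f32::NAN) is 1.0, and swapping the operands changes the result, which is why the operand order of these intrinsics matters.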
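The set_ph and setr_ph entries differ only in argument order. Assuming the usual Intel convention, _mm_set_ph lists its arguments from the highest lane down to lane 0, while _mm_setr_ph lists them from lane 0 upward. The hypothetical helpers below return an array indexed by lane (f32 in place of f16) to make the difference visible.

```rust
/// Lanes of a 128-bit vector of eight half-precision values; index 0 is the lowest lane.
type Lanes = [f32; 8];

/// Models _mm_set_ph: arguments run from the highest lane (e7) to the lowest (e0).
#[allow(clippy::too_many_arguments)]
fn set_ph_model(e7: f32, e6: f32, e5: f32, e4: f32, e3: f32, e2: f32, e1: f32, e0: f32) -> Lanes {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}

/// Models _mm_setr_ph: arguments run from the lowest lane (e0) to the highest (e7).
#[allow(clippy::too_many_arguments)]
fn setr_ph_model(e0: f32, e1: f32, e2: f32, e3: f32, e4: f32, e5: f32, e6: f32, e7: f32) -> Lanes {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}
```

Under this convention set_ph_model(7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0) and setr_ph_model(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0) build the same vector, with 0.0 in lane 0.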