Available on x86 or x86-64 only.
Advanced Vector Extensions (AVX)
The references are:
- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.
Wikipedia provides a quick overview of the instructions available.
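The intrinsics listed below are only sound to call when the CPU actually supports AVX. The following is a minimal sketch of one common calling pattern, assuming runtime detection with `is_x86_feature_detected!`; the helper names `add4` and `add4_avx` are hypothetical and not part of this module.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: adds two 4-element f64 arrays using AVX.
/// The caller must ensure that the CPU supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn add4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    let mut out = [0.0f64; 4];
    unsafe {
        // Unaligned loads: plain arrays are not guaranteed to be 32-byte aligned.
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        let sum = _mm256_add_pd(va, vb);
        _mm256_storeu_pd(out.as_mut_ptr(), sum);
    }
    out
}

/// Hypothetical safe wrapper: uses AVX when it is detected at runtime,
/// and falls back to scalar code otherwise.
pub fn add4(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked on the line above.
            return unsafe { add4_avx(a, b) };
        }
    }
    // Portable scalar fallback.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add4(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0])` yields the element-wise sums on either path. Keeping the `#[target_feature]` function private behind a checked wrapper is one way to confine the `unsafe` to a single place.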
Constants
- _CMP_EQ_OQ - Equal (ordered, non-signaling)
- _CMP_EQ_OS - Equal (ordered, signaling)
- _CMP_EQ_UQ - Equal (unordered, non-signaling)
- _CMP_EQ_US - Equal (unordered, signaling)
- _CMP_FALSE_OQ - False (ordered, non-signaling)
- _CMP_FALSE_OS - False (ordered, signaling)
- _CMP_GE_OQ - Greater-than-or-equal (ordered, non-signaling)
- _CMP_GE_OS - Greater-than-or-equal (ordered, signaling)
- _CMP_GT_OQ - Greater-than (ordered, non-signaling)
- _CMP_GT_OS - Greater-than (ordered, signaling)
- _CMP_LE_OQ - Less-than-or-equal (ordered, non-signaling)
- _CMP_LE_OS - Less-than-or-equal (ordered, signaling)
- _CMP_LT_OQ - Less-than (ordered, non-signaling)
- _CMP_LT_OS - Less-than (ordered, signaling)
- _CMP_NEQ_OQ - Not-equal (ordered, non-signaling)
- _CMP_NEQ_OS - Not-equal (ordered, signaling)
- _CMP_NEQ_UQ - Not-equal (unordered, non-signaling)
- _CMP_NEQ_US - Not-equal (unordered, signaling)
- _CMP_NGE_UQ - Not-greater-than-or-equal (unordered, non-signaling)
- _CMP_NGE_US - Not-greater-than-or-equal (unordered, signaling)
- _CMP_NGT_UQ - Not-greater-than (unordered, non-signaling)
- _CMP_NGT_US - Not-greater-than (unordered, signaling)
- _CMP_NLE_UQ - Not-less-than-or-equal (unordered, non-signaling)
- _CMP_NLE_US - Not-less-than-or-equal (unordered, signaling)
- _CMP_NLT_UQ - Not-less-than (unordered, non-signaling)
- _CMP_NLT_US - Not-less-than (unordered, signaling)
- _CMP_ORD_Q - Ordered (non-signaling)
- _CMP_ORD_S - Ordered (signaling)
- _CMP_TRUE_UQ - True (unordered, non-signaling)
- _CMP_TRUE_US - True (unordered, signaling)
- _CMP_UNORD_Q - Unordered (non-signaling)
- _CMP_UNORD_S - Unordered (signaling)
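The difference between ordered and unordered predicates only shows up when an operand is NaN. The sketch below passes two of the constants above to `_mm256_cmp_pd` as the `IMM5` const generic and collapses each result into a bitmask with `_mm256_movemask_pd`. The helper names `lt_masks` and `lt_masks_avx` are hypothetical, and the wrapper returns `None` when AVX is not detected.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: compares `a` and `b` lane by lane with an ordered
/// and an unordered predicate, returning one 4-bit mask per predicate.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn lt_masks_avx(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    unsafe {
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        // Ordered less-than: a lane containing NaN compares false.
        let ordered = _mm256_cmp_pd::<_CMP_LT_OQ>(va, vb);
        // Unordered not-greater-than-or-equal: a lane containing NaN compares true.
        let unordered = _mm256_cmp_pd::<_CMP_NGE_UQ>(va, vb);
        // Bit i of each mask is the sign bit of lane i of the comparison result.
        (_mm256_movemask_pd(ordered), _mm256_movemask_pd(unordered))
    }
}

/// Hypothetical safe wrapper: returns `None` when AVX is unavailable.
pub fn lt_masks(a: &[f64; 4], b: &[f64; 4]) -> Option<(i32, i32)> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked on the line above.
            return Some(unsafe { lt_masks_avx(a, b) });
        }
    }
    None
}
```

With `a = [1.0, 5.0, NaN, 2.0]` and `b = [2.0, 3.0, 1.0, 2.0]`, the two predicates agree on every lane except lane 2, where the NaN makes the ordered comparison false and the unordered comparison true.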
Functions
- _mm256_add_pd ⚠ avx - Adds packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_add_ps ⚠ avx - Adds packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_addsub_pd ⚠ avx - Alternately adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
- _mm256_addsub_ps ⚠ avx - Alternately adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
- _mm256_and_pd ⚠ avx - Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_and_ps ⚠ avx - Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_andnot_pd ⚠ avx - Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then ANDs the result with b.
- _mm256_andnot_ps ⚠ avx - Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a, and then ANDs the result with b.
- _mm256_blend_pd ⚠ avx - Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blend_ps ⚠ avx - Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blendv_pd ⚠ avx - Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
- _mm256_blendv_ps ⚠ avx - Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
- _mm256_broadcast_pd ⚠ avx - Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_ps ⚠ avx - Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_sd ⚠ avx - Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_broadcast_ss ⚠ avx - Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_castpd128_pd256 ⚠ avx - Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
- _mm256_castpd256_pd128 ⚠ avx - Casts vector of type __m256d to type __m128d.
- _mm256_castpd_ps ⚠ avx - Casts vector of type __m256d to type __m256.
- _mm256_castpd_si256 ⚠ avx - Casts vector of type __m256d to type __m256i.
- _mm256_castps128_ps256 ⚠ avx - Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
- _mm256_castps256_ps128 ⚠ avx - Casts vector of type __m256 to type __m128.
- _mm256_castps_pd ⚠ avx - Casts vector of type __m256 to type __m256d.
- _mm256_castps_si256 ⚠ avx - Casts vector of type __m256 to type __m256i.
- _mm256_castsi128_si256 ⚠ avx - Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
- _mm256_castsi256_pd ⚠ avx - Casts vector of type __m256i to type __m256d.
- _mm256_castsi256_ps ⚠ avx - Casts vector of type __m256i to type __m256.
- _mm256_castsi256_si128 ⚠ avx - Casts vector of type __m256i to type __m128i.
- _mm256_ceil_pd ⚠ avx - Rounds packed double-precision (64-bit) floating-point elements in a toward positive infinity.
- _mm256_ceil_ps ⚠ avx - Rounds packed single-precision (32-bit) floating-point elements in a toward positive infinity.
- _mm256_cmp_pd ⚠ avx - Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm256_cmp_ps ⚠ avx - Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm256_cvtepi32_pd ⚠ avx - Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtepi32_ps ⚠ avx - Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtpd_epi32 ⚠ avx - Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtpd_ps ⚠ avx - Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtps_epi32 ⚠ avx - Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtps_pd ⚠ avx - Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtsd_f64 ⚠ avx - Returns the first element of the input vector of [4 x double].
- _mm256_cvtsi256_si32 ⚠ avx - Returns the first element of the input vector of [8 x i32].
- _mm256_cvtss_f32 ⚠ avx - Returns the first element of the input vector of [8 x float].
- _mm256_cvttpd_epi32 ⚠ avx - Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_cvttps_epi32 ⚠ avx - Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_div_pd ⚠ avx - Divides each of the 4 packed 64-bit floating-point elements in a by the corresponding packed element in b.
- _mm256_div_ps ⚠ avx - Divides each of the 8 packed 32-bit floating-point elements in a by the corresponding packed element in b.
- _mm256_dp_ps ⚠ avx - Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sums the four products, and conditionally returns the sum using the low 4 bits of imm8.
- _mm256_extract_epi32 ⚠ avx - Extracts a 32-bit integer from a, selected with INDEX.
- _mm256_extractf128_pd ⚠ avx - Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8.
- _mm256_extractf128_ps ⚠ avx - Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8.
- _mm256_extractf128_si256 ⚠ avx - Extracts 128 bits (composed of integer data) from a, selected with imm8.
- _mm256_floor_pd ⚠ avx - Rounds packed double-precision (64-bit) floating-point elements in a toward negative infinity.
- _mm256_floor_ps ⚠ avx - Rounds packed single-precision (32-bit) floating-point elements in a toward negative infinity.
- _mm256_hadd_pd ⚠ avx - Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating-point elements a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
- _mm256_hadd_ps ⚠ avx - Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating-point elements a and b. In the result, sums of elements from a are returned in locations 0, 1, 4, 5, while sums of elements from b are returned in locations 2, 3, 6, 7.
- _mm256_hsub_pd ⚠ avx - Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating-point elements a and b. In the result, differences of elements from a are returned in even locations, while differences of elements from b are returned in odd locations.
- _mm256_hsub_ps ⚠ avx - Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating-point elements a and b. In the result, differences of elements from a are returned in locations 0, 1, 4, 5, while differences of elements from b are returned in locations 2, 3, 6, 7.
- _mm256_insert_epi8 ⚠ avx - Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
- _mm256_insert_epi16 ⚠ avx - Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
- _mm256_insert_epi32 ⚠ avx - Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
- _mm256_insertf128_pd ⚠ avx - Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_ps ⚠ avx - Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_si256 ⚠ avx - Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
- _mm256_lddqu_si256 ⚠ avx - Loads 256 bits of integer data from unaligned memory into result. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
- _mm256_load_pd ⚠ avx - Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ps ⚠ avx - Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_si256 ⚠ avx - Loads 256 bits of integer data from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_loadu2_m128 ⚠ avx - Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_m128d ⚠ avx - Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_m128i ⚠ avx - Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu_pd ⚠ avx - Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ps ⚠ avx - Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_si256 ⚠ avx - Loads 256 bits of integer data from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskload_pd ⚠ avx - Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
- _mm256_maskload_ps ⚠ avx - Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
- _mm256_maskstore_pd ⚠ avx - Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
- _mm256_maskstore_ps ⚠ avx - Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
- _mm256_max_pd ⚠ avx - Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values.
- _mm256_max_ps ⚠ avx - Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values.
- _mm256_min_pd ⚠ avx - Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values.
- _mm256_min_ps ⚠ avx - Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values.
- _mm256_movedup_pd ⚠ avx - Duplicates even-indexed double-precision (64-bit) floating-point elements from a, and returns the results.
- _mm256_movehdup_ps ⚠ avx - Duplicates odd-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
- _mm256_moveldup_ps ⚠ avx - Duplicates even-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
- _mm256_movemask_pd ⚠ avx - Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
- _mm256_movemask_ps ⚠ avx - Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
- _mm256_mul_pd ⚠ avx - Multiplies packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_mul_ps ⚠ avx - Multiplies packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_or_pd ⚠ avx - Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_or_ps ⚠ avx - Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_permute2f128_pd ⚠ avx - Shuffles 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b.
- _mm256_permute2f128_ps ⚠ avx - Shuffles 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b.
- _mm256_permute2f128_si256 ⚠ avx - Shuffles 128-bit halves (composed of integer data) selected by imm8 from a and b.
- _mm256_permute_pd ⚠ avx - Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_permute_ps ⚠ avx - Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_permutevar_pd ⚠ avx - Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b.
- _mm256_permutevar_ps ⚠ avx - Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b.
- _mm256_rcp_ps ⚠ avx - Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_round_pd ⚠ avx - Rounds packed double-precision (64-bit) floating-point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
- _mm256_round_ps ⚠ avx - Rounds packed single-precision (32-bit) floating-point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
- _mm256_rsqrt_ps ⚠ avx - Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_set1_epi8 ⚠ avx - Broadcasts 8-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastb instruction.
- _mm256_set1_epi16 ⚠ avx - Broadcasts 16-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastw instruction.
- _mm256_set1_epi32 ⚠ avx - Broadcasts 32-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastd instruction.
- _mm256_set1_epi64x ⚠ avx - Broadcasts 64-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastq instruction.
- _mm256_set1_pd ⚠ avx - Broadcasts double-precision (64-bit) floating-point value a to all elements of returned vector.
- _mm256_set1_ps ⚠ avx - Broadcasts single-precision (32-bit) floating-point value a to all elements of returned vector.
- _mm256_set_epi8 ⚠ avx - Sets packed 8-bit integers in returned vector with the supplied values.
- _mm256_set_epi16 ⚠ avx - Sets packed 16-bit integers in returned vector with the supplied values.
- _mm256_set_epi32 ⚠ avx - Sets packed 32-bit integers in returned vector with the supplied values.
- _mm256_set_epi64x ⚠ avx - Sets packed 64-bit integers in returned vector with the supplied values.
- _mm256_set_m128 ⚠ avx - Sets packed __m256 returned vector with the supplied values.
- _mm256_set_m128d ⚠ avx - Sets packed __m256d returned vector with the supplied values.
- _mm256_set_m128i ⚠ avx - Sets packed __m256i returned vector with the supplied values.
- _mm256_set_pd ⚠ avx - Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
- _mm256_set_ps ⚠ avx - Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
- _mm256_setr_epi8 ⚠ avx - Sets packed 8-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_epi16 ⚠ avx - Sets packed 16-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_epi32 ⚠ avx - Sets packed 32-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_epi64x ⚠ avx - Sets packed 64-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_m128 ⚠ avx - Sets packed __m256 returned vector with the supplied values.
- _mm256_setr_m128d ⚠ avx - Sets packed __m256d returned vector with the supplied values.
- _mm256_setr_m128i ⚠ avx - Sets packed __m256i returned vector with the supplied values.
- _mm256_setr_pd ⚠ avx - Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
- _mm256_setr_ps ⚠ avx - Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
- _mm256_setzero_pd ⚠ avx - Returns vector of type __m256d with all elements set to zero.
- _mm256_setzero_ps ⚠ avx - Returns vector of type __m256 with all elements set to zero.
- _mm256_setzero_si256 ⚠ avx - Returns vector of type __m256i with all elements set to zero.
- _mm256_shuffle_pd ⚠ avx - Shuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.
- _mm256_shuffle_ps ⚠ avx - Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_sqrt_pd ⚠ avx - Returns the square root of packed double-precision (64-bit) floating-point elements in a.
- _mm256_sqrt_ps ⚠ avx - Returns the square root of packed single-precision (32-bit) floating-point elements in a.
- _mm256_store_pd ⚠ avx - Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ps ⚠ avx - Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_si256 ⚠ avx - Stores 256 bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_storeu2_m128 ⚠ avx - Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu2_m128d ⚠ avx - Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu2_m128i ⚠ avx - Stores the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu_pd ⚠ avx - Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ps ⚠ avx - Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_si256 ⚠ avx - Stores 256 bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_stream_pd ⚠ avx - Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ps ⚠ avx - Moves single-precision floating-point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_si256 ⚠ avx - Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_sub_pd ⚠ avx - Subtracts packed double-precision (64-bit) floating-point elements in b from packed elements in a.
- _mm256_sub_ps ⚠ avx - Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a.
- _mm256_testc_pd ⚠ avx - Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testc_ps ⚠ avx - Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testc_si256 ⚠ avx - Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testnzc_pd ⚠ avx - Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testnzc_ps ⚠ avx - Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testnzc_si256 ⚠ avx - Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testz_pd ⚠ avx - Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- _mm256_testz_ps ⚠ avx - Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- _mm256_testz_si256 ⚠ avx - Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then ANDs with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the ZF value.
- _mm256_undefined_pd ⚠ avx - Returns vector of type __m256d with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_undefined_ps ⚠ avx - Returns vector of type __m256 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_undefined_si256 ⚠ avx - Returns vector of type __m256i with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_unpackhi_pd ⚠ avx - Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpackhi_ps ⚠ avx - Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpacklo_pd ⚠ avx - Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
- _mm256_unpacklo_ps ⚠ avx - Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
- _mm256_xor_pd ⚠ avx - Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_xor_ps ⚠ avx - Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_zeroall ⚠ avx - Zeroes the contents of all XMM or YMM registers.
- _mm256_zeroupper ⚠ avx - Zeroes the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
- _mm256_zextpd128_pd256 ⚠ avx - Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextps128_ps256 ⚠ avx - Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextsi128_si256 ⚠ avx - Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm_
broadcast_ ⚠ss avx
- Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm_
cmp_ ⚠pd avx
- Compares packed double-precision (64-bit) floating-point
elements in
a
andb
based on the comparison operand specified byIMM5
. - _mm_
cmp_ ⚠ps avx
- Compares packed single-precision (32-bit) floating-point
elements in
a
andb
based on the comparison operand specified byIMM5
. - _mm_
cmp_ ⚠sd avx
- Compares the lower double-precision (64-bit) floating-point element in
a
andb
based on the comparison operand specified byIMM5
, store the result in the lower element of returned vector, and copies the upper element froma
to the upper element of returned vector. - _mm_
cmp_ ⚠ss avx
- Compares the lower single-precision (32-bit) floating-point element in
a
andb
based on the comparison operand specified byIMM5
, store the result in the lower element of returned vector, and copies the upper 3 packed elements froma
to the upper elements of returned vector. - _mm_
maskload_ ⚠pd avx
- Loads packed double-precision (64-bit) floating-point elements from memory
into result using
mask
(elements are zeroed out when the high bit of the corresponding element is not set). - _mm_
maskload_ ⚠ps avx
- Loads packed single-precision (32-bit) floating-point elements from memory
into result using
mask
(elements are zeroed out when the high bit of the corresponding element is not set). - _mm_
maskstore_ ⚠pd avx
- Stores packed double-precision (64-bit) floating-point elements from
a
into memory usingmask
. - _mm_
maskstore_ ⚠ps avx
- Stores packed single-precision (32-bit) floating-point elements from
a
into memory usingmask
. - _mm_
permute_ ⚠pd avx
- Shuffles double-precision (64-bit) floating-point elements in
a
using the control inimm8
. - _mm_
permute_ ⚠ps avx
- Shuffles single-precision (32-bit) floating-point elements in `a` using the control in `imm8`.
- _mm_permutevar_pd ⚠ avx
- Shuffles double-precision (64-bit) floating-point elements in `a` using the control in `b`.
- _mm_permutevar_ps ⚠ avx
- Shuffles single-precision (32-bit) floating-point elements in `a` using the control in `b`.
- _mm_testc_pd ⚠ avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns the `CF` value.
- _mm_testc_ps ⚠ avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns the `CF` value.
- _mm_testnzc_pd ⚠ avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns 1 if both the `ZF` and `CF` values are zero, otherwise returns 0.
- _mm_testnzc_ps ⚠ avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns 1 if both the `ZF` and `CF` values are zero, otherwise returns 0.
- _mm_testz_pd ⚠ avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns the `ZF` value.
- _mm_testz_ps ⚠ avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `ZF` to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets `CF` to 0. Returns the `ZF` value.
- maskloadpd 🔒 ⚠
- maskloadpd256 🔒 ⚠
- maskloadps 🔒 ⚠
- maskloadps256 🔒 ⚠
- maskstorepd 🔒 ⚠
- maskstorepd256 🔒 ⚠
- maskstoreps 🔒 ⚠
- maskstoreps256 🔒 ⚠
- ptestc256 🔒 ⚠
- ptestnzc256 🔒 ⚠
- ptestz256 🔒 ⚠
- roundpd256 🔒 ⚠
- roundps256 🔒 ⚠
- vcmppd 🔒 ⚠
- vcmppd256 🔒 ⚠
- vcmpps 🔒 ⚠
- vcmpps256 🔒 ⚠
- vcmpsd 🔒 ⚠
- vcmpss 🔒 ⚠
- vcvtpd2dq 🔒 ⚠
- vcvtps2dq 🔒 ⚠
- vcvttpd2dq 🔒 ⚠
- vcvttps2dq 🔒 ⚠
- vdpps 🔒 ⚠
- vhaddpd 🔒 ⚠
- vhaddps 🔒 ⚠
- vhsubpd 🔒 ⚠
- vhsubps 🔒 ⚠
- vlddqu 🔒 ⚠
- vmaxpd 🔒 ⚠
- vmaxps 🔒 ⚠
- vminpd 🔒 ⚠
- vminps 🔒 ⚠
- vperm2f128pd256 🔒 ⚠
- vperm2f128ps256 🔒 ⚠
- vperm2f128si256 🔒 ⚠
- vpermilpd 🔒 ⚠
- vpermilpd256 🔒 ⚠
- vpermilps 🔒 ⚠
- vpermilps256 🔒 ⚠
- vrcpps 🔒 ⚠
- vrsqrtps 🔒 ⚠
- vtestcpd 🔒 ⚠
- vtestcpd256 🔒 ⚠
- vtestcps 🔒 ⚠
- vtestcps256 🔒 ⚠
- vtestnzcpd 🔒 ⚠
- vtestnzcpd256 🔒 ⚠
- vtestnzcps 🔒 ⚠
- vtestnzcps256 🔒 ⚠
- vtestzpd 🔒 ⚠
- vtestzpd256 🔒 ⚠
- vtestzps 🔒 ⚠
- vtestzps256 🔒 ⚠
- vzeroall 🔒 ⚠
- vzeroupper 🔒 ⚠
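
As a rough illustration of how a few of the 128-bit functions listed above behave, the following minimal sketch exercises `_mm_cmp_ps`, `_mm_maskload_ps`, `_mm_maskstore_ps`, `_mm_permute_ps` and `_mm_testz_ps` on small arrays. The `Demo` struct and the `demo` / `demo_avx` functions are hypothetical names introduced only for this example and are not part of this module. The sketch assumes an x86-64 target and checks for AVX at run time, returning `None` when AVX is unavailable.

```rust
// Illustrative sketch only: `Demo`, `demo` and `demo_avx` are hypothetical
// helpers, not items exported by core::arch.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[derive(Debug, PartialEq)]
struct Demo {
    lt_mask: i32,           // one bit per lane where a < b
    masked_load: [f32; 4],  // result of _mm_maskload_ps
    masked_store: [f32; 4], // memory after _mm_maskstore_ps
    reversed: [f32; 4],     // result of _mm_permute_ps
    testz: i32,             // ZF value returned by _mm_testz_ps
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn demo_avx(a: &[f32; 4], b: &[f32; 4]) -> Demo {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());

        // _mm_cmp_ps: lanes where the predicate holds become all ones;
        // _mm_movemask_ps then packs the lane sign bits into an integer.
        let lt_mask = _mm_movemask_ps(_mm_cmp_ps::<_CMP_LT_OQ>(va, vb));

        // A mask whose high bit is set in lanes 0 and 2 only.
        let mask = _mm_setr_epi32(-1, 0, -1, 0);

        // _mm_maskload_ps: unselected lanes are zeroed in the result.
        let mut masked_load = [0.0f32; 4];
        _mm_storeu_ps(masked_load.as_mut_ptr(), _mm_maskload_ps(a.as_ptr(), mask));

        // _mm_maskstore_ps: unselected lanes in memory keep their old value.
        let mut masked_store = [9.0f32; 4];
        _mm_maskstore_ps(masked_store.as_mut_ptr(), mask, va);

        // _mm_permute_ps: each 2-bit field of the control picks a source lane;
        // this control selects lanes 3, 2, 1, 0, which reverses the vector.
        let mut reversed = [0.0f32; 4];
        _mm_storeu_ps(reversed.as_mut_ptr(), _mm_permute_ps::<0b00_01_10_11>(va));

        // _mm_testz_ps: 1 when no lane has its sign bit set in both a and b.
        let testz = _mm_testz_ps(va, vb);

        Demo { lt_mask, masked_load, masked_store, reversed, testz }
    }
}

/// Safe entry point: returns `None` when AVX cannot be used.
fn demo(a: &[f32; 4], b: &[f32; 4]) -> Option<Demo> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // AVX support was confirmed at run time just above.
            return Some(unsafe { demo_avx(a, b) });
        }
    }
    let _ = (a, b);
    None
}
```

Calling `demo(&[1.0, -2.0, 3.0, -4.0], &[2.0, -3.0, 3.0, -1.0])` on a machine with AVX should give a `lt_mask` of `0b1001` (lanes 0 and 3 satisfy `a < b`), a masked load of `[1.0, 0.0, 3.0, 0.0]`, a masked store result of `[1.0, 9.0, 3.0, 9.0]`, a reversed vector of `[-4.0, 3.0, -2.0, 1.0]`, and a `testz` value of 0, because lanes 1 and 3 are negative in both inputs. Gating the `#[target_feature(enable = "avx")]` function behind a run-time check is one common way to keep the call sound; compiling with AVX enabled globally is an alternative.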