Available on x86 or x86-64 only.
Streaming SIMD Extensions (SSE)
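Because these intrinsics exist only on x86 and x86-64, portable code usually gates them behind cfg attributes and a run-time feature check. The following is a minimal sketch of that pattern, not a prescribed usage; add4 and add4_sse are hypothetical helper names introduced for illustration.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Adds four lanes at once with SSE (hypothetical helper).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse")]
unsafe fn add4_sse(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        // Unaligned loads and stores: plain arrays are not guaranteed
        // to sit on a 16-byte boundary.
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

/// Adds two 4-element arrays, using SSE when the CPU reports support
/// and a scalar loop-free fallback otherwise (hypothetical helper).
pub fn add4(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse") {
            // SAFETY: SSE support was verified at run time just above.
            return unsafe { add4_sse(a, b) };
        }
    }
    // Portable fallback for other architectures or CPUs without SSE.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling add4(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0]) takes the SSE path on any x86-64 machine and the scalar path elsewhere; both paths compute the same element-wise sum.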
Constants§
- _MM_EXCEPT_DENORM - See _mm_setcsr
- _MM_EXCEPT_DIV_ZERO - See _mm_setcsr
- _MM_EXCEPT_INEXACT - See _mm_setcsr
- _MM_EXCEPT_INVALID - See _mm_setcsr
- _MM_EXCEPT_MASK - See _MM_GET_EXCEPTION_STATE
- _MM_EXCEPT_OVERFLOW - See _mm_setcsr
- _MM_EXCEPT_UNDERFLOW - See _mm_setcsr
- _MM_FLUSH_ZERO_MASK - See _MM_GET_FLUSH_ZERO_MODE
- _MM_FLUSH_ZERO_OFF - See _mm_setcsr
- _MM_FLUSH_ZERO_ON - See _mm_setcsr
- _MM_HINT_ET0 - See _mm_prefetch.
- _MM_HINT_ET1 - See _mm_prefetch.
- _MM_HINT_NTA - See _mm_prefetch.
- _MM_HINT_T0 - See _mm_prefetch.
- _MM_HINT_T1 - See _mm_prefetch.
- _MM_HINT_T2 - See _mm_prefetch.
- _MM_MASK_DENORM - See _mm_setcsr
- _MM_MASK_DIV_ZERO - See _mm_setcsr
- _MM_MASK_INEXACT - See _mm_setcsr
- _MM_MASK_INVALID - See _mm_setcsr
- _MM_MASK_MASK - See _MM_GET_EXCEPTION_MASK
- _MM_MASK_OVERFLOW - See _mm_setcsr
- _MM_MASK_UNDERFLOW - See _mm_setcsr
- _MM_ROUND_DOWN - See _mm_setcsr
- _MM_ROUND_MASK - See _MM_GET_ROUNDING_MODE
- _MM_ROUND_NEAREST - See _mm_setcsr
- _MM_ROUND_TOWARD_ZERO - See _mm_setcsr
- _MM_ROUND_UP - See _mm_setcsr
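The exception, mask, rounding, and flush-to-zero constants are bit fields of the 32-bit MXCSR register, and each *_MASK constant selects one field. As a rough illustration, the sketch below decodes a raw MXCSR value with plain integer arithmetic and never touches the register itself. The helper names are hypothetical, and 0x1F80 is used as the sample input because it is the architectural power-on default.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Names the rounding mode encoded in a raw MXCSR value (hypothetical helper).
fn rounding_mode_name(mxcsr: u32) -> &'static str {
    match mxcsr & _MM_ROUND_MASK {
        _MM_ROUND_NEAREST => "nearest",
        _MM_ROUND_DOWN => "down",
        _MM_ROUND_UP => "up",
        _MM_ROUND_TOWARD_ZERO => "toward zero",
        _ => unreachable!("the mask leaves only four possible values"),
    }
}

/// True when the flush-to-zero bit is set (hypothetical helper).
fn flush_to_zero_enabled(mxcsr: u32) -> bool {
    (mxcsr & _MM_FLUSH_ZERO_MASK) == _MM_FLUSH_ZERO_ON
}

/// True when every floating-point exception is masked (hypothetical helper).
fn all_exceptions_masked(mxcsr: u32) -> bool {
    (mxcsr & _MM_MASK_MASK) == _MM_MASK_MASK
}
```

With the default value 0x1F80, the rounding mode decodes as "nearest", all exceptions are masked, and flush-to-zero is off.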
Functions§
- _MM_GET_EXCEPTION_MASK ⚠ Deprecated sse - See _mm_setcsr
- _MM_GET_EXCEPTION_STATE ⚠ Deprecated sse - See _mm_setcsr
- _MM_GET_FLUSH_ZERO_MODE ⚠ Deprecated sse - See _mm_setcsr
- _MM_GET_ROUNDING_MODE ⚠ Deprecated sse - See _mm_setcsr
- _MM_SET_EXCEPTION_MASK ⚠ Deprecated sse - See _mm_setcsr
- _MM_SET_EXCEPTION_STATE ⚠ Deprecated sse - See _mm_setcsr
- _MM_SET_FLUSH_ZERO_MODE ⚠ Deprecated sse - See _mm_setcsr
- _MM_SET_ROUNDING_MODE ⚠ Deprecated sse - See _mm_setcsr
- _MM_TRANSPOSE4_PS ⚠ sse - Transpose the 4x4 matrix formed by 4 rows of __m128 in place.
- _mm_add_ps ⚠ sse - Adds packed single-precision (32-bit) floating-point elements in a and b.
- _mm_add_ss ⚠ sse - Adds the first component of a and b; the other components are copied from a.
- _mm_and_ps ⚠ sse - Bitwise AND of packed single-precision (32-bit) floating-point elements.
- _mm_andnot_ps ⚠ sse - Bitwise AND-NOT of packed single-precision (32-bit) floating-point elements.
- _mm_cmpeq_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements were equal, or 0 otherwise.
- _mm_cmpeq_ss ⚠ sse - Compares the lowest f32 of both inputs for equality. The lowest 32 bits of the result will be 0xffffffff if the two inputs are equal, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpge_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpge_ss ⚠ sse - Compares the lowest f32 of both inputs for greater than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpgt_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than the corresponding element in b, or 0 otherwise.
- _mm_cmpgt_ss ⚠ sse - Compares the lowest f32 of both inputs for greater than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmple_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmple_ss ⚠ sse - Compares the lowest f32 of both inputs for less than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmplt_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than the corresponding element in b, or 0 otherwise.
- _mm_cmplt_ss ⚠ sse - Compares the lowest f32 of both inputs for less than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpneq_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements are not equal, or 0 otherwise.
- _mm_cmpneq_ss ⚠ sse - Compares the lowest f32 of both inputs for inequality. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnge_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpnge_ss ⚠ sse - Compares the lowest f32 of both inputs for not-greater-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpngt_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than the corresponding element in b, or 0 otherwise.
- _mm_cmpngt_ss ⚠ sse - Compares the lowest f32 of both inputs for not-greater-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnle_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpnle_ss ⚠ sse - Compares the lowest f32 of both inputs for not-less-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnlt_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than the corresponding element in b, or 0 otherwise.
- _mm_cmpnlt_ss ⚠ sse - Compares the lowest f32 of both inputs for not-less-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpord_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are ordered (i.e., neither of them is a NaN), or 0 otherwise.
- _mm_cmpord_ss ⚠ sse - Checks if the lowest f32 of both inputs are ordered. The lowest 32 bits of the result will be 0xffffffff if neither a.extract(0) nor b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpunord_ps ⚠ sse - Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are unordered (i.e., at least one of them is a NaN), or 0 otherwise.
- _mm_cmpunord_ss ⚠ sse - Checks if the lowest f32 of both inputs are unordered. The lowest 32 bits of the result will be 0xffffffff if either a.extract(0) or b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_comieq_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise.
- _mm_comige_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise.
- _mm_comigt_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise.
- _mm_comile_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise.
- _mm_comilt_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise.
- _mm_comineq_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise.
- _mm_cvt_si2ss ⚠ sse - Alias for _mm_cvtsi32_ss.
- _mm_cvt_ss2si ⚠ sse - Alias for _mm_cvtss_si32.
- _mm_cvtsi32_ss ⚠ sse - Converts a 32-bit integer to a 32-bit float. The result vector is the input vector a with the lowest 32-bit float replaced by the converted integer.
- _mm_cvtss_f32 ⚠ sse - Extracts the lowest 32-bit float from the input vector.
- _mm_cvtss_si32 ⚠ sse - Converts the lowest 32-bit float in the input vector to a 32-bit integer.
- _mm_cvtt_ss2si ⚠ sse - Alias for _mm_cvttss_si32.
- _mm_cvttss_si32 ⚠ sse - Converts the lowest 32-bit float in the input vector to a 32-bit integer with truncation.
- _mm_div_ps ⚠ sse - Divides packed single-precision (32-bit) floating-point elements in a by the corresponding elements in b.
- _mm_div_ss ⚠ sse - Divides the first component of a by the first component of b; the other components are copied from a.
- _mm_getcsr ⚠ Deprecated sse - Gets the unsigned 32-bit value of the MXCSR control and status register.
- _mm_load1_ps ⚠ sse - Constructs a __m128 by duplicating the value read from p into all elements.
- _mm_load_ps ⚠ sse - Loads four f32 values from aligned memory into a __m128. If the pointer is not aligned to a 128-bit boundary (16 bytes), a general protection fault will be triggered (fatal program crash).
- _mm_load_ps1 ⚠ sse - Alias for _mm_load1_ps
- _mm_load_ss ⚠ sse - Constructs a __m128 with the lowest element read from p and the other elements set to zero.
- _mm_loadr_ps ⚠ sse - Loads four f32 values from aligned memory into a __m128 in reverse order.
- _mm_loadu_ps ⚠ sse - Loads four f32 values from memory into a __m128. There are no restrictions on memory alignment. For aligned memory, _mm_load_ps may be faster.
- _mm_max_ps ⚠ sse - Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding maximum values.
- _mm_max_ss ⚠ sse - Compares the first single-precision (32-bit) floating-point element of a and b, and returns the maximum value in the first element of the return value; the other elements are copied from a.
- _mm_min_ps ⚠ sse - Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding minimum values.
- _mm_min_ss ⚠ sse - Compares the first single-precision (32-bit) floating-point element of a and b, and returns the minimum value in the first element of the return value; the other elements are copied from a.
- _mm_move_ss ⚠ sse - Returns a __m128 with the first component from b and the remaining components from a.
- _mm_movehl_ps ⚠ sse - Combines the higher halves of a and b. The higher half of b occupies the lower half of the result.
- _mm_movelh_ps ⚠ sse - Combines the lower halves of a and b. The lower half of b occupies the higher half of the result.
- _mm_movemask_ps ⚠ sse - Returns a mask of the most significant bit of each element in a.
- _mm_mul_ps ⚠ sse - Multiplies packed single-precision (32-bit) floating-point elements in a and b.
- _mm_mul_ss ⚠ sse - Multiplies the first component of a and b; the other components are copied from a.
- _mm_or_ps ⚠ sse - Bitwise OR of packed single-precision (32-bit) floating-point elements.
- _mm_prefetch ⚠ sse - Fetches the cache line that contains address p using the given STRATEGY.
- _mm_rcp_ps ⚠ sse - Returns the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a.
- _mm_rcp_ss ⚠ sse - Returns the approximate reciprocal of the first single-precision (32-bit) floating-point element in a; the other elements are unchanged.
- _mm_rsqrt_ps ⚠ sse - Returns the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a.
- _mm_rsqrt_ss ⚠ sse - Returns the approximate reciprocal square root of the first single-precision (32-bit) floating-point element in a; the other elements are unchanged.
- _mm_set1_ps ⚠ sse - Constructs a __m128 with all elements set to a.
- _mm_set_ps ⚠ sse - Constructs a __m128 from four floating-point values, highest to lowest.
- _mm_set_ps1 ⚠ sse - Alias for _mm_set1_ps
- _mm_set_ss ⚠ sse - Constructs a __m128 with the lowest element set to a and the rest set to zero.
- _mm_setcsr ⚠ Deprecated sse - Sets the MXCSR register with the 32-bit unsigned integer value.
- _mm_setr_ps ⚠ sse - Constructs a __m128 from four floating-point values, lowest to highest.
- _mm_setzero_ps ⚠ sse - Constructs a __m128 with all elements initialized to zero.
- _mm_sfence ⚠ sse - Performs a serializing operation on all non-temporal (“streaming”) store instructions that were issued by the current thread prior to this instruction.
- _mm_shuffle_ps ⚠ sse - Shuffles packed single-precision (32-bit) floating-point elements in a and b using MASK.
- _mm_sqrt_ps ⚠ sse - Returns the square root of packed single-precision (32-bit) floating-point elements in a.
- _mm_sqrt_ss ⚠ sse - Returns the square root of the first single-precision (32-bit) floating-point element in a; the other elements are unchanged.
- _mm_store1_ps ⚠ sse - Stores the lowest 32-bit float of a repeated four times into aligned memory.
- _mm_store_ps ⚠ sse - Stores four 32-bit floats into aligned memory.
- _mm_store_ps1 ⚠ sse - Alias for _mm_store1_ps
- _mm_store_ss ⚠ sse - Stores the lowest 32-bit float of a into memory.
- _mm_storer_ps ⚠ sse - Stores four 32-bit floats into aligned memory in reverse order.
- _mm_storeu_ps ⚠ sse - Stores four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory, _mm_store_ps may be faster.
- _mm_stream_ps ⚠ sse - Stores a into the memory at mem_addr using a non-temporal memory hint.
- _mm_sub_ps ⚠ sse - Subtracts packed single-precision (32-bit) floating-point elements in b from the corresponding elements in a.
- _mm_sub_ss ⚠ sse - Subtracts the first component of b from a; the other components are copied from a.
- _mm_ucomieq_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomige_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomigt_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomile_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomilt_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomineq_ss ⚠ sse - Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_undefined_ps ⚠ sse - Returns a vector of type __m128 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm_unpackhi_ps ⚠ sse - Unpacks and interleaves single-precision (32-bit) floating-point elements from the higher half of a and b.
- _mm_unpacklo_ps ⚠ sse - Unpacks and interleaves single-precision (32-bit) floating-point elements from the lower half of a and b.
- _mm_xor_ps ⚠ sse - Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements.
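The packed comparisons listed above return all-ones or all-zero lanes rather than booleans, so they are normally combined with _mm_movemask_ps or with the bitwise operations. The sketch below illustrates both uses under the assumption of an x86-64 target, where SSE is part of the baseline; lanes_less_than and select_greater are hypothetical helper names.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns a 4-bit mask in which bit i is set when a[i] < b[i].
fn lanes_less_than(a: [f32; 4], b: [f32; 4]) -> i32 {
    // SAFETY: assumes SSE is available (always true on x86-64).
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        // Each lane becomes 0xffffffff or 0; movemask collects the sign bits.
        _mm_movemask_ps(_mm_cmplt_ps(va, vb))
    }
}

/// Branch-free select: takes a[i] where a[i] > b[i], otherwise b[i].
fn select_greater(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SAFETY: assumes SSE is available (always true on x86-64).
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        let mask = _mm_cmpgt_ps(va, vb);
        // (mask & a) | (!mask & b); _mm_andnot_ps negates its first operand.
        let picked = _mm_or_ps(_mm_and_ps(mask, va), _mm_andnot_ps(mask, vb));
        _mm_storeu_ps(out.as_mut_ptr(), picked);
    }
    out
}
```

A NaN in either input makes the ordered comparisons produce 0 for that lane, so lanes_less_than leaves the corresponding bit clear and select_greater falls through to the element from b.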
- cmpps 🔒 ⚠
- cmpss 🔒 ⚠
- comieq_ss 🔒 ⚠
- comige_ss 🔒 ⚠
- comigt_ss 🔒 ⚠
- comile_ss 🔒 ⚠
- comilt_ss 🔒 ⚠
- comineq_ss 🔒 ⚠
- cvtsi2ss 🔒 ⚠
- cvtss2si 🔒 ⚠
- cvttss2si 🔒 ⚠
- ldmxcsr 🔒 ⚠
- maxps 🔒 ⚠
- maxss 🔒 ⚠
- minps 🔒 ⚠
- minss 🔒 ⚠
- prefetch 🔒 ⚠
- rcpps 🔒 ⚠
- rcpss 🔒 ⚠
- rsqrtps 🔒 ⚠
- rsqrtss 🔒 ⚠
- sfence 🔒 ⚠
- stmxcsr 🔒 ⚠
- ucomieq_ss 🔒 ⚠
- ucomige_ss 🔒 ⚠
- ucomigt_ss 🔒 ⚠
- ucomile_ss 🔒 ⚠
- ucomilt_ss 🔒 ⚠
- ucomineq_ss 🔒 ⚠
- _MM_SHUFFLE Experimental - A utility function for creating masks to use with Intel shuffle and permute intrinsics.
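Since _MM_SHUFFLE is marked experimental, stable code has to build the MASK argument of _mm_shuffle_ps by hand. The sketch below shows one way to do that, assuming an x86-64 target where SSE is baseline. shuffle_mask, reverse_lanes, and broadcast_lane2 are hypothetical names; shuffle_mask follows the usual (z, y, x, w) bit layout, in which the result is [a[w], a[x], b[y], b[z]].

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hand-rolled stand-in for the experimental `_MM_SHUFFLE` helper.
/// Each argument is a lane index in 0..=3 and occupies two bits.
const fn shuffle_mask(z: u32, y: u32, x: u32, w: u32) -> i32 {
    ((z << 6) | (y << 4) | (x << 2) | w) as i32
}

/// Reverses the four lanes of `v`.
fn reverse_lanes(v: [f32; 4]) -> [f32; 4] {
    const REVERSE: i32 = shuffle_mask(0, 1, 2, 3);
    let mut out = [0.0f32; 4];
    // SAFETY: assumes SSE is available (always true on x86-64).
    unsafe {
        let x = _mm_loadu_ps(v.as_ptr());
        // Passing the same vector twice shuffles within one register.
        _mm_storeu_ps(out.as_mut_ptr(), _mm_shuffle_ps::<REVERSE>(x, x));
    }
    out
}

/// Copies lane 2 of `v` into every lane.
fn broadcast_lane2(v: [f32; 4]) -> [f32; 4] {
    const SPLAT2: i32 = shuffle_mask(2, 2, 2, 2);
    let mut out = [0.0f32; 4];
    // SAFETY: assumes SSE is available (always true on x86-64).
    unsafe {
        let x = _mm_loadu_ps(v.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_shuffle_ps::<SPLAT2>(x, x));
    }
    out
}
```

The mask is a const generic parameter, so it must be known at compile time; that is why the sketch computes it in a const fn and stores it in a const item.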