Available on x86 or x86-64 only.
Fused Multiply-Add instruction set (FMA)
The FMA instruction set is an extension to the 128- and 256-bit SSE instructions in the x86 microprocessor instruction set that performs fused multiply–add (FMA) operations.
The references are:
- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.
Wikipedia’s FMA page provides a quick overview of the instructions available.
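To make "fused" concrete: a fused multiply-add computes `a * b + c` with a single rounding at the end, whereas a separate multiply followed by an add rounds twice. The sketch below illustrates this with the portable `f64::mul_add` method from the standard library rather than the intrinsics in this module; the helper name `fused_vs_unfused` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: returns `(unfused, fused)` results of `a * b + c`.
fn fused_vs_unfused(a: f64, b: f64, c: f64) -> (f64, f64) {
    // Two roundings: the product is rounded to f64, then the sum is rounded.
    let unfused = a * b + c;
    // One rounding: the exact product is added to `c` before rounding.
    let fused = a.mul_add(b, c);
    (unfused, fused)
}

fn main() {
    let eps = f64::EPSILON; // 2^-52
    // (1 + eps) * (1 - eps) = 1 - eps^2, which is not representable in f64
    // and rounds to 1.0 when the product is computed on its own.
    let (unfused, fused) = fused_vs_unfused(1.0 + eps, 1.0 - eps, -1.0);
    println!("unfused = {unfused:e}");
    println!("fused   = {fused:e}");
}
```

With these inputs the unfused form loses the `eps^2` term entirely, while the fused form preserves it.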
Functions
- `_mm256_fmadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and adds the intermediate result to packed elements in `c`.
- `_mm256_fmadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and adds the intermediate result to packed elements in `c`.
- `_mm256_fmaddsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and alternately adds and subtracts packed elements in `c` to/from the intermediate result (`c` is subtracted in even-indexed elements and added in odd-indexed elements).
- `_mm256_fmaddsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and alternately adds and subtracts packed elements in `c` to/from the intermediate result (`c` is subtracted in even-indexed elements and added in odd-indexed elements).
- `_mm256_fmsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the intermediate result.
- `_mm256_fmsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the intermediate result.
- `_mm256_fmsubadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and alternately subtracts and adds packed elements in `c` from/to the intermediate result (`c` is added in even-indexed elements and subtracted in odd-indexed elements).
- `_mm256_fmsubadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and alternately subtracts and adds packed elements in `c` from/to the intermediate result (`c` is added in even-indexed elements and subtracted in odd-indexed elements).
- `_mm256_fnmadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to packed elements in `c`.
- `_mm256_fnmadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to packed elements in `c`.
- `_mm256_fnmsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the negated intermediate result.
- `_mm256_fnmsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the negated intermediate result.
- `_mm_fmadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and adds the intermediate result to packed elements in `c`.
- `_mm_fmadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and adds the intermediate result to packed elements in `c`.
- `_mm_fmadd_sd` ⚠ fma: Multiplies the lower double-precision (64-bit) floating-point elements in `a` and `b`, and adds the intermediate result to the lower element in `c`. Stores the result in the lower element of the returned value, and copies the upper element from `a` to the upper element of the result.
- `_mm_fmadd_ss` ⚠ fma: Multiplies the lower single-precision (32-bit) floating-point elements in `a` and `b`, and adds the intermediate result to the lower element in `c`. Stores the result in the lower element of the returned value, and copies the 3 upper elements from `a` to the upper elements of the result.
- `_mm_fmaddsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and alternately adds and subtracts packed elements in `c` to/from the intermediate result (`c` is subtracted in even-indexed elements and added in odd-indexed elements).
- `_mm_fmaddsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and alternately adds and subtracts packed elements in `c` to/from the intermediate result (`c` is subtracted in even-indexed elements and added in odd-indexed elements).
- `_mm_fmsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the intermediate result.
- `_mm_fmsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the intermediate result.
- `_mm_fmsub_sd` ⚠ fma: Multiplies the lower double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts the lower element in `c` from the intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from `a` to the upper element of the result.
- `_mm_fmsub_ss` ⚠ fma: Multiplies the lower single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts the lower element in `c` from the intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from `a` to the upper elements of the result.
- `_mm_fmsubadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and alternately subtracts and adds packed elements in `c` from/to the intermediate result (`c` is added in even-indexed elements and subtracted in odd-indexed elements).
- `_mm_fmsubadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and alternately subtracts and adds packed elements in `c` from/to the intermediate result (`c` is added in even-indexed elements and subtracted in odd-indexed elements).
- `_mm_fnmadd_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to packed elements in `c`.
- `_mm_fnmadd_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to packed elements in `c`.
- `_mm_fnmadd_sd` ⚠ fma: Multiplies the lower double-precision (64-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to the lower element in `c`. Stores the result in the lower element of the returned value, and copies the upper element from `a` to the upper element of the result.
- `_mm_fnmadd_ss` ⚠ fma: Multiplies the lower single-precision (32-bit) floating-point elements in `a` and `b`, and adds the negated intermediate result to the lower element in `c`. Stores the result in the lower element of the returned value, and copies the 3 upper elements from `a` to the upper elements of the result.
- `_mm_fnmsub_pd` ⚠ fma: Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the negated intermediate result.
- `_mm_fnmsub_ps` ⚠ fma: Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts packed elements in `c` from the negated intermediate result.
- `_mm_fnmsub_sd` ⚠ fma: Multiplies the lower double-precision (64-bit) floating-point elements in `a` and `b`, and subtracts the lower element in `c` from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from `a` to the upper element of the result.
- `_mm_fnmsub_ss` ⚠ fma: Multiplies the lower single-precision (32-bit) floating-point elements in `a` and `b`, and subtracts the lower element in `c` from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from `a` to the upper elements of the result.
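As a rough illustration of how these functions might be called, the sketch below wraps `_mm256_fmadd_pd` and `_mm_fmaddsub_pd` behind runtime feature detection. The wrapper names (`fmadd4`, `fmadd4_checked`, `fmaddsub2`, `fmaddsub2_checked`) and the scalar fallbacks are assumptions made for this example, not part of the module; only the intrinsics, `is_x86_feature_detected!`, and `f64::mul_add` come from the standard library.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: a[i] * b[i] + c[i] over four f64 lanes.
/// Safety: the caller must ensure the CPU supports `avx` and `fma`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx,fma")]
unsafe fn fmadd4(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    unsafe {
        let r = _mm256_fmadd_pd(
            _mm256_loadu_pd(a.as_ptr()),
            _mm256_loadu_pd(b.as_ptr()),
            _mm256_loadu_pd(c.as_ptr()),
        );
        let mut out = [0.0; 4];
        _mm256_storeu_pd(out.as_mut_ptr(), r);
        out
    }
}

/// Hypothetical helper: lane 0 is a*b - c, lane 1 is a*b + c.
/// Safety: the caller must ensure the CPU supports `fma`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2,fma")]
unsafe fn fmaddsub2(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    unsafe {
        let r = _mm_fmaddsub_pd(
            _mm_loadu_pd(a.as_ptr()),
            _mm_loadu_pd(b.as_ptr()),
            _mm_loadu_pd(c.as_ptr()),
        );
        let mut out = [0.0; 2];
        _mm_storeu_pd(out.as_mut_ptr(), r);
        out
    }
}

/// Safe wrapper: uses the intrinsic when the CPU supports it,
/// otherwise falls back to scalar `mul_add`.
fn fmadd4_checked(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // Sound because the required features were just detected.
            return unsafe { fmadd4(a, b, c) };
        }
    }
    let mut out = [0.0; 4];
    for i in 0..4 {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}

/// Safe wrapper for the add/subtract variant, with a scalar fallback.
fn fmaddsub2_checked(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("fma") {
            // Sound because the required feature was just detected.
            return unsafe { fmaddsub2(a, b, c) };
        }
    }
    // Even-indexed lane subtracts c, odd-indexed lane adds c.
    [a[0].mul_add(b[0], -c[0]), a[1].mul_add(b[1], c[1])]
}

fn main() {
    let r = fmadd4_checked([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.5; 4]);
    println!("{r:?}");
    let s = fmaddsub2_checked([2.0, 3.0], [4.0, 5.0], [1.0, 1.0]);
    println!("{s:?}");
}
```

Keeping the `#[target_feature]` functions private and exposing only the checked wrappers is one way to confine the `unsafe` calls to places where the feature check has already run.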