Module fma

Source
Available on x86 or x86-64 only.
Expand description

Fused Multiply-Add instruction set (FMA)

The FMA instruction set is an extension to the 128 and 256-bit SSE instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.

The references are:

Wikipedia’s FMA page provides a quick overview of the instructions available.

Functions§

_mm256_fmadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
_mm256_fmadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
_mm256_fmaddsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
_mm256_fmaddsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
_mm256_fmsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
_mm256_fmsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
_mm256_fmsubadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
_mm256_fmsubadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
_mm256_fnmadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
_mm256_fnmadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
_mm256_fnmsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
_mm256_fnmsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
_mm_fmadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
_mm_fmadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
_mm_fmadd_sdfma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copy the upper element from a to the upper elements of the result.
_mm_fmadd_ssfma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copy the 3 upper elements from a to the upper elements of the result.
_mm_fmaddsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
_mm_fmaddsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
_mm_fmsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
_mm_fmsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
_mm_fmsub_sdfma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of the returned value, and copy the upper element from a to the upper elements of the result.
_mm_fmsub_ssfma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of the returned value, and copy the 3 upper elements from a to the upper elements of the result.
_mm_fmsubadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
_mm_fmsubadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
_mm_fnmadd_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
_mm_fnmadd_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
_mm_fnmadd_sdfma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of the returned value, and copy the upper element from a to the upper elements of the result.
_mm_fnmadd_ssfma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of the returned value, and copy the 3 upper elements from a to the upper elements of the result.
_mm_fnmsub_pdfma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
_mm_fnmsub_psfma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
_mm_fnmsub_sdfma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result. Store the result in the lower element of the returned value, and copy the upper element from a to the upper elements of the result.
_mm_fnmsub_ssfma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result. Store the result in the lower element of the returned value, and copy the 3 upper elements from a to the upper elements of the result.