Module amx

Available on x86-64 only.

Functions§

ldtilecfg 🔒 ⚠
sttilecfg 🔒 ⚠
tcmmimfp16ps 🔒 ⚠
tcmmrlfp16ps 🔒 ⚠
tdpbf16ps 🔒 ⚠
tdpbssd 🔒 ⚠
tdpbsud 🔒 ⚠
tdpbusd 🔒 ⚠
tdpbuud 🔒 ⚠
tdpfp16ps 🔒 ⚠
tileloadd64 🔒 ⚠
tileloaddt164 🔒 ⚠
tilerelease 🔒 ⚠
tilestored64 🔒 ⚠
tilezero 🔒 ⚠
_tile_cmmimfp16ps ⚠ Experimental, amx-complex: Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles a and b is interpreted as a complex number with an FP16 real part and an FP16 imaginary part. Calculates the imaginary part of the result. For each possible combination of (row of a, column of b), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from a and one from b). The imaginary part of the a element is multiplied with the real part of the corresponding b element, and the real part of the a element is multiplied with the imaginary part of the corresponding b element. The two accumulated results are added, and then accumulated into the corresponding row and column of dst.
_tile_cmmrlfp16ps ⚠ Experimental, amx-complex: Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles a and b is interpreted as a complex number with an FP16 real part and an FP16 imaginary part. Calculates the real part of the result. For each possible combination of (row of a, column of b), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from a and one from b). The real part of the a element is multiplied with the real part of the corresponding b element, and the negated imaginary part of the a element is multiplied with the imaginary part of the corresponding b element. The two accumulated results are added, and then accumulated into the corresponding row and column of dst.
_tile_dpbf16ps ⚠ Experimental, amx-bf16: Compute dot-product of BF16 (16-bit) floating-point pairs in tiles a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in dst, and store the 32-bit result back to tile dst.
_tile_dpbssd ⚠ Experimental, amx-int8: Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
_tile_dpbsud ⚠ Experimental, amx-int8: Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
_tile_dpbusd ⚠ Experimental, amx-int8: Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
_tile_dpbuud ⚠ Experimental, amx-int8: Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
_tile_dpfp16ps ⚠ Experimental, amx-fp16: Compute dot-product of FP16 (16-bit) floating-point pairs in tiles a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in dst, and store the 32-bit result back to tile dst.
_tile_loadconfig ⚠ Experimental, amx-tile: Load tile configuration from a 64-byte memory location specified by mem_addr. The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If the specified pallette_id is zero, that signifies the init state for both the tile config and the tile data, and the tiles are zeroed. Any invalid configuration results in a #GP fault.
_tile_loadd ⚠ Experimental, amx-tile: Load tile rows from memory specified by base address and stride into destination tile dst using the tile configuration previously configured via _tile_loadconfig.
_tile_release ⚠ Experimental, amx-tile: Release the tile configuration to return to the init state, which releases all storage it currently holds.
_tile_storeconfig ⚠ Experimental, amx-tile: Store the current tile configuration to a 64-byte memory location specified by mem_addr. The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If tiles are not configured, all zeroes will be stored to memory.
_tile_stored ⚠ Experimental, amx-tile: Store the tile specified by src to memory specified by base address and stride using the tile configuration previously configured via _tile_loadconfig.
_tile_stream_loadd ⚠ Experimental, amx-tile: Load tile rows from memory specified by base address and stride into destination tile dst using the tile configuration previously configured via _tile_loadconfig. This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly.
_tile_zero ⚠ Experimental, amx-tile: Zero the tile specified by tdest.
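
The byte dot-product summaries above (_tile_dpbssd and its signedness variants) can be read as an ordinary triple loop over dwords, with four byte products per dword. The following is a scalar sketch of the signed-by-signed case only; it does not call any intrinsic. The name `dpbssd_model`, the nested-array tile representation, and the wrapping behaviour on overflow are assumptions made for illustration.

```rust
/// Scalar reference model of a signed-by-signed byte dot product.
/// Hypothetical helper, not part of `core::arch`.
///
/// `a`   : M rows of K dwords, each dword holding 4 signed bytes.
/// `b`   : K rows of N dwords, each dword holding 4 signed bytes.
/// `dst` : M rows of N 32-bit accumulators (read and written).
fn dpbssd_model<const M: usize, const K: usize, const N: usize>(
    dst: &mut [[i32; N]; M],
    a: &[[[i8; 4]; K]; M],
    b: &[[[i8; 4]; N]; K],
) {
    for m in 0..M {
        for n in 0..N {
            // Start from the existing accumulator value in dst.
            let mut acc = dst[m][n];
            for k in 0..K {
                // Four byte pairs per dword, each widened to 32 bits
                // before multiplying.
                for i in 0..4 {
                    let prod = i32::from(a[m][k][i]) * i32::from(b[k][n][i]);
                    // Assumption of this sketch: overflow wraps.
                    acc = acc.wrapping_add(prod);
                }
            }
            dst[m][n] = acc;
        }
    }
}
```

For the mixed-signedness variants, the same loop would apply with `u8` substituted for the operand that the summary describes as unsigned.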
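
The two amx-complex summaries above split one complex matrix product into its real and imaginary halves. A minimal scalar sketch of that split follows. It uses `f32` pairs in place of packed FP16 values (stable Rust has no `f16` type), so rounding will differ from hardware; the names `Complex`, `cmmrl_model`, and `cmmim_model` are hypothetical.

```rust
/// One complex element as (real, imaginary). The hardware packs two FP16
/// values into a dword; f32 is a stand-in chosen for this sketch.
type Complex = (f32, f32);

/// Real part: re(a) * re(b) + (-im(a)) * im(b), accumulated into dst.
fn cmmrl_model<const M: usize, const K: usize, const N: usize>(
    dst: &mut [[f32; N]; M],
    a: &[[Complex; K]; M],
    b: &[[Complex; N]; K],
) {
    for m in 0..M {
        for n in 0..N {
            for k in 0..K {
                let (ar, ai) = a[m][k];
                let (br, bi) = b[k][n];
                dst[m][n] += ar * br + (-ai) * bi;
            }
        }
    }
}

/// Imaginary part: im(a) * re(b) + re(a) * im(b), accumulated into dst.
fn cmmim_model<const M: usize, const K: usize, const N: usize>(
    dst: &mut [[f32; N]; M],
    a: &[[Complex; K]; M],
    b: &[[Complex; N]; K],
) {
    for m in 0..M {
        for n in 0..N {
            for k in 0..K {
                let (ar, ai) = a[m][k];
                let (br, bi) = b[k][n];
                dst[m][n] += ai * br + ar * bi;
            }
        }
    }
}
```

Running both models on the same inputs yields two separate single-precision tiles, one holding the real parts and one the imaginary parts, which matches the way the two intrinsics are described as a pair.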
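
The _tile_loadconfig and _tile_storeconfig summaries refer to a 64-byte configuration block holding the palette, the bytes per row, and the number of rows. The sketch below shows one way to model that block in Rust. The field layout (palette at byte 0, start row at byte 1, 16-bit bytes-per-row entries from byte 16, 8-bit row counts from byte 48) follows Intel's published AMX description rather than anything stated on this page, and the struct name, field names, and methods are hypothetical.

```rust
/// Model of the 64-byte tile configuration block.
/// Layout taken from Intel's AMX documentation (an assumption here,
/// since this page does not spell it out).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct TileConfig {
    palette_id: u8,     // byte 0: 0 means init state
    start_row: u8,      // byte 1: restart position for interrupted loads/stores
    reserved: [u8; 14], // bytes 2..16: must be zero
    colsb: [u16; 16],   // bytes 16..48: bytes per row, one entry per tile
    rows: [u8; 16],     // bytes 48..64: row count, one entry per tile
}

impl TileConfig {
    /// Set the shape of one tile (hypothetical convenience method).
    fn set_tile(&mut self, index: usize, rows: u8, colsb: u16) {
        self.rows[index] = rows;
        self.colsb[index] = colsb;
    }

    /// Serialize to the little-endian byte image that would be placed
    /// at `mem_addr`.
    fn to_bytes(&self) -> [u8; 64] {
        let mut out = [0u8; 64];
        out[0] = self.palette_id;
        out[1] = self.start_row;
        // Bytes 2..16 are reserved and stay zero.
        for (i, c) in self.colsb.iter().enumerate() {
            out[16 + 2 * i..18 + 2 * i].copy_from_slice(&c.to_le_bytes());
        }
        out[48..64].copy_from_slice(&self.rows);
        out
    }
}
```

A default-constructed `TileConfig` is all zeroes, which corresponds to the init state that the summaries describe for a zero palette.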
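
The load and store summaries above address memory by a base address plus a stride: row r of the tile starts at base + r * stride. The following sketch models that addressing on a plain byte slice. The function names and the `Vec<Vec<u8>>` tile representation are hypothetical, and `rows` and `colsb` are passed explicitly here, whereas the intrinsics take them from the loaded tile configuration.

```rust
/// Copy `rows` rows of `colsb` bytes each out of `mem`, where row `r`
/// begins at byte offset `base + r * stride`. Hypothetical model only.
fn load_tile_model(
    mem: &[u8],
    base: usize,
    stride: usize,
    rows: usize,
    colsb: usize,
) -> Vec<Vec<u8>> {
    (0..rows)
        .map(|r| {
            let start = base + r * stride;
            mem[start..start + colsb].to_vec()
        })
        .collect()
}

/// Write each tile row back to `mem` using the same base + r * stride
/// addressing. Bytes between rows are left untouched.
fn store_tile_model(tile: &[Vec<u8>], mem: &mut [u8], base: usize, stride: usize) {
    for (r, row) in tile.iter().enumerate() {
        let start = base + r * stride;
        mem[start..start + row.len()].copy_from_slice(row);
    }
}
```

A stride larger than the row width lets a tile be taken from the interior of a wider row-major matrix; a stride equal to the row width describes a densely packed tile.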
