Module prim_f32

1.6.0 · Source

A 32-bit floating-point type (specifically, the “binary32” type defined in IEEE 754-2008).

This type can represent a wide range of decimal numbers, like 3.5, 27, -113.75, 0.0078125, 34359738368, 0, -1. So unlike integer types (such as i32), floating-point types can represent non-integer numbers, too.

However, being able to represent this wide range of numbers comes at the cost of precision: floats can only represent some of the real numbers, and calculations with floats round to a nearby representable number. For example, 5.0 and 1.0 can be exactly represented as f32, but 1.0 / 5.0 results in 0.20000000298023223876953125 since 0.2 cannot be exactly represented as f32. Note, however, that printing floats with println and friends will often discard insignificant digits: println!("{}", 1.0f32 / 5.0f32) will print 0.2.
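For instance, a minimal sketch of how the default formatting hides this rounding; the printed digits below assume the standard binary32 representation of 1/5:

fn main() {
    let x = 1.0_f32 / 5.0_f32;
    // Default formatting prints the shortest decimal string that reads back as x:
    println!("{x}");     // 0.2
    // Requesting more digits reveals the nearest representable value:
    println!("{x:.26}"); // 0.20000000298023223876953125
    // The literal 0.2_f32 rounds to that same representable value, so the
    // comparison holds even though neither side is exactly one fifth.
    assert_eq!(x, 0.2_f32);
}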

Additionally, f32 can represent some special values:

  • −0.0: IEEE 754 floating-point numbers have a bit that indicates their sign, so −0.0 is a possible value. For comparison, −0.0 == +0.0, but floating-point operations can carry the sign bit through arithmetic operations. This means −0.0 × +0.0 produces −0.0, and a negative number rounded to a magnitude smaller than f32 can represent also produces −0.0.
  • ∞ and −∞: these result from calculations like 1.0 / 0.0.
  • NaN (not a number): this value results from calculations like (-1.0).sqrt(). NaN has some potentially unexpected behavior:
    • It is not equal to any float, including itself! This is the reason f32 doesn’t implement the Eq trait.
    • It is also neither smaller nor greater than any float, making it impossible to sort by the default comparison operation, which is the reason f32 doesn’t implement the Ord trait.
    • It is also considered infectious as almost all calculations where one of the operands is NaN will also result in NaN. The explanations on this page only explicitly document behavior on NaN operands if this default is deviated from.
    • Lastly, there are multiple bit patterns that are considered NaN. Rust does not currently guarantee that the bit patterns of NaN are preserved over arithmetic operations, and they are not guaranteed to be portable or even fully deterministic! This means that there may be some surprising results upon inspecting the bit patterns, as the same calculations might produce NaNs with different bit patterns. This also affects the sign of the NaN: checking is_sign_positive or is_sign_negative on a NaN is the most common way to run into these surprising results. (Checking x >= 0.0 or x <= 0.0 avoids those surprises, but it also changes how negative and positive zero are treated: both compare equal to 0.0.) See the section below for what exactly is guaranteed about the bit pattern of a NaN, and the example after this list for a demonstration of these behaviors.
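A short sketch of these special values, using only the behavior documented above:

fn main() {
    // Negative zero compares equal to positive zero...
    assert!(-0.0_f32 == 0.0_f32);
    // ...but it carries its own sign bit.
    assert!((-0.0_f32).is_sign_negative());

    // Infinities result from calculations like 1.0 / 0.0.
    assert_eq!(1.0_f32 / 0.0_f32, f32::INFINITY);
    assert_eq!(-1.0_f32 / 0.0_f32, f32::NEG_INFINITY);

    // NaN is not equal to anything, including itself, and is neither
    // smaller nor greater than any float.
    let nan = (-1.0_f32).sqrt();
    assert!(nan.is_nan());
    assert!(nan != nan);
    assert!(!(nan < 0.0) && !(nan > 0.0) && !(nan == 0.0));

    // NaN is infectious: almost any calculation involving it yields NaN.
    assert!((nan + 1.0).is_nan());
}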

When a primitive operation (addition, subtraction, multiplication, or division) is performed on this type, the result is rounded according to the roundTiesToEven direction defined in IEEE 754-2008. That means:

  • The result is the representable value closest to the true value, if there is a unique closest representable value.
  • If the true value is exactly half-way between two representable values, the result is the one with an even least-significant binary digit.
  • If the true value’s magnitude is ≥ f32::MAX + 2^(f32::MAX_EXP − f32::MANTISSA_DIGITS − 1), the result is ∞ or −∞ (preserving the true value’s sign).
  • If the result of a sum exactly equals zero, the outcome is +0.0 unless both arguments were negative, in which case it is -0.0. Subtraction a - b is regarded as the sum a + (-b). (The example after this list illustrates these rules.)
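A few assertions illustrating these rounding rules; the tie-to-even case uses 2^24 = 16777216, the smallest positive integer whose successor cannot be represented exactly as an f32:

fn main() {
    // The true sum 16777217 lies exactly half-way between the representable
    // values 16777216 and 16777218; the tie is resolved towards the value
    // with an even least-significant significand bit, i.e. 16777216.
    assert_eq!(16777216.0_f32 + 1.0, 16777216.0);

    // A magnitude past the overflow threshold rounds to infinity.
    assert_eq!(f32::MAX + f32::MAX, f32::INFINITY);

    // A sum that is exactly zero is +0.0 unless both arguments were negative.
    assert!((0.0_f32 + (-0.0_f32)).is_sign_positive());
    assert!(((-0.0_f32) + (-0.0_f32)).is_sign_negative());

    // Subtraction is treated as a sum with the negated operand, so x - x
    // is +0.0 even when x is negative.
    assert!((-1.5_f32 - (-1.5_f32)).is_sign_positive());
}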

For more information on floating-point numbers, see Wikipedia.

See also the std::f32::consts module.

§NaN bit patterns

This section defines the possible NaN bit patterns returned by floating-point operations.

The bit pattern of a floating-point NaN value is defined by:

  • a sign bit.
  • a quiet/signaling bit. Rust assumes that the quiet/signaling bit being set to 1 indicates a quiet NaN (QNaN), and a value of 0 indicates a signaling NaN (SNaN). In the following we will hence just call it the “quiet bit”.
  • a payload, which makes up the rest of the significand (i.e., the mantissa) except for the quiet bit.

The rules for NaN values differ between arithmetic and non-arithmetic (or “bitwise”) operations. The non-arithmetic operations are unary -, abs, copysign, signum, {to,from}_bits, {to,from}_{be,le,ne}_bytes and is_sign_{positive,negative}. These operations are guaranteed to exactly preserve the bit pattern of their input except for possibly changing the sign bit.
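As an illustration, the sketch below decomposes an f32 bit pattern into the three components above and checks that a bitwise operation preserves it; the field positions assume the standard binary32 layout (1 sign bit, 8 exponent bits, 23 significand bits):

fn main() {
    let x = f32::NAN;
    let bits = x.to_bits();

    let sign = bits >> 31;            // sign bit
    let quiet = (bits >> 22) & 1;     // most significant significand bit
    let payload = bits & 0x003F_FFFF; // remaining 22 significand bits
    println!("sign = {sign}, quiet = {quiet}, payload = {payload:#x}");

    // abs is a non-arithmetic ("bitwise") operation: it may only change the
    // sign bit, so every other bit of the pattern is preserved exactly.
    assert_eq!(x.abs().to_bits() & 0x7FFF_FFFF, bits & 0x7FFF_FFFF);

    // from_bits/to_bits round-trip exactly, even for NaNs.
    assert_eq!(f32::from_bits(bits).to_bits(), bits);
}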

The following rules apply when a NaN value is returned from an arithmetic operation:

  • The result has a non-deterministic sign.

  • The quiet bit and payload are non-deterministically chosen from the following set of options:

    • Preferred NaN: The quiet bit is set and the payload is all-zero.
    • Quieting NaN propagation: The quiet bit is set and the payload is copied from any input operand that is a NaN. If the inputs and outputs do not have the same payload size (i.e., for as casts), then
      • If the output is smaller than the input, low-order bits of the payload get dropped.
      • If the output is larger than the input, the payload gets filled up with 0s in the low-order bits.
    • Unchanged NaN propagation: The quiet bit and payload are copied from any input operand that is a NaN. If the inputs and outputs do not have the same size (i.e., for as casts), the same rules as for “quieting NaN propagation” apply, with one caveat: if the output is smaller than the input, dropping the low-order bits may result in a payload of 0; a payload of 0 is not possible with a signaling NaN (the all-0 significand encodes an infinity) so unchanged NaN propagation cannot occur with some inputs.
    • Target-specific NaN: The quiet bit is set and the payload is picked from a target-specific set of “extra” possible NaN payloads. The set can depend on the input operand values. See the table below for the concrete NaNs this set contains on various targets.

In particular, if all input NaNs are quiet (or if there are no input NaNs), then the output NaN is definitely quiet. Signaling NaN outputs can only occur if they are provided as an input value. Similarly, if all input NaNs are preferred (or if there are no input NaNs) and the target does not have any “extra” NaN payloads, then the output NaN is guaranteed to be preferred.
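For example, a small check of the guarantee that an arithmetic operation with no NaN inputs returns a quiet NaN (the bit position again assumes the binary32 layout):

fn main() {
    // 0.0 / 0.0 has no NaN inputs, so the resulting NaN must be quiet:
    // the quiet bit (the most significant significand bit) is set.
    let nan = 0.0_f32 / 0.0_f32;
    assert!(nan.is_nan());
    assert_eq!((nan.to_bits() >> 22) & 1, 1);

    // The sign and payload are not guaranteed; printing the full pattern
    // may give different results on different targets or even different runs.
    println!("{:#010x}", nan.to_bits());
}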

The non-deterministic choice happens when the operation is executed; i.e., the result of a NaN-producing floating-point operation is a stable bit pattern (looking at these bits multiple times will yield consistent results), but running the same operation twice with the same inputs can produce different results.

These guarantees are neither stronger nor weaker than those of IEEE 754: IEEE 754 guarantees that an operation never returns a signaling NaN, whereas it is possible for operations like SNAN * 1.0 to return a signaling NaN in Rust. Conversely, IEEE 754 makes no statement at all about which quiet NaN is returned, whereas Rust restricts the set of possible results to the ones listed above.

Unless noted otherwise, the same rules also apply to NaNs returned by other library functions (e.g. min, minimum, max, maximum); other aspects of their semantics and which IEEE 754 operation they correspond to are documented with the respective functions.

When an arithmetic floating-point operation is executed in const context, the same rules apply: no guarantee is made about which of the NaN bit patterns described above will be returned. The result does not have to match what happens when executing the same code at runtime, and the result can vary depending on factors such as compiler version and flags.
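A small sketch of this point; both evaluations produce a NaN, but the two bit patterns are not required to match:

// Evaluated at compile time.
const COMPILE_TIME_NAN: f32 = 0.0 / 0.0;

fn main() {
    // Evaluated at run time.
    let runtime_nan = 0.0_f32 / 0.0_f32;

    assert!(COMPILE_TIME_NAN.is_nan());
    assert!(runtime_nan.is_nan());

    // Both bit patterns obey the rules above, but they are not guaranteed to
    // equal each other, nor to be stable across compiler versions or flags.
    println!("{:#010x} vs {:#010x}", COMPILE_TIME_NAN.to_bits(), runtime_nan.to_bits());
}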

§Target-specific “extra” NaN values

| target_arch | Extra payloads possible on this platform |
|---|---|
| aarch64, arm, arm64ec, loongarch64, powerpc (except when target_abi = "spe"), powerpc64, riscv32, riscv64, s390x, x86, x86_64 | None |
| nvptx64 | All payloads |
| sparc, sparc64 | The all-one payload |
| wasm32, wasm64 | If all input NaNs are quiet with all-zero payload: None. Otherwise: all payloads. |

For targets not in this table, all payloads are possible.

§Algebraic operators

Algebraic operators of the form a.algebraic_*(b) allow the compiler to optimize floating point operations using all the usual algebraic properties of real numbers – despite the fact that those properties do not hold on floating point numbers. This can give a great performance boost since it may unlock vectorization.

The exact set of optimizations is unspecified but typically allows combining operations, rearranging series of operations based on mathematical properties, converting between division and reciprocal multiplication, and disregarding the sign of zero. This means that the results of elementary operations may have undefined precision, and “non-mathematical” values such as NaN, +/-Inf, or -0.0 may behave in unexpected ways, but these operations will never cause undefined behavior.

Because of the unpredictable nature of compiler optimizations, the same inputs may produce different results even within a single program run. Unsafe code must not rely on any property of the return value for soundness. However, implementations will generally do their best to pick a reasonable tradeoff between performance and accuracy of the result.

For example:

x = a.algebraic_add(b).algebraic_add(c).algebraic_add(d);

May be rewritten as:

x = a + b + c + d; // As written
x = (a + c) + (b + d); // Reordered to shorten critical path and enable vectorization
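As a more complete sketch, the reduction below is written with these operators. Note that, at the time of writing, the algebraic_* methods are unstable (nightly-only, behind the float_algebraic feature), so this illustrates the API shape rather than stable-Rust code:

#![feature(float_algebraic)]

// A dot product written with the algebraic operators: the compiler is free to
// reassociate and vectorize the reduction, so the result may differ slightly
// from a strict left-to-right evaluation.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .fold(0.0_f32, |acc, (x, y)| acc.algebraic_add(x.algebraic_mul(*y)))
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0, 4.0];
    let b = [4.0_f32, 3.0, 2.0, 1.0];
    // 1*4 + 2*3 + 3*2 + 4*1 = 20, up to the precision caveats above.
    println!("{}", dot(&a, &b));
}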