Module div

Source
Expand description

Floating point division routines.

This module documentation gives an overview of the method used. More documentation is inline.

§Relevant notation

  • m_a: the mantissa of a, in base 2
  • p_a: the exponent of a, in base 2. I.e. a = m_a * 2^p_a
  • uqN (e.g. uq1): this refers to Q notation for fixed-point numbers. UQ1.31 is an unsigned fixed-point number with 1 integral bit, and 31 decimal bits. A uqN variable of type uM will have N bits of integer and M-N bits of fraction.
  • hw: half width, i.e. for f64 this will be a u32.
  • x is the best estimate of 1/m_b

§Method Overview

Division routines must solve for a / b, which is res = m_a*2^p_a / m_b*2^p_b. The basic process is as follows:

  • Rearange the exponent and significand to simplify the operations: res = (m_a / m_b) * 2^{p_a - p_b}.
  • Check for early exits (infinity, zero, etc).
  • If a or b are subnormal, normalize by shifting the mantissa and adjusting the exponent.
  • Set the implicit bit so math is correct.
  • Shift mantissa significant digits (with implicit bit) fully left such that fixed-point UQ1 or UQ0 numbers can be used for mantissa math. These will have greater precision than the actual mantissa, which is important for correct rounding.
  • Calculate the reciprocal of m_b, x.
  • Use the reciprocal to multiply rather than divide: res = m_a * x_b * 2^{p_a - p_b}.
  • Reapply rounding.

§Reciprocal calculation

Calculating the reciprocal is the most complicated part of this process. It uses the Newton-Raphson method, which picks an initial estimation (of the reciprocal) and performs a number of iterations to increase its precision.

In general, Newton’s method takes the following form:

`x_n` is a guess or the result of a previous iteration. Increasing `n` converges to the
desired result.

The result approaches a zero of `f(x)` by applying a correction to the previous gues.

x_{n+1} = x_n - f(x_n) / f'(x_n)

Applying this to find the reciprocal:

1 / x = b

Rearrange so we can solve by finding a zero
0 = (1 / x) - b = f(x)

f'(x) = -x^{-2}

x_{n+1} = 2*x_n - b*x_n^2

This is a process that can be repeated to calculate the reciprocal with enough precision to achieve a correctly rounded result for the overall division operation. The maximum required number of iterations is known since precision doubles with each iteration.

§Half-width operations

Calculating the reciprocal requires widening multiplication and performing arithmetic on the results, meaning that emulated integer arithmetic on u128 (for f64) and u256 (for f128) gets used instead of native math.

To make this more efficient, all but the final operation can be computed using half-width integers. For example, rather than computing four iterations using 128-bit integers for f64, we can instead perform three iterations using native 64-bit integers and only one final iteration using the full 128 bits.

This works because of precision doubling. Some leeway is allowed here because the fixed-point number has more bits than the final mantissa will.

Modules§

__divdf3 🔒
__divdf3vfp 🔒
__divsf3 🔒
__divsf3vfp 🔒
__divtf3 🔒

Functions§

__divdf3
__divsf3
__divtf3
c_hw 🔒
The value of C adjusted to half width.
div 🔒
get_iterations 🔒
Calculate the number of iterations required for a float type’s precision.
next_guess 🔒
Perform one iteration at any width to approach 1/b, given previous guess x. Returns the next x as a UQ0 number.
reciprocal_precision 🔒
u_n for different precisions (with N-1 half-width iterations).