Conversions from integers to floats.
The algorithm is explained here: https://blog.m-ou.se/floats/. It roughly does the following:
- Calculate a base mantissa by shifting the integer into mantissa position. This gives us a mantissa with the implicit bit set!
- Figure out if rounding needs to occur by classifying the bits that are to be truncated. Some patterns are used to simplify this. Adjust the mantissa with the result if needed.
- Calculate the exponent based on the base-2 logarithm of i (derived from the number of leading zeros). Subtract one.
- Shift the exponent and add the mantissa to create the final representation. Subtracting one from the exponent (above) accounts for the implicit bit being set in the mantissa: adding the mantissa carries that bit into the exponent field.
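The steps above can be sketched for the `u32` to `f32` case as follows. This is a minimal illustration, not the module's actual source: the function name `u32_to_f32_bits_sketch` and the hard-coded constants are assumptions made for the example, and the rounding expression is one branchless way to get round-to-nearest, ties-to-even.

```rust
/// Hypothetical sketch: convert a `u32` to the bit pattern of the nearest
/// `f32`, rounding ties to even. Constants are hard-coded for `f32`.
fn u32_to_f32_bits_sketch(i: u32) -> u32 {
    const BITS: u32 = 32; // width of both `u32` and `f32`
    const SIG_BITS: u32 = 23; // stored mantissa bits in an `f32`
    const EXP_BITS: u32 = 8; // exponent bits in an `f32`
    const EXP_BIAS: u32 = 127;

    if i == 0 {
        return 0; // +0.0 is all zero bits
    }

    // n: number of leading zeros; i_m: the integer shifted fully left.
    let n = i.leading_zeros();
    let i_m = i << n;

    // m_base: the top 24 bits of i_m, so the implicit bit sits at bit 23.
    let m_base = i_m >> EXP_BITS;

    // adj: the 8 bits that fall off, moved to the top of the word.
    let adj = i_m << (SIG_BITS + 1);

    // 1 if we must round up, 0 otherwise. The subtraction turns an exact
    // tie (adj == 0x8000_0000) with an even m_base into "round down".
    let round_up = (adj - ((adj >> (BITS - 1)) & !m_base)) >> (BITS - 1);
    let m = m_base + round_up;

    // e: biased exponent of the highest set bit, minus one to offset the
    // implicit bit that is still present in m.
    let e = EXP_BIAS + (BITS - 1 - n) - 1;

    // `+` rather than `|`, so the implicit bit (and any rounding carry)
    // propagates into the exponent field.
    (e << SIG_BITS) + m
}
```

For instance, `f32::from_bits(u32_to_f32_bits_sketch(3))` yields `3.0`, and the result should agree with `(i as f32).to_bits()` for any input.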
Terminology
- i: the original integer
- i_m: the integer, shifted fully left (no leading zeros)
- n: number of leading zeros
- e: the resulting exponent. Usually 1 is subtracted to offset the mantissa implicit bit.
- m_base: the mantissa before adjusting for truncated bits. Implicit bit is usually set.
- adj: the bits that will be truncated, possibly compressed in some way.
- m: the resulting mantissa. Implicit bit is usually set.
Functions
- exp 🔒
- Calculate the exponent from the number of leading zeros.
- m_adj 🔒
- Adjust a mantissa with dropped bits to perform correct rounding.
- repr 🔒
- Shift the exponent to its position and add the mantissa.
- shift_f_gt_i 🔒
- Shift distance from an integer with n leading zeros to a smaller float.
- shift_f_lt_i 🔒
- Shift distance from a left-aligned integer to a smaller float.
- signed
- Perform a signed operation as unsigned, then add the sign back.
- u32_to_f32_bits
- u32_to_f64_bits
- u32_to_f128_bits
- u64_to_f32_bits
- u64_to_f64_bits
- u64_to_f128_bits
- u128_to_f32_bits
- u128_to_f64_bits
- u128_to_f128_bits
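
As a rough illustration of what `signed` is described as doing (perform the operation as unsigned, then add the sign back), here is a minimal sketch for the `i32` to `f32` case. The name `signed_sketch` and its closure-based signature are assumptions for this example and may not match the real function's generic signature.

```rust
/// Hypothetical sketch: convert an `i32` to `f32` bits by converting its
/// magnitude with an unsigned routine `conv`, then setting the sign bit.
fn signed_sketch(i: i32, conv: impl Fn(u32) -> u32) -> u32 {
    // The sign bit of an `f32` is bit 31.
    let sign_bit = ((i < 0) as u32) << 31;
    // `unsigned_abs` is well defined even for `i32::MIN`.
    conv(i.unsigned_abs()) | sign_bit
}
```

Any unsigned conversion can be passed as `conv`; for a quick check, the built-in cast works as a stand-in, for example `signed_sketch(-5, |u| (u as f32).to_bits())`.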