IEEE 754

Structure

Using 32-bit single-precision IEEE 754 as an example, the number 15.5 is stored as:

Sign (1 bit)	Exponent (8 bits)	Significand/Mantissa (23 bits)
0	10000010	11110000000000000000000
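As a quick way to inspect this layout, the following sketch splits a number into the three fields using Python's standard `struct` module. The function name `float_to_fields` is an illustrative choice, and the input 15.5 is the value that the example bit pattern decodes to.

```python
import struct

def float_to_fields(x: float) -> tuple[str, str, str]:
    """Return (sign, exponent, fraction) bit strings for x as single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    # 1 sign bit, 8 exponent bits, 23 fraction bits.
    return s[0], s[1:9], s[9:]

print(float_to_fields(15.5))
# ('0', '10000010', '11110000000000000000000')
```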

Sign

The sign bit is the first bit in the IEEE 754 representation. A value of 0 indicates a positive number, and a value of 1 indicates a negative number. In this case, the sign bit is 0, indicating that the number is positive.

Exponent

The exponent is stored using a biased representation. The formula for the bias is:

$$\text{bias} = 2^{k-1} - 1$$

Where $k$ is the number of bits used for the exponent field.

For single-precision (32-bit), the exponent field has 8 bits, so the bias is:

$$\text{bias} = 2^{8-1} - 1 = 127$$

For double-precision (64-bit), the exponent field has 11 bits, so the bias is:

$$\text{bias} = 2^{11-1} - 1 = 1023$$

The stored exponent is the actual exponent plus the bias. Rearranged, the formula for the actual exponent is:

$$E = e - \text{bias}$$

Where:

$E$ is the actual exponent (the power of two, written here in base 10),
$e$ is the exponent stored in the IEEE 754 format (in binary),
$\text{bias}$ is a constant used to offset the exponent so that both positive and negative exponents can be represented.

In this case, the stored exponent value is 130, which is 10000010 in binary, and the actual exponent is calculated as:

$$E = 130 - 127 = 3$$
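The exponent arithmetic for this example can be checked with a short sketch. The helper names `bias` and `actual_exponent` are illustrative, and the sketch assumes a normal number (exponent fields of all zeros or all ones are reserved for special values and are not handled).

```python
def bias(k: int) -> int:
    """Bias for an exponent field that is k bits wide."""
    return 2 ** (k - 1) - 1

def actual_exponent(stored_bits: str) -> int:
    """Actual exponent of a normal number, given the stored exponent bits."""
    return int(stored_bits, 2) - bias(len(stored_bits))

print(bias(8), bias(11))            # 127 1023
print(actual_exponent("10000010"))  # 3
```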

Significand

The significand (or mantissa) holds the precision bits of the number. For normalized numbers the leading 1 is implicit and is not stored; only the bits after the binary point are. In this case, the stored field is 11110000000000000000000: the fraction bits 1111, padded on the right with zeros. With the implicit leading 1 restored, the significand is 1.1111 in binary (1.9375 in decimal), so the value is 1.1111 × 2^3 = 1111.1 in binary, which is 15.5.
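To tie the three fields together, here is a minimal sketch that rebuilds the value from its sign, exponent, and fraction bit strings. The function name `decode_normal` is hypothetical, and the sketch covers normal single-precision numbers only (no zeros, subnormals, infinities, or NaNs).

```python
def decode_normal(sign: str, exponent: str, fraction: str) -> float:
    """Decode a normal single-precision number from its three bit fields."""
    e = int(exponent, 2) - 127                  # remove the bias
    significand = 1 + int(fraction, 2) / 2**23  # restore the implicit leading 1
    return (-1) ** int(sign) * significand * 2.0 ** e

print(decode_normal("0", "10000010", "11110000000000000000000"))  # 15.5
```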

Addition

When adding two IEEE 754 floating-point numbers, the following steps are generally followed:

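Align the exponents: If the exponents of the two numbers are not the same, the significand of the number with the smaller exponent is shifted to the right by the difference, so that both numbers share the larger exponent.
Add the significands: After aligning the exponents, the significands (mantissas) are added. If necessary, the result is normalized (shifting the result and adjusting the exponent) so that it again has a single leading 1.
Round the result: If the result is not exactly representable in the available significand bits, it is rounded. By default this means rounding to the nearest representable value, with ties going to the even significand.
Normalize the result: Rounding can carry into a new leading bit, in which case the result is shifted once more and the exponent is incremented. If the exponent is then out of range, the result overflows (to infinity under the default rounding mode) or underflows to a subnormal number or zero.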
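The steps above can be sketched in a few lines. This is an illustrative model, not how any particular hardware implements addition: it handles positive normal single-precision inputs only, raises an error on anything else, and keeps every bit during alignment by shifting the larger operand left (real implementations shift the smaller one right and track a few guard bits instead). The function names are hypothetical.

```python
import struct

def unpack_normal(x: float) -> tuple[int, int]:
    """Return (biased exponent, 24-bit significand) of x as single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    if not 0 < exponent < 255 or bits >> 31:
        raise ValueError("sketch handles positive normal numbers only")
    return exponent, (bits & 0x7FFFFF) | 0x800000  # restore the implicit 1

def add_positive_normals(a: float, b: float) -> float:
    (ea, ma), (eb, mb) = unpack_normal(a), unpack_normal(b)
    if ea < eb:
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)

    # 1. Align: express both significands at the smaller exponent (exactly).
    # 2. Add the aligned significands.
    total = (ma << (ea - eb)) + mb
    exponent = eb

    # 3. Normalize: shift right until only 24 significand bits remain.
    extra = total.bit_length() - 24
    if extra > 0:
        remainder = total & ((1 << extra) - 1)  # bits about to be dropped
        half = 1 << (extra - 1)
        total >>= extra
        exponent += extra
        # 4. Round to nearest, ties to even.
        if remainder > half or (remainder == half and total & 1):
            total += 1
            if total == 1 << 24:  # rounding carried out: renormalize
                total >>= 1
                exponent += 1

    if exponent >= 255:
        raise OverflowError("result does not fit in single precision")
    bits = (exponent << 23) | (total & 0x7FFFFF)  # drop the implicit 1 again
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(add_positive_normals(29.125, 15.5))  # 44.625
```

Because 29.125 and 15.5 are both exactly representable, and so is their sum, no rounding takes place in this example.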

Basic and interchange formats

IEEE 754 distinguishes between basic formats and interchange formats.

Basic Formats: These are the formats the standard defines for arithmetic: binary32 (single precision), binary64 (double precision), binary128, decimal64, and decimal128. Single precision, for example, is the 32-bit layout with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand (mantissa).
Interchange Formats: These are formats with a fixed, fully specified bit-level encoding, so that floating-point data can be exchanged between different systems. Every basic format is also an interchange format; the interchange set additionally includes narrower formats such as binary16 and decimal32. Special values such as infinity and NaN (Not a Number) have defined encodings in these formats.

Examples

Convert 29.125 to IEEE 754 Single Precision

Step 1: Convert to Binary

To convert 29.125 to binary, we handle the whole-number part (29) and the fractional part (0.125) separately. Start by converting 29 to binary through repeated division by 2, recording the remainders:

29 / 2 = 14 remainder 1

14 / 2 = 7 remainder 0

7 / 2 = 3 remainder 1

3 / 2 = 1 remainder 1

1 / 2 = 0 remainder 1

Reading the remainders from bottom to top, we get 11101.

Next, convert the decimal part (0.125). Multiply the decimal by 2 and record the integer part:

0.125 * 2 = 0.25, integer part 0

0.25 * 2 = 0.5, integer part 0

0.5 * 2 = 1.0, integer part 1; stop because the remaining fractional part is 0

Reading the integer parts from top to bottom, the fractional part in binary is .001. Combining it with the whole number, we get 11101.001.
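The two hand procedures used in this step (repeated division for the whole part, repeated multiplication for the fractional part) can be written as a small sketch. The function names and the `max_bits` cutoff are illustrative assumptions; the cutoff is there because many fractions, such as 0.1, never terminate in binary.

```python
def int_to_binary(n: int) -> str:
    """Binary digits of a non-negative integer via repeated division by 2."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    # Remainders come out least significant first, so read them in reverse.
    return "".join(reversed(digits)) or "0"

def frac_to_binary(f: float, max_bits: int = 23) -> str:
    """Binary digits of a fraction 0 <= f < 1 via repeated multiplication by 2."""
    digits = []
    while f > 0 and len(digits) < max_bits:
        f *= 2
        bit = int(f)  # the integer part is the next binary digit
        digits.append(str(bit))
        f -= bit
    return "".join(digits)

print(int_to_binary(29) + "." + frac_to_binary(0.125))  # 11101.001
```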

Step 2: Determine the Sign

Since 29.125 is positive, the sign bit is 0.

Step 3: Determine the Exponent

To find the exponent, normalize the number by moving the binary point until exactly one 1 remains to its left. For 11101.001, the binary point moves 4 places to the left, giving 1.1101001 × 2^4, so the exponent is 4.

IEEE 754 uses a bias of 127 for single precision. So, add 127 to the exponent: 4 + 127 = 131.

Now, convert 131 to binary by dividing by 2:

131 / 2 = 65 remainder 1

65 / 2 = 32 remainder 1

32 / 2 = 16 remainder 0

16 / 2 = 8 remainder 0

8 / 2 = 4 remainder 0

4 / 2 = 2 remainder 0

2 / 2 = 1 remainder 0

1 / 2 = 0 remainder 1

Reading the remainders from bottom to top, we get 10000011.
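As a cross-check of this step, the sketch below derives the biased exponent directly from the whole-number bits. It assumes a positive number that is at least 1, so that the exponent is simply the number of places the binary point moves to the left; the function name is hypothetical.

```python
BIAS = 127  # single-precision bias

def biased_exponent_bits(integer_bits: str) -> str:
    """8-bit biased exponent for a number whose whole part in binary is integer_bits.

    Assumes integer_bits starts with 1, i.e. the number is >= 1.
    """
    actual = len(integer_bits) - 1  # places the binary point moves left
    return format(actual + BIAS, "08b")

print(biased_exponent_bits("11101"))  # 10000011
```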

Step 4: Determine the Mantissa

For the mantissa, take the normalized number 1.1101001 and drop the leading 1 (which is implied), leaving the digits to the right of the binary point: 1101001. Then pad with zeros on the right to reach 23 bits: 11010010000000000000000.

Step 5: Putting It All Together

The final IEEE 754 single precision representation of 29.125 is:

0 | 10000011 | 11010010000000000000000
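One way to check this result is to reassemble the three fields into a 32-bit integer and reinterpret those bits as a float. The sketch below does this with Python's standard `struct` module; the function name `fields_to_float` is an illustrative choice.

```python
import struct

def fields_to_float(sign: str, exponent: str, fraction: str) -> float:
    """Reinterpret sign/exponent/fraction bit strings as a single-precision float."""
    bits = int(sign + exponent + fraction, 2)
    # Pack the 32-bit pattern as bytes, then read the same bytes back as a float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fields_to_float("0", "10000011", "11010010000000000000000"))  # 29.125
```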

Verify answer here

Last modified: 16 November 2024