Floating Point
Fractional Binary Numbers
- Representation
- Bits to right of "binary point" represent fractional powers of (2)
- Represents the rational number: [\sum_{k=-j}^{i} b_k \times 2^k]
With this notation we can write down the value of any fractional binary number.
Fractional Binary Numbers: Examples
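As a small illustration of the sum formula, the sketch below evaluates a fractional binary string exactly. The function name `frac_binary_value` and the sample strings are assumptions chosen for this example, not part of the original notes.

```python
from fractions import Fraction

def frac_binary_value(bits: str) -> Fraction:
    """Evaluate a fractional binary string such as '101.11' exactly."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(0)
    # Bits left of the binary point carry weights 2^0, 2^1, ... (right to left).
    for k, b in enumerate(reversed(int_part)):
        value += int(b) * Fraction(2) ** k
    # Bits right of the binary point carry weights 2^-1, 2^-2, ... (left to right).
    for k, b in enumerate(frac_part, start=1):
        value += int(b) * Fraction(1, 2 ** k)
    return value

print(frac_binary_value("101.11"))  # 23/4, i.e. 5 3/4
print(frac_binary_value("10.111"))  # 23/8, i.e. 2 7/8
print(frac_binary_value("1.0111"))  # 23/16, i.e. 1 7/16
```

Each string is the previous one with the binary point moved one position left, so each value is half of the one before it.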
Observations
- Divide by 2 by shifting right (unsigned)
- Multiply by 2 by shifting left
- Numbers of the form (0.111111…_2) are just below 1.0
- Use the notation (1.0 - \varepsilon)
(\varepsilon) depends on how many bits you have to the right of the binary point
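The observations above can be checked with a short sketch. The helper `just_below_one` is a hypothetical name introduced here. It sums 1/2 + 1/4 + ... + 1/2^n exactly, so the gap to 1.0 (the epsilon of the notes) can be read off for a given number of fraction bits.

```python
from fractions import Fraction

# Shifting an unsigned integer right by one halves it (rounding down);
# shifting left by one doubles it.
n = 0b1011      # 11
print(n >> 1)   # 5
print(n << 1)   # 22

def just_below_one(bits: int) -> Fraction:
    """Value of 0.11...1 in binary with `bits` ones after the binary point."""
    return sum(Fraction(1, 2 ** k) for k in range(1, bits + 1))

for bits in (4, 8, 16):
    value = just_below_one(bits)
    epsilon = 1 - value   # equals 2^-bits
    print(bits, value, epsilon)
```

With 4 fraction bits the value is 15/16 and epsilon is 1/16. Every extra bit halves epsilon.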
Representable Numbers
Limitation #1
Can only exactly represent numbers of the form (\frac{x}{2^k}); other rational numbers have repeating bit representations
Example:
(1/3) representation: (0.0101010101[01]…_2)
(1/5) representation: (0.001100110011[0011]…_2)
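To make Limitation #1 concrete, here is a minimal sketch that tests whether a rational number has the form x/2^k and generates the leading bits of its binary expansion. Both function names are assumptions made for this illustration.

```python
from fractions import Fraction

def is_exact_in_binary(q: Fraction) -> bool:
    """True if q = x / 2^k, i.e. its reduced denominator is a power of two."""
    d = q.denominator
    return (d & (d - 1)) == 0

def binary_fraction_bits(q: Fraction, n: int) -> str:
    """First n bits after the binary point of q, assuming 0 <= q < 1."""
    bits = []
    for _ in range(n):
        q *= 2                # shift the binary point one place right
        bit = int(q >= 1)     # the bit that moved past the point
        bits.append(str(bit))
        q -= bit              # keep only the remaining fraction
    return "".join(bits)

print(binary_fraction_bits(Fraction(1, 3), 12))  # 010101010101
print(binary_fraction_bits(Fraction(1, 5), 12))  # 001100110011
print(is_exact_in_binary(Fraction(3, 8)))        # True
print(is_exact_in_binary(Fraction(1, 5)))        # False
```

For 1/3 and 1/5 the remainder never reaches zero, so the bit pattern repeats forever and any finite number of bits is only an approximation.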
Limitation #2
Just one setting of the binary point within the (w) bits
Limited range of numbers:
- shifting the binary point right (\rightarrow) range of representable magnitudes (\uparrow), fractional precision (\downarrow)
- shifting the binary point left (\rightarrow) fractional precision (\uparrow), range of representable magnitudes (\downarrow)
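The trade-off in Limitation #2 can be sketched numerically. The example assumes an unsigned fixed-point format with `w` total bits, of which `f` lie to the right of the binary point. The function name and the choice of `w = 8` are illustrative assumptions.

```python
from fractions import Fraction

def fixed_point_limits(w: int, f: int):
    """Return (largest value, step size) for an unsigned w-bit number
    with f bits to the right of the binary point."""
    step = Fraction(1, 2 ** f)        # weight of the least significant bit
    largest = (2 ** w - 1) * step     # all w bits set to 1
    return largest, step

for f in (0, 4, 8):
    largest, step = fixed_point_limits(8, f)
    print(f"f={f}: max={float(largest)}, step={float(step)}")
```

With 8 bits, placing the point at the far right gives a maximum of 255 in steps of 1, while placing it at the far left gives steps of 1/256 but a maximum just below 1. A single fixed position cannot offer both.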
IEEE Floating Point
Floating Point Representation
- Numerical Form: ((-1)^s M 2^E)
- Sign bit (s) determines whether number is negative or positive
- Significand (M) (also called the mantissa) is normally a fractional value in the range ([1.0, 2.0))
- Exponent (E) weights value by power of two
- Encoding
- MSB s is sign bit (s)
- exp field encodes (E) (but is not equal to E)
- frac field encodes (M) (but is not equal to M)
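The numerical form can be illustrated without looking at the bit encoding. The sketch below uses Python's `math.frexp`, which returns a mantissa in [0.5, 1), and rescales it so that M falls in [1.0, 2.0). The function name is hypothetical, and the code assumes a finite, nonzero input.

```python
import math

def sign_significand_exponent(x: float):
    """Return (s, M, E) with x == (-1)**s * M * 2**E and 1.0 <= M < 2.0.

    Assumes x is finite and nonzero.
    """
    s = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return s, m * 2, e - 1      # rescale so the significand is in [1, 2)

print(sign_significand_exponent(15213.0))  # (0, 1.8570556640625, 13)
print(sign_significand_exponent(-0.75))    # (1, 1.5, -1)
```

These are the mathematical values of s, M and E, not the contents of the exp and frac fields.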
Precision options
- Single precision: 32 bits
(s): 1 bit
(exp): 8 bits
(frac): 23 bits
- Double precision: 64 bits
(s): 1 bit
(exp): 11 bits
(frac): 52 bits
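The field widths above can be seen by pulling a float's bits apart. The following sketch uses the standard `struct` module to reinterpret a value as an unsigned integer and then masks out each field. The function names are assumptions for this example.

```python
import struct

def float32_fields(x: float):
    """Split the single-precision encoding of x into (s, exp, frac)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                    # 1 bit
    exp = (bits >> 23) & 0xFF         # 8 bits
    frac = bits & 0x7FFFFF            # 23 bits
    return s, exp, frac

def float64_fields(x: float):
    """Split the double-precision encoding of x into (s, exp, frac)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                    # 1 bit
    exp = (bits >> 52) & 0x7FF        # 11 bits
    frac = bits & ((1 << 52) - 1)     # 52 bits
    return s, exp, frac

print(float32_fields(1.0))    # (0, 127, 0)
print(float64_fields(1.0))    # (0, 1023, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152)
```

Note that `struct.pack(">f", x)` rounds a Python float (a double) to single precision, so the 32-bit fields describe the nearest single-precision value.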
"Normalized" Values
- When: exp (\neq) (000…0) and exp (\neq) (111…1)
- Exponent coded as a biased value: E = Exp – Bias(7 unsigned numbers)
- Exp: unsigned value of the exp field (because Exp is stored as an unsigned value, the exponents of two floats can be compared directly as unsigned integers)
- Bias = (2^{k-1} - 1), where (k) is the number of exponent bits
Single precision: 127 (Exp: 1…254, E: -126…127); Exp = 000…0 and 111…1 are excluded
Double precision: 1023 (Exp: 1…2046, E: -1022…1023); Exp = 000…0 and 111…1 are excluded
- Significand coded with implied leading 1: (M = 1.xxx…x_2)
- xxx…x: bits of the frac field (the leading 1 is not stored, so we get one extra bit of precision for free)
- Minimum when frac=000…0 (M = 1.0)
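Putting the normalized-value rules together, the sketch below decodes a single-precision value by hand: it computes E = Exp - Bias, rebuilds M with the implied leading 1, and reconstructs the value. The function name is hypothetical, and the code rejects encodings whose exp field is all zeros or all ones, since those are not normalized.

```python
import struct

def decode_normalized_float32(x: float):
    """Return (E, M, value) for a float whose single-precision
    encoding is a normalized value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 0xFF:
        raise ValueError("not a normalized value")
    bias = 2 ** (8 - 1) - 1            # 127 for k = 8 exponent bits
    E = exp - bias
    M = 1 + frac / 2 ** 23             # implied leading 1 plus the frac bits
    value = (-1) ** s * M * 2.0 ** E
    return E, M, value

print(decode_normalized_float32(15213.0))  # (13, 1.8570556640625, 15213.0)
print(decode_normalized_float32(1.0))      # (0, 1.0, 1.0)
```

For 1.0 the frac field is all zeros, which gives the minimum significand M = 1.0.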