Floating Point Visually Explained

Let's think of the same thing, but in a hypothetical base-10 computer.

The first digit is the sign: plus or minus.

The next section is the scale of the number: E-10, E+2 etc.

The last section holds the actual digits of the number.

So 67 would be:

  • plus

  • E+1 (times 10^1)

  • 6(.)7000000...

  -> +6.7E+1 = 67
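If it helps to see the base-10 version as code, here is a rough sketch that splits a number into those three sections. The function name `decimal_parts` is made up for illustration, and it leans on Python's `decimal` module rather than anything a real base-10 machine would do.

```python
from decimal import Decimal

def decimal_parts(value):
    """Split an int or numeric string into (sign, exponent, digits).

    The pieces read as: sign d.ddd... E exponent
    (hypothetical helper, for illustration only).
    """
    sign, digits, exp = Decimal(value).as_tuple()
    # as_tuple() treats the digits as a whole number times 10^exp, so move the
    # point to just after the first digit to get scientific notation.
    sci_exp = exp + len(digits) - 1
    return ("-" if sign else "+", sci_exp, digits)

print(decimal_parts(67))         # sign, scale, digits for 67
print(decimal_parts("-1234.5"))  # same idea for a negative number
```

For 67 this gives plus, a scale of 1, and the digits 6 and 7, which is the +6.7E+1 from above.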

Now when we move to the practical binary version:

The minus/plus bit stays the same. That's where the (-1)^s comes from.

The second section needs to have negative numbers (for |x|<1) as well, so we'll start counting from -127 instead of 0. Also, instead of E+x being 10^x, it's now 2^x because we're working in binary.

That's where the 2^(x-127) comes from.

In the third section we can assume the number is always of the form 1.____, because for 10.__ (in binary) or 0.__ you can just change the scale to have it start with a 1 again.

That's why there's a 1.M
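Here is a small sketch of the binary version, assuming the usual 32-bit layout of 1 sign bit, 8 scale bits and 23 number bits. The function names `float32_parts` and `rebuild` are my own, and `rebuild` only covers ordinary (normal) numbers, not special cases like zero or infinity.

```python
import struct

def float32_parts(value):
    """Return the (sign, exponent, mantissa) bit fields of a 32-bit float."""
    # Pack as a big-endian float, then reread the same 4 bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit: 0 is plus, 1 is minus
    exponent = (bits >> 23) & 0xFF   # 8 bits: the scale, stored with +127 added
    mantissa = bits & 0x7FFFFF       # 23 bits: the M in 1.M
    return sign, exponent, mantissa

def rebuild(sign, exponent, mantissa):
    """(-1)^s * 2^(x-127) * 1.M, valid for normal numbers only."""
    return (-1) ** sign * 2 ** (exponent - 127) * (1 + mantissa / 2**23)

s, x, m = float32_parts(67.0)
print(s, x, m)           # the three raw fields
print(rebuild(s, x, m))  # putting them back together gives 67.0 again
```

For 67 the scale field comes out as 133, so 2^(133-127) = 2^6 = 64, and 1.M is 1.046875; 64 times 1.046875 is 67.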

And that's all there is to it. The only reason it's difficult to grasp is the binary <-> decimal conversion, not because it's complicated. I hope this helps.
