How can an 80-bit floating point value be used by a 32-bit system?

One of the things "32-bit CPU" means is that its registers are 32 bits wide. This doesn't mean it can't deal with, say, 64-bit numbers, just that it has to deal with the lower 32-bit half first, then the upper 32-bit half second. (It's why CPUs have a carry flag.) It's slower than if the CPU could just load the values into a wider 64-bit register, but still possible.

Thus, the "bitness" of a system does not necessarily limit the size of the numbers a program can deal with, because you can always break up operations that wont fit into CPU registers into multiple operations. So it makes operations slower, consume more memory (if you have to use memory as a "scratchpad"), and more difficult to program, but the operations are still possible.

However, none of that matters for, say, Intel's 32-bit processors and floating point, because the floating-point part of the CPU has its own registers, and they are 80 bits wide. (Early in the x86's history, the floating-point capability was a separate chip; it was integrated into the CPU beginning with the 80486DX.)
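You can even see this from C: with GCC or Clang targeting x86, long double typically maps to the 80-bit x87 extended format, even in a 32-bit program (a quick check, assuming such a toolchain):

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        /* On x86 with GCC/Clang, long double is usually the 80-bit x87
           extended format: a 64-bit mantissa (LDBL_MANT_DIG == 64),
           padded to 12 or 16 bytes when stored in memory. */
        printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
        printf("mantissa bits       = %d\n", LDBL_MANT_DIG);
        return 0;
    }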


@Breakthrough's answer inspired me to add this.

Floating point values, insofar as they are stored in the FPU registers, work very differently than binary integer values.

The 80 bits of a floating-point value are divided among a sign bit, an exponent, and a mantissa (there is also the "base" of floating-point numbers, which is always 2). The mantissa contains the significant digits, and the exponent determines how large those significant digits are. So there is no "overflow" into another register: if your number gets too big to fit in the mantissa, your exponent increases and you lose precision - i.e. when you convert it to an integer, you'll lose decimal places off the right. This is why it's called floating point.
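To make the precision loss concrete, here is a small C illustration. It uses a 32-bit IEEE float, whose 24-bit significand makes the effect easy to trigger; the 80-bit format behaves the same way, just at much larger magnitudes:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* 2^24 + 1 needs 25 significant bits, but a 32-bit IEEE float
           only has a 24-bit significand, so the lowest bit is rounded away. */
        uint32_t exact = 16777217u;  /* 2^24 + 1 */
        float f = (float)exact;
        printf("%u stored as a float reads back as %.1f\n", exact, f);
        /* prints: 16777217 stored as a float reads back as 16777216.0 */
        return 0;
    }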

If your exponent is too large, you get a floating-point overflow, and you can't easily extend the value into another register, since the exponent and mantissa are tied together.

I could be inaccurate about some of that, but I believe that's the gist of it. (The Wikipedia article on floating point illustrates the above a bit more succinctly.)

It's OK that this works totally differently, since the whole "floating-point" part of the CPU is sort of in its own world - you use special CPU instructions to access it and such. Also, to the point of the question, because it's separate, the bitness of the FPU isn't tightly coupled to the bitness of the rest of the CPU.


32-bit, 64-bit, and 128-bit all refer to the word length of the processor, which can be thought of as the "fundamental data type". Often, this is the number of bits transferred to/from the system's RAM, and the width of pointers (although nothing stops you from using software to access more RAM than a single pointer can address).
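A quick way to see the pointer width from C (the result reflects the compiler's data model for the target, not just the CPU):

    #include <stdio.h>

    int main(void)
    {
        /* Prints 32 on a typical 32-bit target and 64 on a 64-bit one. */
        printf("pointer width: %zu bits\n", 8 * sizeof(void *));
        return 0;
    }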

Assuming a constant clock speed (as well as everything else in the architecture being constant), and assuming memory reads/writes take a single clock cycle (which is far from the case in real life), you can add two 64-bit numbers in a single clock cycle on a 64-bit machine (three if you count fetching the numbers from RAM):

ADDA [NUM1], [NUM2]   ; A = NUM1 + NUM2, one 64-bit add
STAA [RESULT]         ; store accumulator A to RESULT

We can also do the same computation on a 32-bit machine... However, on a 32-bit machine, we need to do this in software, since the lower 32 bits must be added first, then we compensate for overflow, then add the upper 32 bits:

     ADDA [NUM1_LOWER], [NUM2_LOWER]   ; add the lower 32-bit halves
     STAA [RESULT_LOWER]               ; store the lower half of the result
     CLRA          ; A = 0. (I'm assuming the condition flags are not modified by this.)
     BRNO CMPS     ; Branch to CMPS if there was no overflow.
     ADDA #1       ; If there was overflow, compensate: the 1 in A carries into the upper half.
CMPS ADDA [NUM1_UPPER]                 ; add NUM1's upper half to the carry held in A...
     ADDA [NUM2_UPPER]                 ; ...then NUM2's upper half,
     STAA [RESULT_UPPER]               ; and store the upper half of the result.

Going through my made-up assembly syntax, you can easily see how higher-precision operations take disproportionately longer on a machine with a smaller word length, since several dependent instructions replace a single one. This is the real key to 64-bit and 128-bit processors: they allow us to handle more bits in a single operation. Some machines include instructions for adding quantities with a carry (e.g. ADC on x86), which streamline exactly this pattern, but the example above has arbitrary-precision values in mind.


Now, to extend this to the question, it's simple to see how we can handle numbers larger than the registers we have available: we just break the problem up into register-sized chunks and work from there. Although, as mentioned by @MatteoItalia, the x87 FPU stack has native support for 80-bit quantities, in systems lacking this support (or in processors lacking a floating-point unit entirely!) the equivalent computations/operations must be performed in software.
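As a sketch of what that software looks like (a hypothetical helper, not any particular library's routine), here is an addition over 80-bit values in C, using an array of 32-bit "limbs" with the carry propagated by hand:

    #include <stdint.h>
    #include <stdio.h>

    #define LIMBS 3  /* 3 x 32 bits = 96 bits, enough to hold 80 */

    /* Illustrative only: add two multi-limb integers, least significant
       limb first, carrying between limbs manually. */
    static void add_limbs(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
                          uint32_t out[LIMBS])
    {
        uint32_t carry = 0;
        for (int i = 0; i < LIMBS; i++) {
            uint32_t s = a[i] + b[i];
            uint32_t c = (s < a[i]);     /* carry out of a[i] + b[i] */
            out[i] = s + carry;
            carry = c | (out[i] < s);    /* carry from either addition */
        }
        /* For 80-bit operands stored in 96 bits, "overflow into the
           81st bit" shows up here as bit 16 of out[2]. */
    }

    int main(void)
    {
        uint32_t a[LIMBS] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0x0000FFFFu}; /* 2^80 - 1 */
        uint32_t b[LIMBS] = {1u, 0u, 0u};
        uint32_t r[LIMBS];
        add_limbs(a, b, r);
        printf("%08X%08X%08X\n", r[2], r[1], r[0]);  /* 2^80: bit 80 set */
        return 0;
    }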

So for an 80-bit number, after adding each 32-bit segment, one would also check for overflow into the 81st bit and optionally zero out the higher-order bits. Such checks and zeroing are performed automatically by certain x86 and x86-64 instructions, where the source and destination operand sizes are specified (although only in powers of 2, starting from 1 byte wide).

Of course, with floating point numbers, one can't simply perform the binary addition, since the exponent and mantissa are packed together (the exponent in offset, or "biased", form). On an x86 processor, the FPU has hardware circuitry to perform this for IEEE 32-bit and 64-bit floats; however, even in the absence of a floating-point unit (FPU), the same computations can be performed in software (e.g. through the GNU Scientific Library, which uses the FPU when compiled on architectures that have one, falling back to software algorithms if no floating-point hardware is available [e.g. on embedded microcontrollers lacking FPUs]).

Given enough memory, one can also perform computations on numbers of arbitrary (or "infinite" - within realistic bounds) precision, using more memory as more precision is required. One implementation of this is the GNU Multiple Precision (GMP) library, which allows unlimited precision (until your RAM is full, of course) on integer, rational, and floating point operations.
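For instance, a minimal GMP sketch in C (link with -lgmp), adding two integers far wider than any machine register:

    #include <gmp.h>
    #include <stdio.h>

    int main(void)
    {
        mpz_t a, b, sum;
        /* Initialize ~100-bit integers from decimal strings. */
        mpz_init_set_str(a, "123456789012345678901234567890", 10);
        mpz_init_set_str(b, "987654321098765432109876543210", 10);
        mpz_init(sum);
        mpz_add(sum, a, b);        /* GMP chains word-sized adds with carries internally */
        gmp_printf("%Zd\n", sum);  /* 1111111110111111111011111111100 */
        mpz_clears(a, b, sum, NULL);
        return 0;
    }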


The memory architecture of the system may only allow you to move 32 bits at once - but that doesn't stop it from using larger numbers.

Think of multiplication. You may only know your multiplication tables up to 10x10, yet you probably have no problem performing 123x321 on a piece of paper: you just break it into many small problems, multiplying individual digits and taking care of the carries.

Processors can do the same thing. In the "olden days", you had 8-bit processors that could do floating point math. But they were slooooooow.
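The paper method translates directly into code. Here is a hypothetical sketch of schoolbook multiplication using 8-bit "digits", the way an 8-bit processor would have to do it:

    #include <stdint.h>
    #include <stdio.h>

    #define N 2  /* two 8-bit "digits" per operand: enough for 123 and 321 */

    /* Illustrative only: schoolbook multiplication with 8-bit limbs,
       least significant first - digit products plus carries, as on paper. */
    static void mul_schoolbook(const uint8_t a[N], const uint8_t b[N],
                               uint8_t out[2 * N])
    {
        for (int i = 0; i < 2 * N; i++) out[i] = 0;
        for (int i = 0; i < N; i++) {
            unsigned carry = 0;
            for (int j = 0; j < N; j++) {
                unsigned t = out[i + j] + (unsigned)a[i] * b[j] + carry;
                out[i + j] = (uint8_t)t;  /* keep the low 8 bits */
                carry = t >> 8;           /* carry the rest to the next digit */
            }
            out[i + N] = (uint8_t)carry;
        }
    }

    int main(void)
    {
        uint8_t a[N] = {0x7B, 0x00};  /* 123, least significant byte first */
        uint8_t b[N] = {0x41, 0x01};  /* 321 */
        uint8_t r[2 * N];
        mul_schoolbook(a, b, r);
        printf("0x%02X%02X%02X%02X\n", r[3], r[2], r[1], r[0]);
        /* prints 0x00009A3B, i.e. 39483 = 123 * 321 */
        return 0;
    }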

Tags: memory, 32-bit