Is fixed point math faster than floating point?

Fixed point is marginally useful on platforms that do not support any kind of decimal type of their own; for example, I implemented a 24-bit fixed point type for the PIC16F series microcontrollers (more on why I chose fixed point later).

However, almost every modern CPU supports floating point at the microcode or hardware level, so there isn't much need for fixed point.

Fixed point numbers are limited in the range they can represent - consider a 64-bit(32.32) fixed point vs. a 64-bit floating point: the 64-bit fixed point number has a decimal resolution of 1/(232), while the floating point number has a decimal resolution of up to 1/(253); the fixed point number can represent values as high as 231, while the floating point number can represent numbers up to 2223. And if you need more, most modern CPUs support 80-bit floating point values.

Of course, the biggest downfall of floating point is limited precision in extreme cases - e.g. in fixed point, it would require fewer bits to represent 9000000000000000000000000000000.00000000000000000000000000000002. Of course, with floating point, you get better precision for average uses of decimal arithmetic, and I have yet to see an application where decimal arithmetic is as extreme as the above example yet also does not overflow the equivalent fixed-point size.

The reason I implemented a fixed-point library for the PIC16F rather than use an existing floating point library was code size, not speed: the 16F88 has 384 bytes of usable RAM and room for 4095 instructions total. To add two fixed point numbers of predefined width, I inlined integer addition with carry-out in my code (the fixed point doesn't move anyway); to multiply two fixed point numbers, I used a simple shift-and-add function with extended 32-bit fixed point, even though that isn't the fastest multiplication approach, in order to save even more code.

So, when I had need of only one or two basic arithmetic operations, I was able to add them without using up all of the program storage. For comparison, a freely available floating point library on that platform was about 60% of the total storage on the device. In contrast, software floating point libraries are mostly just wrappers around a few arithmetic operations, and in my experience, they are mostly all-or-nothing, so cutting the code size in half because you only need half of the functions doesn't work so well.

Fixed point generally doesn't provide much of an advantage in speed though, because of its limited representation range: how many bits would you need to represent 1.7E+/-308 with 15 digits of precision, the same as a 64-bit double? If my calculations are correct, you'd need somewhere around 2020 bits. I'd bet the performance of that wouldn't be so good.

Thirty years ago, when hardware floating point was relatively rare, very special-purpose fixed-point (or even scaled integer) arithmetic could provide significant gains in performance over doing software-based floating point, but only if the allowable range of values could be efficiently represented with scaled-integer arithmetic (the original Doom used this approach when no coprocessor was available, such as on my 486sx-25 in 1992 - typing this on an overclocked hyperthreaded Core i7 running at 4.0GHz with a GeForce card that has over 1000 independent floating point compute units, it just seems wrong somehow, although I'm not sure which - the 486, or the i7...).

Floating point is more general purpose due to the range of values it can represent, and with it implemented in hardware on both CPUs and GPUs, it beats fixed point in every way, unless you really need more than 80-bit floating point precision at the expense of huge fixed-point sizes and very slow code.


Well I code for 2 decades and my experience is there are 3 main reasons to use fixed point:

  1. No FPU available

    Fixed point is still valid for DSP,MCU,FPGA and chip design in general. Also no floating point unit can work without fixed point core unit so also all bigdecimal libs must use fixed point... Also graphics cards use fixed point a lot (normalized device coordinates).

  2. insufficient FPU precision

    if you go to astronomic computations you will very soon hit the extremes and the need of handling them. For example simple Newtonian/D'Alembert integration or atmosphere ray-tracing hits the precision barriers pretty fast on large scales and low granularity. I usually use array of floating point doubles to remedy that. For situations where the input/output range is known the fixed point is usually better choice. See some examples of hitting the FPU barrier:

    • Is it possible to make realistic n-body solar system simulation in matter of size and mass?
    • ray and ellipsoid intersection accuracy improvement
  3. speed

    Back in the old days FPU was really slow (especially on x86 architecture) due the interface and api it uses. An interrupt was generated for each FPU instruction not to mention the operands and results transfer process... So few bit-shift operations in CPU ALU was usually faster.

    Nowadays is this not true anymore and the ALU and FPU speeds are comparable. For example here mine measurement of CPU/FPU operations (in small Win32 C++ app):

      fcpu(0) = 3.194877 GHz // tested on first core of AMD-A8-5500 APU 3.2GHz Win7 x64 bit
    
      CPU 32bit integer aritmetics:
      add = 387.465 MIPS
      sub = 376.333 MIPS
      mul = 386.926 MIPS
      div = 245.571 MIPS
      mod = 243.869 MIPS
    
      FPU 32bit float aritmetics:
      add = 377.332 MFLOPS
      sub = 385.444 MFLOPS
      mul = 383.854 MFLOPS
      div = 367.520 MFLOPS
    
      FPU 64bit double aritmetics:
      add = 385.038 MFLOPS
      sub = 261.488 MFLOPS
      mul = 353.601 MFLOPS
      div = 309.282 MFLOPS
    

    The values vary with time but in comparison between data types are almost identical. Just few years back the doubles where slower due to 2x times bigger data transfers. But there are other platforms where the speed difference may be still valid.