-O1 alters floating point math

While the other answer already explains why you see different behavior between -O0 (evaluated at run time, with slightly imperfect results and unspecified rounding) and -O1 (evaluated at compile time, with correctly rounded results), I want to add an explanation of why it was difficult for me to reproduce your particular output with -O0. I always observed the output

-0x1.8f4e436eb5372p-3 -0x1.4ca54aa5d4e1ep-1 

on both my own machine and on Compiler Explorer.

The reason is that you are most likely using a glibc compiled with the -mfma flag, i.e. one that uses FMA3 instructions. I tested the various -march switches for gcc and was able to narrow the difference down to that flag.

On my machine with a Kaby Lake processor, gcc 9.2 and glibc 2.29, compiling glibc with -O2 -march=native and the executable with -O0 I get the output

-0x1.8f4e436eb5371p-3 -0x1.4ca54aa5d4e1ep-1 

Compiling glibc with -O2 -march=native -mno-fma and the executable with -O0 I get

-0x1.8f4e436eb5372p-3 -0x1.4ca54aa5d4e1ep-1 

In either case compiling the executable with -O1 gives:

-0x1.8f4e436eb5372p-3 -0x1.4ca54aa5d4e1ep-1 

Looking at the disassembly of sin and cos with FMA3 enabled, it is clear that these instructions are used. A fused multiply-add performs one less rounding of the intermediate result and can thereby change the output of cos and/or sin slightly. I suspect this is why the code in question produces slightly different output depending on the optimization flags glibc was built with. As explained in the other answer, this difference still falls within the documented accuracy range for these functions.

As to why the compiler is allowed to use FMA3 instructions even though doing so changes the result of floating-point operations, see this question.


There is also a feature called multi-arch in glibc which, if enabled, links differently optimized math functions at runtime to fit the architecture the program is running on. If this is enabled and your CPU supports FMA3 (e.g. Haswell and up), then you will also see the FMA-influenced results.


With -O1, the floating-point computation happens at compile time, using the GNU MPFR library. MPFR is expected to give a correctly rounded result even for functions such as sin and cos. Your math library likely has different accuracy goals for these functions, which is why run-time computation (at the -O0 optimization level) sometimes gives different results. For example, the GNU C library has a general accuracy goal of a few ulp.

Reportedly, IEEE 754 requires correct rounding only for a subset of the math library functions (sqrt, apparently), which leaves math libraries free to choose different trade-offs between speed and accuracy for the transcendental functions. (I do not have access to IEEE 754 because IEEE is opposed to the open dissemination of knowledge, unfortunately.)