Rounding to nearest int with numpy.rint() not consistent for .5

So, this kind of behavior (as noted in the comments) is a very traditional form of rounding: the round half to even method, also known (as David Heffernan points out) as banker's rounding. The numpy documentation confirms that it uses this type of rounding, but also warns that results may be surprising because of how numpy interacts with the IEEE floating point format (quoted below):

Notes
-----
For values exactly halfway between rounded decimal values, Numpy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0,
-0.5 and 0.5 round to 0.0, etc. Results may also be surprising due
to the inexact representation of decimal fractions in the IEEE
floating point standard [1]_ and errors introduced when scaling
by powers of ten.

Whether or not that is the case, I honestly don't know. I will note that numpy's core is written in C rather than FORTRAN 77 (though parts of the wider scientific Python stack still wrap old Fortran libraries), and that the IEEE 754 standard dates from 1985, but I don't know the internals well enough to say whether there's some issue at that interface.
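
Whatever the explanation, the documented ties-to-even behavior itself is easy to check interactively (array reprs may differ slightly between numpy versions):

>>> import numpy as np
>>> np.rint([0.5, 1.5, 2.5, 3.5, 4.5])
array([0., 2., 2., 4., 4.])
>>> np.rint([-0.5, -1.5, -2.5])
array([-0., -2., -2.])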

If you're looking to always round up regardless of the fractional part, the np.ceil function (the ceiling function in general) will do this. If you're looking for the opposite (always rounding down), np.floor will achieve it.
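
And if what you want is the schoolbook round-half-up rule (ties always go towards +inf), a minimal sketch is to shift by 0.5 and take the floor. This is my own workaround rather than a built-in numpy mode, and the addition can misfire for values within one ULP of a halfway point or for very large magnitudes:

>>> import numpy as np
>>> a = np.array([-1.5, -0.5, 0.5, 1.5, 2.5])
>>> np.floor(a + 0.5)    # ties go up, unlike np.rint's ties-to-even
array([-1.,  0.,  1.,  2.,  3.])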


Numpy rounding does round halfway values to even, but the other rounding modes can be expressed using a combination of operations.

>>> a=np.arange(-4,5)*0.5
>>> a
array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5,  2. ])
>>> np.floor(a)      # Towards -inf
array([-2., -2., -1., -1.,  0.,  0.,  1.,  1.,  2.])
>>> np.ceil(a)       # Towards +inf
array([-2., -1., -1., -0.,  0.,  1.,  1.,  2.,  2.])
>>> np.trunc(a)      # Towards 0
array([-2., -1., -1., -0.,  0.,  0.,  1.,  1.,  2.])
>>> a+np.copysign(0.5,a)   # Shift away from 0
array([-2.5, -2. , -1.5, -1. ,  0.5,  1. ,  1.5,  2. ,  2.5])
>>> np.trunc(a+np.copysign(0.5,a))   # 0.5 towards higher magnitude round
array([-2., -2., -1., -1.,  0.,  1.,  1.,  2.,  2.])

In general, numbers of the form n.5 can be exactly represented by binary floating point (they end in .1 in binary, as 0.5=2**-1, at least while n is small enough that the half still fits in the significand), but calculations expected to reach them might not do so exactly. For instance, negative powers of ten are not exactly represented:

>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)
>>> [10**n * 10**-n for n in range(20)]
[1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
 0.9999999999999999, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
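
One concrete consequence: a number that reads as a halfway case in decimal need not be one in binary, and scaling by ten can even create a tie that "should not" be there. Here 0.15 is stored slightly below 0.15, yet multiplying by ten lands exactly on 1.5, so Python's correctly-rounded built-in round and numpy's documented scale-round-unscale approach (np.around) disagree (checked interactively; results could differ if the implementation changes):

>>> (0.15).as_integer_ratio()    # stored value is just below 0.15
(5404319552844595, 36028797018963968)
>>> 0.15 * 10                    # but scaling lands exactly on the tie
1.5
>>> round(0.15, 1)               # Python rounds the true stored value
0.1
>>> import numpy as np
>>> np.round([0.15], 1)          # numpy scales, rints (1.5 -> 2), unscales
array([0.2])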

This is in fact exactly the rounding specified by the IEEE 754 floating point standard (both the 1985 and 2008 editions). It is intended to make rounding unbiased. In ordinary probability theory, a random number between two integers has zero probability of being exactly N + 0.5, so it shouldn't matter how you round it because that case never happens.

But in real programs, numbers are not random, and N + 0.5 occurs quite often. (In fact, you hit an exact halfway case every time a floating point result has to lose exactly one bit of precision and that bit is a 1!) If you always round 0.5 up to the next largest number, then the average of a bunch of rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate.
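
That drift is easy to observe numerically. A minimal sketch, using np.floor(a + 0.5) as the round-half-up rule and exact halfway values so no representation error interferes (the float() wrappers just keep scalar reprs version-independent):

>>> import numpy as np
>>> a = np.arange(1000) + 0.5          # 0.5, 1.5, ..., 999.5, all exact
>>> float(a.mean())                    # the unrounded average
500.0
>>> float(np.floor(a + 0.5).mean())    # round half up: biased by +0.5
500.5
>>> float(np.rint(a).mean())           # round half to even: no drift
500.0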

The reason rounding to even is better than rounding to odd is that the last (binary) digit of the result is guaranteed to be zero, so if you have to divide by 2 and round again, you don't lose any information at all.
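
A hypothetical ties-to-odd rule makes the contrast concrete (a small sketch; float() again just normalizes the scalar repr):

>>> import numpy as np
>>> float(np.rint(2.5))    # ties-to-even sends 2.5 to the even integer
2.0
>>> 2.0 / 2                # halving the even result is exact
1.0
>>> 3.0 / 2                # ties-to-odd would pick 3, whose half is a fresh halfway case
1.5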

In summary, this kind of rounding is the best that mathematicians have been able to devise, and you should WANT it under most circumstances. Now all we need to do is get schools to start teaching it to children.

Tags: Python, Numpy