Binning continuous values with round() creates artifacts

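The problem can be easily solved using np.histogram2d(x, y, bins=100, range=[[0, 1], [0, 1]]). The range argument fixes the bin edges at 0.00, 0.01, ..., 1.00 instead of at the minimum and maximum of the data.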
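A minimal sketch of that call, with random sample data standing in for the question's x and y (the data here are purely illustrative). The returned H[i, j] counts the samples whose x falls in bin i and whose y falls in bin j.

```python
import numpy as np

# Hypothetical sample data standing in for the question's x and y
rng = np.random.default_rng(0)
n = 10_000
x = rng.random(n)
y = rng.random(n)

# 100 x 100 bins; range pins the edges to 0.00, 0.01, ..., 1.00 on both axes
H, xedges, yedges = np.histogram2d(x, y, bins=100, range=[[0, 1], [0, 1]])

print(H.shape)  # (100, 100)
print(H.sum())  # every sample is counted exactly once
```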

The remainder of this answer shows where the manual algorithm fails:

Consider that numerically

0.56*100 == 56.00000000000001    -> int(0.56*100) == 56
0.57*100 == 56.99999999999999    -> int(0.57*100) == 56
0.58*100 == 57.99999999999999    -> int(0.58*100) == 57
0.59*100 == 59.00000000000000    -> int(0.59*100) == 59

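Since your code first rounds every value to two decimals, all samples land exactly on these multiples of 0.01, so the index 58 will simply not occur, while the index 56 receives twice as many counts as it should (for a uniform distribution).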
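To see every affected index rather than just these four, a short sketch can apply the question's steps (round to two decimals, multiply by 100, truncate with int()) to all 100 two-decimal values and report which indices are never produced and which are produced twice:

```python
from collections import Counter

# The question's approach applied to 0.00, 0.01, ..., 0.99
indices = [int(round(k / 100, 2) * 100) for k in range(100)]
counts = Counter(indices)

missing = sorted(set(range(100)) - set(counts))
doubled = sorted(i for i, c in counts.items() if c > 1)
print("never used:", missing)
print("used twice:", doubled)
```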

You may instead first multiply by 100 and then truncate to an integer, without rounding. Also note that the last bin needs to be closed, so that a value of exactly 1 is added to the bin with index 99.

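import numpy as np

# x, y: arrays of n values in [0, 1], as in the question
mtx = np.zeros([100, 100])
for i in range(n):
    # Multiply first, then truncate: bin k covers [k/100, (k+1)/100)
    posX = int(x[i]*100)
    posY = int(y[i]*100)
    # Close the last bin so that a value of exactly 1 lands in bin 99
    if posX == 100:
        posX = 99
    if posY == 100:
        posY = 99
    mtx[posX, posY] += 1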

This defines the bins via their edges, i.e. the first bin ranges from 0 to 1 (in units of 100*x), the second from 1 to 2, etc. In the call to imshow/matshow you then need to take this into account by setting the extent.

import matplotlib.pyplot as plt

plt.matshow(mtx, cmap=plt.cm.jet, extent=(0,100,0,100))

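The multiply-then-truncate loop can also be sketched without a Python loop, assuming x and y are NumPy float arrays in [0, 1] (the helper name bin_counts is hypothetical). np.clip replaces the two if statements, and np.add.at is used because plain fancy-index assignment such as mtx[ix, iy] += 1 counts a repeated index pair only once.

```python
import numpy as np

def bin_counts(x, y, nbins=100):
    """Hypothetical helper: nbins x nbins counts over [0, 1] x [0, 1]."""
    # Multiply first, then truncate toward zero to get the bin index
    ix = (np.asarray(x) * nbins).astype(int)
    iy = (np.asarray(y) * nbins).astype(int)
    # Close the last bin: a value of exactly 1 goes to index nbins - 1
    ix = np.clip(ix, 0, nbins - 1)
    iy = np.clip(iy, 0, nbins - 1)
    mtx = np.zeros((nbins, nbins))
    # Unbuffered add, so duplicate (ix, iy) pairs each contribute 1
    np.add.at(mtx, (ix, iy), 1)
    return mtx

mtx = bin_counts([0.123, 1.0, 0.5, 0.5], [0.5, 0.999, 1.0, 1.0])
```

In this layout x runs along the first index of mtx, which imshow and matshow draw on the vertical axis; passing mtx.T together with origin='lower' and extent=(0, 100, 0, 100) would show x horizontally with bin 0 in the lower-left corner.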

The issue you have with your method is a floating point error. This becomes apparent when you try to turn your rounded number into an integer. Consider the following function (which is essentially what you are doing to each of your random numbers):

def int_round(a):
    r = round(a, 2)
    rh = r*100
    i = int(rh)
    print(r, rh, i)


int_round(0.27)
#prints: 0.27 27.0 27

int_round(0.28)
#prints: 0.28 28.000000000000004 28

int_round(0.29)
#prints: 0.29 28.999999999999996 28

int_round(0.30)
#prints: 0.3 30.0 30

As you can see, because of floating-point error, after rounding to two decimals and multiplying by 100 both 0.28 and 0.29 end up as the integer 28. (This is because int() truncates toward zero, which for positive numbers means rounding down, so 28.999999999999996 becomes 28.)
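As an aside: if the intent of rounding to two decimals was to snap each value to its nearest multiple of 0.01, a different technique is to scale first and round to the nearest integer, so that no int() truncation of a value like 28.999999999999996 happens at all. A sketch (the name nearest_bin is hypothetical); this gives bins centred on 0.00, 0.01, ..., 1.00, i.e. 101 bins with half-width bins at both ends, and exact ties are settled by Python's round-half-to-even.

```python
def nearest_bin(a, scale=100):
    """Hypothetical helper: index of the multiple of 1/scale closest to a."""
    # round() without ndigits returns an int, so no truncation step is needed
    return round(a * scale)

print(nearest_bin(0.29))   # 29, although 0.29 * 100 == 28.999999999999996
print(nearest_bin(0.284))  # 28
print(nearest_bin(0.286))  # 29
```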

A solution may be to round the value after multiplying by 100:

def round_int(a):
    ah = a*100
    rh = round(ah, 2)
    i = int(rh)
    print(ah, rh, i)

round_int(0.27)
#prints: 27.0 27.0 27

round_int(0.28)
#prints: 28.000000000000004 28.0 28

round_int(0.29)
#prints: 28.999999999999996 29.0 29

round_int(0.30)
#prints: 30.0 30.0 30

Note that in this case 0.29 is correctly converted to 29.
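The printed comparison above covers four values. A small sketch that checks every two-decimal value from 0.00 to 1.00, using return-value versions of the two functions (the names int_round_index and round_int_index are hypothetical), could look like this:

```python
def int_round_index(a):
    # Question's order: round to two decimals, multiply by 100, truncate
    return int(round(a, 2) * 100)

def round_int_index(a):
    # Order suggested above: multiply by 100, round to two decimals, truncate
    return int(round(a * 100, 2))

values = [k / 100 for k in range(101)]  # 0.00, 0.01, ..., 1.00
wrong_before = [k for k, v in enumerate(values) if int_round_index(v) != k]
wrong_after = [k for k, v in enumerate(values) if round_int_index(v) != k]
print(wrong_before)  # includes 29, among others
print(wrong_after)   # []
```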

Applying this logic to your code, we can change the for loop to:

import numpy as np

mtx = np.zeros([101, 101])

for i in range(n):
    # Multiply by 100 first, then round to two decimals to remove the
    # floating-point noise, and finally truncate to an integer index
    posX = np.round(100*x[i], 2)
    posY = np.round(100*y[i], 2)
    mtx[int(posX), int(posY)] += 1

Note that the number of bins is increased to 101 to account for the final bin when x=1 or y=1. Also, because x[i] and y[i] are multiplied by 100 before rounding, the binning now occurs correctly.

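A vectorised sketch of the same 101 x 101 binning, assuming x and y are NumPy float arrays in [0, 1] (the function name rounded_bin_counts is hypothetical). One consequence of rounding to two decimals before truncating is worth keeping in mind: a scaled value within about 0.005 below an integer (for example 28.997) rounds up to that integer and is therefore counted in the next bin.

```python
import numpy as np

def rounded_bin_counts(x, y):
    """Hypothetical helper: 101 x 101 counts via multiply, round, truncate."""
    # Multiply by 100, round to two decimals, then truncate to int indices
    ix = np.round(100 * np.asarray(x), 2).astype(int)
    iy = np.round(100 * np.asarray(y), 2).astype(int)
    mtx = np.zeros((101, 101))
    # np.add.at counts repeated (ix, iy) pairs individually
    np.add.at(mtx, (ix, iy), 1)
    return mtx

mtx = rounded_bin_counts([0.29, 1.0, 0.28997], [0.0, 1.0, 0.5])
```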
