Efficiently replace elements in array based on dictionary - NumPy / Python

Given that you're using numpy arrays, I'd suggest you do a mapping using numpy too. Here's a vectorized approach using np.select:

mapping = {1:2, 5:3, 8:6}
keys, choices = list(zip(*mapping.items()))
# [(1, 5, 8), (2, 3, 6)]
# we can use broadcasting to obtain a 3x100x100
# array to use as condlist
conds = np.array(keys)[:,None,None]  == input_array
# use conds as arrays of conditions and the values 
# as choices
np.select(conds, choices)

array([[2, 2, 2, ..., 0, 0, 0],
       [2, 2, 2, ..., 0, 0, 0],
       [2, 2, 2, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

Approach #1 : Loopy one with array data

One approach would be extracting the keys and values in arrays and then use a similar loop -

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

out = np.zeros_like(input_array)
for key,val in zip(k,v):
    out[input_array==key] = val

Benefit with this one over the original one is the spatial-locality of the array data for efficient data-fetching, which is used in the iterations.

Also, since you mentioned thousand large np.arrays. So, if the mapping dictionary stays the same, that step to get the array versions - k and v would be a one-time setup process.

Approach #2 : Vectorized one with searchsorted

A vectorized one could be suggested using np.searchsorted -

sidx = k.argsort() #k,v from approach #1

k = k[sidx]
v = v[sidx]

idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
idx[idx==len(k)] = 0
mask = k[idx] == input_array
out = np.where(mask, v[idx], 0)

Approach #3 : Vectorized one with mapping-array for integer keys

A vectorized one could be suggested using a mapping array for integer keys, which when indexed by the input array would lead us directly to the final output -

mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]

The numpy_indexed library (disclaimer: I am its author) provides functionality to implement this operation in an efficient vectorized maner:

import numpy_indexed as npi
output_array = npi.remap(input_array.flatten(), list(mapping.keys()), list(mapping.values())).reshape(input_array.shape)

Note; I didnt test it; but it should work along these lines. Efficiency should be good for large inputs, and many items in the mapping; I imagine similar to divakars' method 2; not as fast as his method 3. But this solution is aimed more at generality; and it will also work for inputs which are not positive integers; or even nd-arrays (f.i. replacing colors in an image with other colors, etc).

I think the Divakar #3 method assumes that the mapping dict covers all values (or at least the maximum value) in the target array. Otherwise, to avoid index out of range errors, you have to replace the line

mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) with

mapping_ar = np.zeros(array.max()+1,dtype=v.dtype)

That adds considerable overhead.

Efficiently replace elements in array based on dictionary - NumPy / Python

Tags:

Python

Performance

Numpy

Vectorization

Related

Recent Posts