How to match pairs of values contained in two numpy arrays

Note that this solution scales poorly for large arrays, because it builds an intermediate boolean array of shape (len(coo), len(targets), 2). For such cases the other proposed answers will perform better.


Here's one way taking advantage of broadcasting:

(coo[:,None] == targets).all(2).any(1)
# array([False,  True,  True, False])

Details

Check, for every row in coo, whether it matches a row in targets by direct comparison, after adding a new axis to coo so that it becomes broadcastable against targets:

(coo[:,None] == targets)

array([[[False, False],
        [ True, False]],

       [[False, False],
        [ True,  True]],

       [[ True,  True],
        [False, False]],

       [[False, False],
        [False,  True]]])

Then use all along the last axis to check which row pairs have every value set to True:

(coo[:,None] == targets).all(2)

array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]])

Finally, use any along axis 1 to check which rows of coo have at least one True, i.e. match at least one row of targets.
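
For reference, here is a minimal self-contained sketch of the three steps above. The question's input arrays are not shown in this answer, so the values of coo and targets below are an assumption, chosen to be consistent with the outputs printed above.

```python
import numpy as np

# Assumed inputs, chosen to be consistent with the outputs shown above
coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# Shape (4, 1, 2) compared against (2, 2) broadcasts to (4, 2, 2)
elementwise = coo[:, None] == targets

# True where both columns of a (coo row, targets row) pair agree: shape (4, 2)
row_match = elementwise.all(axis=2)

# True where a coo row equals at least one targets row: shape (4,)
mask = row_match.any(axis=1)

print(mask)  # [False  True  True False]
```

The intermediate elementwise array holds len(coo) * len(targets) * 2 booleans, which is why this approach becomes expensive for large inputs.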


Here is a simple and intuitive solution that actually uses numpy.isin() to match tuples, rather than individual numbers:

# View as a 1d array of tuples
coo_view     = coo.view(dtype='i,i').reshape((-1,))
targets_view = targets.view(dtype='i,i').reshape((-1,))

result = np.isin(coo_view, targets_view)
print(result)
print(result.nonzero()[0])

Output:

[False  True  True False]
[1 2]

Notes:

  1. The creation of these views does not involve any copying of data.
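  2. The dtype='i,i' specifies that we want each element of the view to be a tuple of two integers. Here 'i' is a C int (32 bits on common platforms), so this assumes coo and targets are two-column int32 arrays; for another integer type, the field types must match the array's dtype.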
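
As a hedged sketch of the same idea that does not depend on the arrays being int32, the record dtype can be built from the array's own dtype. The helper name rows_as_records and the input values are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

# Assumed inputs; the default integer dtype here may be int64 rather than int32
coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

def rows_as_records(a):
    """View each row of a two-column array as a single structured record."""
    a = np.ascontiguousarray(a)  # returns `a` itself if it is already C-contiguous
    record = np.dtype([("f0", a.dtype), ("f1", a.dtype)])
    return a.view(record).reshape(-1)

coo_view = rows_as_records(coo)
# Make sure both views use the same record dtype before comparing
targets_view = rows_as_records(targets.astype(coo.dtype, copy=False))

result = np.isin(coo_view, targets_view)
print(result)
print(result.nonzero()[0])

# Checks that the view refers to the original buffer rather than a copy
print(np.shares_memory(coo, coo_view))
```

Building the record dtype from a.dtype keeps the item size of each record equal to the size of one row, so the view has exactly one element per row.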

The numpy_indexed package implements functionality of this type in a vectorized manner (disclaimer: I am its author). Sadly, numpy lacks a lot of this functionality out of the box. I started numpy_indexed with the intention of having it merged into numpy, but there are some backwards-compatibility concerns, and big packages like that tend to move slowly, so that hasn't happened in the last 3 years. The Python packaging ecosystem works so well nowadays, though, that adding one more package to your environment is just as simple, really.

import numpy_indexed as npi
bools = npi.in_(targets, coo)

This will have a time complexity similar to that of the solution posted by @fountainhead (logarithmic per lookup, rather than linear as in the currently accepted answer). The npi library will also give you the safety of automated tests and a lot of other convenient options, should you decide to approach the problem from a slightly different angle.
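
To illustrate the sort-based idea behind that complexity claim, here is a rough sketch in plain NumPy. It is not how numpy_indexed is implemented; it only shows one way to avoid comparing every row of coo against every row of targets. The input values are assumed for illustration.

```python
import numpy as np

# Assumed inputs for illustration
coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# Give every distinct row an integer id using one sort over all rows
stacked = np.concatenate([coo, targets])
_, ids = np.unique(stacked, axis=0, return_inverse=True)
ids = ids.ravel()  # guard against NumPy versions that return a 2-D inverse

coo_ids = ids[:len(coo)]
target_ids = ids[len(coo):]

# Membership test on plain integers instead of on rows
mask = np.isin(coo_ids, target_ids)
print(mask)  # [False  True  True False]
```

The cost is dominated by sorting len(coo) + len(targets) rows, instead of the len(coo) * len(targets) row comparisons made by the broadcasting approach.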