Set "in" operator: uses equality or identity?

set.__contains__ performs its checks in the following order: hash first, then identity, then equality:

 'Match' if hash(a) == hash(b) and (a is b or a==b) else 'No Match'

The relevant C source code is in Objects/setobject.c::set_lookkey() and in Objects/object.c::PyObject_RichCompareBool().
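To see that order in action, here is a minimal sketch written for Python 3. The `Traced` class and its `calls` list are hypothetical names made up for this illustration; they just record which special methods the set lookup invokes.

```python
class Traced:
    """Hypothetical helper that records which special methods get called."""
    calls = []

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        Traced.calls.append('hash')
        return hash(self.key)

    def __eq__(self, other):
        Traced.calls.append('eq')
        return isinstance(other, Traced) and self.key == other.key


a, b, c = Traced(1), Traced(1), Traced(2)
s = {a}

Traced.calls.clear()
same_object = a in s       # same hash, same object: identity wins, __eq__ is skipped
calls_same = list(Traced.calls)

Traced.calls.clear()
equal_object = b in s      # same hash, different object: __eq__ decides
calls_equal = list(Traced.calls)

Traced.calls.clear()
other_hash = c in s        # different hash: __eq__ is never reached
calls_other = list(Traced.calls)

# NaN is not equal to itself, yet the identity check still finds it.
nan = float('nan')
nan_found = nan in {nan}
```

The NaN case is the one place where the identity shortcut is visible without any tracing: `nan == nan` is False, but the membership test still succeeds because the very same object is stored in the set.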


You need to define __hash__ too. For example:

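class A(object):
    def __hash__(self):
        print('__hash__')
        # Every instance hashes the same, so equal instances share a bucket.
        return 42

    def __cmp__(self, other):
        # Python 2 only, and not consulted here because __eq__ is defined.
        print('__cmp__')
        return cmp(id(self), id(other))

    def __eq__(self, rhs):
        print('__eq__')
        # Every A compares equal to anything.
        return True

a1 = A()
a2 = A()
print(a1 in set([a1]))  # same object: identity match, __eq__ is skipped
print(a1 in set([a2]))  # same hash, different object: __eq__ decides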

This works as expected: both membership tests print True.

As a general rule, any time you implement __eq__ (or __cmp__ in Python 2) you should also implement __hash__ so that for all x and y, x == y implies hash(x) == hash(y).
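One common way to satisfy that rule is to derive both methods from the same tuple of fields. The sketch below assumes Python 3; `Point` and its `_key` helper are hypothetical names chosen for illustration, not something from the question.

```python
class Point:
    """Hypothetical value class: equality and hash both derive from (x, y)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # Single source of truth for both __eq__ and __hash__.
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


p, q = Point(1, 2), Point(1, 2)
points = {p}
found = q in points        # a distinct but equal object is found
```

Because the hash is computed from the fields, those fields should not be changed while the object is stored in a set or used as a dictionary key.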


Sets and dictionaries gain their speed by using hashing as a fast approximation of full equality checking. If you want to redefine equality, you usually need to redefine the hash algorithm so that it is consistent.

The default hash function uses the identity of the object. That is useless as a fast approximation of full equality, but it at least lets you use an arbitrary class instance as a dictionary key and retrieve the stored value when you pass exactly the same object as the key. In Python 2 it also means that if you redefine equality without redefining the hash function, your objects will go into a dictionary or set without any complaint about not being hashable, yet still won't work the way you expect. Python 3 closes this trap: a class that defines __eq__ without __hash__ has its __hash__ set to None, so its instances are unhashable.
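The pitfall can be reproduced in Python 3 with a short sketch. Both class names are hypothetical. Since Python 3 removes the inherited hash from a class that only defines __eq__, the second class restores the identity-based hash explicitly to mimic the situation described above.

```python
class EqOnly:
    """Hypothetical class: redefines equality, leaves hashing alone."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


class IdentityHashed(EqOnly):
    # Explicitly restore the identity-based default hash, which
    # Python 2 would have kept implicitly.
    __hash__ = object.__hash__


try:
    {EqOnly(1)}
    eq_only_hashable = True
except TypeError:
    # Python 3 set EqOnly.__hash__ to None when it saw __eq__ alone.
    eq_only_hashable = False

x, y = IdentityHashed(1), IdentityHashed(1)
equal = x == y             # the objects compare equal
found_same = x in {x}      # the same object is found
found_equal = y in {x}     # an equal but distinct object is not
```

The last line is the surprising one: `x == y` holds, but the lookup never gets as far as calling __eq__ because the two identity hashes differ.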

See the official Python docs on __hash__ for more details.