Reverse Image search (for image duplicates) on local computer
What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to GitHub repo for plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries. "
Another blogpost for an in-depth read, with an application example.
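To make the idea concrete, here is a minimal pure-Python sketch of the simplest perceptual hash, the average hash. This is not the Marr wavelet hash quoted above, which is more involved. It assumes the image has already been shrunk to a tiny grayscale grid (real implementations do that resize with an imaging library); the function names and the sample pixel values are made up for illustration.

```python
def average_hash_bits(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Hypothetical 4x4 grayscale thumbnail: dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# The same picture with brightness raised, and a mirrored (different) picture.
brighter = [[min(255, value + 30) for value in row] for row in original]
mirrored = [row[::-1] for row in original]

hash_original = average_hash_bits(original)
hash_brighter = average_hash_bits(brighter)
hash_mirrored = average_hash_bits(mirrored)

print(hamming_distance(hash_original, hash_brighter))  # small distance: same content
print(hamming_distance(hash_original, hash_mirrored))  # large distance: different content
```

Because each bit only records whether a pixel is above or below the mean, a uniform brightness change leaves the hash untouched, while a structurally different image flips many bits.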
Available Code and Usage
A GitHub repo (for the Python imagehash package used below) can be found here. There are obviously more to be found. After importing the package, you can use it to generate and compare hashes:
    >>> from PIL import Image
    >>> import imagehash
    >>> hash = imagehash.average_hash(Image.open('test.png'))
    >>> print(hash)
    d879f8f89b1bbf
    >>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
    >>> print(otherhash)
    ffff3720200ffff
    >>> print(hash == otherhash)
    False
    >>> print(hash - otherhash)
    36
The demo script find_similar_images, also in the mentioned GitHub repo, illustrates how to find similar images in a directory.
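If you would rather see the matching step spelled out, the following is a rough sketch and not the repo's script. It assumes you have already collected one hex hash string per file (for example by calling str() on the hashes from the session above) and reports every pair whose Hamming distance is within a threshold. The file names, hash values, and the threshold of 5 bits are all hypothetical and should be tuned on your own images.

```python
from itertools import combinations


def hex_hamming(hex_a, hex_b):
    """Count differing bits between two equal-length hex hash strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def similar_pairs(hashes, max_distance=5):
    """Return (name_a, name_b, distance) for every pair within max_distance bits."""
    pairs = []
    # Sorting makes the output order deterministic.
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hex_hamming(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs


# Hypothetical file names mapped to hypothetical 64-bit hashes in hex.
hashes = {
    "cat.png": "ffd8c0c0c0e0f0ff",
    "cat_resized.jpg": "ffd8c0c0c0e0f0fe",
    "dog.png": "00183c7e7e3c1800",
}

for name_a, name_b, distance in similar_pairs(hashes):
    print(f"{name_a} ~ {name_b} (distance {distance})")
```

Comparing every pair is quadratic in the number of images, which is fine for a personal photo folder; for large collections an index structure such as the mvp tree mentioned in the quote is the usual next step.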