Algorithm to compare two images in C#

Here is a simple approach with a 256 bit image-hash (MD5 has 128 bit)

  1. resize the picture to 16x16 pixel

16x16 resized

  1. reduce colors to black/white (which equals true/false in this console output)

enter image description here

  1. read the boolean values into List<bool> - this is the hash

Code:

public static List<bool> GetHash(Bitmap bmpSource)
{
    List<bool> lResult = new List<bool>();         
    //create new image with 16x16 pixel
    Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16));
    for (int j = 0; j < bmpMin.Height; j++)
    {
        for (int i = 0; i < bmpMin.Width; i++)
        {
            //reduce colors to true / false                
            lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
        }             
    }
    return lResult;
}

I know, GetPixel is not that fast but on a 16x16 pixel image it should not be the bottleneck.

  1. compare this hash to hash values from other images and add a tolerance.(number of pixels that can differ from the other hash)

Code:

List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));

//determine the number of equal pixel (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);

So this code is able to find equal images with:

  • different file formats (e.g. jpg, png, bmp)
  • rotation (90, 180, 270), horizontal /vertical flip - by changing the iteration order of i and j
  • different dimensions (same aspect is required)
  • different compression (tolerance is required in case of quality loss like jpeg artifacts) - you can accept a 99% equality to be the same image and 50% to be a different one.
  • colored changed to geyscaled and the other way round (because brightness is independent of color)

Update / Improvements:

after using this method for a while I noticed a few improvements that can be done

  • replacing GetPixel for more performance
  • using the exeif-thumbnail instead of reading the whole image for a performance improvement
  • instead of setting 0.5f to differ between light and dark - use the distinct median brightness of all 256 pixels. Otherwise dark/light images are assumed to be the same and it enables to detect images which have a changed brightness.
  • if you need fast calculations, use bool[] or List<bool> if you need to store a lot hashes with the need to save memory, use a Bitarray because a Boolean isn't stored in a bit, it takes a byte!

You could check Algorithm to compare two images in order to see the available methods for image comparison.

Unless you want to recreate the full algorithms on your own, you should try to use already existing libraries or a least part of their code (as long as their license is ok for you).

For an open source C# implementation of Edge detection and related Computer vision algorithms, you can try EmguCV which is a wrapper of OpenCV.


After resampling the images to some common resolution, you could use a Wavelet Decomposition and compare the coefficients of this decomposition instead of the images themselves. Comparing only the first N coefficients will make this method more robust to noise and other artifacts.

There are several C# implementations for wavelets available. One example is https://waveletstudio.codeplex.com/

Tags:

C#

Image

Hash