Accurate binary image classification

This is a supervised learning task. You need to do some feature extraction on the images and then classify each image on the basis of the feature vector you've computed for it.

Feature Extraction

At first sight, the feature extraction part looks like a good scenario for Hu moments. Calculate the image moments (cv::moments), then compute the seven Hu moments (cv::HuMoments) from them. That gives you a 7-dimensional real-valued feature space (one feature vector per image). Alternatively, you could omit this step and use each pixel value as a separate feature. I think the suggestion in this answer goes in that direction, but adds a PCA compression to reduce the dimensionality of the feature space.
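For illustration, here is what that computation looks like written out by hand in NumPy; in OpenCV you would just call cv::moments followed by cv::HuMoments. This is a sketch, and the function name `hu_moments` is mine:

```python
import numpy as np

def hu_moments(img):
    """Compute the 7 Hu moments of a 2-D grayscale/binary image.

    Mirrors cv::moments + cv::HuMoments: raw moments -> central
    moments (translation invariance) -> normalized central moments
    (scale invariance) -> the 7 Hu invariants (rotation invariance).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]

    # Raw moment m00 and the centroid.
    m00 = img.sum()
    cx = (x * img).sum() / m00
    cy = (y * img).sum() / m00

    # Normalized central moments eta_pq.
    def eta(p, q):
        mu = ((x - cx) ** p * (y - cy) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)

    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    h3 = (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2
    h4 = (e30 + e12) ** 2 + (e21 + e03) ** 2
    h5 = ((e30 - 3 * e12) * (e30 + e12)
          * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          + (3 * e21 - e03) * (e21 + e03)
          * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    h6 = ((e20 - e02) * ((e30 + e12) ** 2 - (e21 + e03) ** 2)
          + 4 * e11 * (e30 + e12) * (e21 + e03))
    h7 = ((3 * e21 - e03) * (e30 + e12)
          * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          - (e30 - 3 * e12) * (e21 + e03)
          * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])
```

Note that the same shape translated elsewhere in the image yields the same feature vector, which is exactly why these moments make good features here.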

Classification

As for the classification part, you can use almost any classification algorithm you like: an SVM per letter (binary yes/no classification), a Naive Bayes classifier (which letter is the most likely), or a k-nearest-neighbor approach (kNN, minimum distance in feature space), e.g. via FLANN.
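A minimal brute-force kNN sketch, assuming the training vectors are stacked as rows of a matrix (FLANN would replace the exhaustive distance computation below with an approximate index; the function name is mine):

```python
import numpy as np

def knn_classify(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors under the Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```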

Especially for distance-based classifiers (e.g. kNN), you should consider normalizing your feature space, e.g. scale all dimensions to a common range before applying the Euclidean distance, or use something like the Mahalanobis distance instead. Otherwise, features with large value ranges are overrepresented in the classification.
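A simple z-score normalization sketch, assuming the feature vectors are the rows of a matrix:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Z-score each feature column (zero mean, unit variance) so
    that no single dimension dominates the Euclidean distance.
    The Mahalanobis distance goes further and also whitens
    correlations via the inverse covariance matrix."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)  # eps guards constant columns
```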

Evaluation

Of course you need training data, i.e. image feature vectors labeled with the correct letter, and a procedure to evaluate your pipeline, e.g. cross-validation.
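A minimal k-fold cross-validation sketch; the `classify` callback is a placeholder for whichever classifier you picked above (names are mine):

```python
import numpy as np

def kfold_accuracy(X, y, classify, k=5, seed=0):
    """Estimate accuracy by k-fold cross-validation: hold out one
    fold as the test set, classify it using the remaining folds as
    training data, and average the hit rate over all folds.
    `classify(train_X, train_y, query)` returns a predicted label."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        hits = [classify(X[train], y[train], X[t]) == y[t] for t in test]
        accs.append(np.mean(hits))
    return float(np.mean(accs))
```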


In this case, you might also want to have a look at template matching: you would cross-correlate the candidate image with each pattern available in your training set. High values in the output image indicate a good probability that the pattern is located at that position.
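A naive normalized cross-correlation sketch of that idea (cv::matchTemplate with TM_CCOEFF_NORMED does the same thing far more efficiently; the function name is mine):

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return a map of normalized
    cross-correlation scores in [-1, 1]; the peak marks the position
    where the pattern matches best (1.0 for an exact copy)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tn = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            w = image[r:r + th, c:c + tw]
            wz = w - w.mean()
            wn = np.sqrt((wz ** 2).sum())
            if tn > 0 and wn > 0:
                out[r, c] = (wz * t).sum() / (tn * wn)
    return out
```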


This is a recognition problem. I'd personally use a combination of PCA and a machine-learning technique (likely an SVM). These are fairly large topics, so I'm afraid I can't really elaborate too much, but here's the very basic process:

  1. Gather your training images (more than one per letter, but don't go crazy)
  2. Label them (could mean a lot of things; in this case it means grouping the letters into logical classes -- all A images -> 1, all B images -> 2, etc.)
  3. Train your classifier
    • Run everything through PCA decomposition
    • Project all of your training images into PCA space
    • Run the projected images through an SVM (if it's a one-class classifier, do them one at a time; otherwise, do them all at once)
    • Save off your PCA eigenvectors and SVM training data
  4. Run recognition
    • Load in your PCA space
    • Load in your SVM training data
    • For each new image, project it into PCA space and ask your SVM to classify it.
    • If you get an answer (a number), map it back to a letter (1 -> A, 2 -> B, etc.)
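The steps above can be sketched as follows. To keep the sketch short, a nearest-class-mean classifier in PCA space stands in for the SVM; in practice you would hand the projected vectors to a real SVM implementation (e.g. cv::ml::SVM). All names here are mine:

```python
import numpy as np

def train_pca(X, n_components):
    """Steps 3a-3b: fit a PCA basis on the flattened training images
    (rows of X) via SVD of the centered data; the rows of Vt are the
    principal-component eigenvectors."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(x, mean, basis):
    """Project an image (or a stack of images) into PCA space."""
    return (x - mean) @ basis.T

def recognize(x, mean, basis, proj_train, labels):
    """Step 4: project a new image and classify it -- here by the
    nearest class mean in PCA space (an SVM would slot in instead)."""
    p = project(x, mean, basis)
    classes = np.unique(labels)
    centers = np.array([proj_train[labels == c].mean(axis=0)
                        for c in classes])
    return classes[np.argmin(np.linalg.norm(centers - p, axis=1))]
```

Saving `mean`, `basis`, and the trained classifier corresponds to step 3d; loading them back corresponds to steps 4a-4b.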