OpenCV Face Recognition not accurate

Update

According to the new edit in the question, you need a way to identify new people on the fly, whose photos might not have been available during the training phase of the model. This task is called few-shot learning. It is similar to the requirements of intelligence/police agencies that have to find their targets in CCTV camera footage: since there are usually not enough images of a specific target to train on, they rely on models such as FaceNet. I really suggest reading the paper; however, I explain a few of its highlights here:

  • Generally, the last layer of a classifier is an n×1 vector with n−1 of its elements almost equal to zero, and one element close to 1. The element close to 1 determines the classifier's prediction for the input's label (picture the last layer of a typical CNN architecture).
  • The authors found that if you train a classifier network with a specific loss function on a huge dataset of faces, you can use the output of the penultimate layer as a representation of any face, irrespective of whether it was in the training set. The authors call this vector the face embedding.
  • The previous result means that with a very well trained FaceNet model, you can summarise any face into a vector. The very interesting attribute of this approach is that the vectors of a specific person's face in different angles/positions/states are close to each other in Euclidean space (this property is enforced by the loss function the authors chose).
  • In summary, you have a model that takes faces as input and returns vectors. Vectors close to each other are very likely to belong to the same person; to check this, you can use k-NN or just a simple Euclidean distance, as in the sketch after this list.
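As a concrete illustration of the last point, here is a minimal Kotlin sketch of comparing two embeddings by Euclidean distance. The 128-dimensional embedding size and the 1.0 threshold are my own assumptions (FaceNet variants emit 128- or 512-dimensional vectors, and the threshold must be calibrated on your own data):

```kotlin
// Minimal sketch: comparing face embeddings by Euclidean distance.
// EMBEDDING_SIZE and MATCH_THRESHOLD are assumptions; tune them for
// the specific FaceNet variant and dataset you use.
const val EMBEDDING_SIZE = 128
const val MATCH_THRESHOLD = 1.0f  // illustrative; calibrate on your data

fun euclideanDistance(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same length" }
    var sum = 0f
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return kotlin.math.sqrt(sum)
}

// Returns true when two embeddings likely belong to the same person.
fun isSamePerson(a: FloatArray, b: FloatArray): Boolean =
    euclideanDistance(a, b) < MATCH_THRESHOLD
```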

One implementation of FaceNet can be found here. I suggest you try running it on your computer to get a feel for what you are actually dealing with. After that, it is probably best to do the following:

  1. Transform the FaceNet model mentioned in the repository to its TFLite version (this blog post might help)
  2. For each photo submitted by the user, use the Face API to extract the face(s)
  3. Use the minified model in your app to get the face embeddings of the extracted faces (a sketch of this step follows the list)
  4. Process all the images in the user's gallery, getting the vectors for the faces in the photos
  5. Compare each vector found in step 4 with each vector found in step 3 to find the matches
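For step 3, running the converted .tflite model with the TensorFlow Lite Interpreter might look like the sketch below. The 160×160 input size, the [-1, 1] normalisation, and the 128-dimensional output match common FaceNet conversions, but they are assumptions that depend on the exact model you converted:

```kotlin
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Sketch of step 3: running a converted FaceNet .tflite model on a
// cropped face. Input size (160x160), normalisation, and the 128-d
// output are assumptions that depend on the model you converted.
class FaceEmbedder(modelFile: File) {
    private val interpreter = Interpreter(modelFile)
    private val inputSize = 160
    private val embeddingSize = 128

    fun embed(face: Bitmap): FloatArray {
        val scaled = Bitmap.createScaledBitmap(face, inputSize, inputSize, true)
        val input = ByteBuffer.allocateDirect(4 * inputSize * inputSize * 3)
            .order(ByteOrder.nativeOrder())
        val pixels = IntArray(inputSize * inputSize)
        scaled.getPixels(pixels, 0, inputSize, 0, 0, inputSize, inputSize)
        for (p in pixels) {
            // Normalise RGB channels to [-1, 1]; adjust to your model's preprocessing.
            input.putFloat(((p shr 16 and 0xFF) - 127.5f) / 127.5f)
            input.putFloat(((p shr 8 and 0xFF) - 127.5f) / 127.5f)
            input.putFloat(((p and 0xFF) - 127.5f) / 127.5f)
        }
        input.rewind()
        val output = Array(1) { FloatArray(embeddingSize) }
        interpreter.run(input, output)
        return output[0]
    }
}
```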

Original Answer

You have come across one of the most prevalent challenges in machine learning: overfitting. Face detection and recognition is a huge research area in its own right, and almost all the reasonably accurate models use some kind of deep learning. Note that even detecting a face accurately is not as easy as it seems; however, since you are doing this on Android, you can use the Face API for the task (other, more advanced techniques such as MTCNN are too slow/difficult to deploy on a handset). It has been shown that simply feeding the model a photo with a lot of background noise or multiple people in it does not work, so you really cannot skip this step.
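For illustration, a detection-and-cropping sketch with the Mobile Vision Face API could look like this (ML Kit's face detector has a very similar flow; the plain bounding-box crop is my own simplification):

```kotlin
import android.content.Context
import android.graphics.Bitmap
import com.google.android.gms.vision.Frame
import com.google.android.gms.vision.face.FaceDetector

// Sketch: detect faces in a photo and return cropped face bitmaps.
// The raw bounding-box crop is a simplification; production code would
// add a margin and handle rotation/landmarks.
fun detectFaces(context: Context, photo: Bitmap): List<Bitmap> {
    val detector = FaceDetector.Builder(context)
        .setTrackingEnabled(false)
        .build()
    if (!detector.isOperational) return emptyList()

    val frame = Frame.Builder().setBitmap(photo).build()
    val faces = detector.detect(frame)
    val crops = mutableListOf<Bitmap>()
    for (i in 0 until faces.size()) {
        val face = faces.valueAt(i)
        val x = face.position.x.toInt().coerceAtLeast(0)
        val y = face.position.y.toInt().coerceAtLeast(0)
        val w = face.width.toInt().coerceAtMost(photo.width - x)
        val h = face.height.toInt().coerceAtMost(photo.height - y)
        if (w > 0 && h > 0) crops.add(Bitmap.createBitmap(photo, x, y, w, h))
    }
    detector.release()
    return crops
}
```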

After getting a nicely trimmed face of the candidate targets out of the background, you need to overcome the challenge of recognising the detected faces. Again, all the competent models, to the best of my knowledge, use some sort of deep learning/convolutional neural networks. Using them on a mobile phone is a challenge, but thanks to TensorFlow Lite you can minify them and run them within your app. A project about face recognition on Android phones that I worked on is here, in case you want to check it out. Keep in mind that any good model should be trained on numerous instances of labelled data. However, there is a plethora of models already trained on large datasets of faces or other image-recognition tasks; to tweak them and use their existing knowledge, we can employ transfer learning. For a quick start on object detection and transfer learning closely related to your case, check this blog post.

Overall, you have to gather numerous instances of the faces you want to detect, plus numerous face pictures of people you don't care about. Then you need to train a model based on the above-mentioned resources, use TensorFlow Lite to decrease its size, and embed it within your app. For each frame, you then call the Android Face API and feed the (probably) detected face into the model to identify the person, roughly as in the glue sketch below.
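Putting the pieces together, a glue sketch that reuses the hypothetical detectFaces, FaceEmbedder, and isSamePerson helpers from the earlier snippets might look like:

```kotlin
import android.content.Context
import android.graphics.Bitmap

// Glue sketch: returns the gallery photos containing a face that
// matches any face found in the query photo. Reuses the hypothetical
// helpers sketched earlier; not a definitive implementation.
fun findMatches(
    context: Context,
    embedder: FaceEmbedder,
    queryPhoto: Bitmap,
    galleryPhotos: List<Bitmap>
): List<Bitmap> {
    val queryEmbeddings = detectFaces(context, queryPhoto).map(embedder::embed)
    return galleryPhotos.filter { photo ->
        detectFaces(context, photo).any { face ->
            val candidate = embedder.embed(face)
            queryEmbeddings.any { isSamePerson(it, candidate) }
        }
    }
}
```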

Depending on your tolerance for delay, the size of your training set, and the number of targets, you can get various results; however, 90%+ accuracy is easily achievable if you have only a few target people.


If I understand correctly, you're training the classifier with a single image. In that case, this one specific image is everything the classifier will ever be able to recognise. You would need a noticeably bigger training set of pictures showing the same person: something like 5 or 10 different images at the very least.