TensorFlow high false-positive rate and non-max-suppression issue

I think that before diving into parameter tuning (e.g. the mentioned score_threshold) you should review your dataset.

I didn't check the entire dataset you shared, but from a high-level look the main problem I found is that most of the images are really small and have highly variable aspect ratios.

In my opinion this conflicts with this part of your configuration file:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

If you take one of the images from your dataset and manually apply that transformation, you will see that the result is very noisy for small images and very deformed for the many images that have an unusual aspect ratio.
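
To see this for yourself, here is a minimal sketch (assuming PIL and a local image path; "example.jpg" is just a placeholder) that mimics the scale selection keep_aspect_ratio_resizer performs:

from PIL import Image

MIN_DIM, MAX_DIM = 600, 1024

def keep_aspect_ratio_resize(img, min_dim=MIN_DIM, max_dim=MAX_DIM):
    # Scale so the short side reaches min_dim ...
    w, h = img.size
    scale = min_dim / min(w, h)
    # ... unless that would push the long side past max_dim.
    if scale * max(w, h) > max_dim:
        scale = max_dim / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)

img = Image.open("example.jpg")  # placeholder: pick any image from your dataset
print(img.size, "->", keep_aspect_ratio_resize(img).size)

A tiny image of, say, 120x80 pixels gets upscaled by a factor of 7.5 here, which is where the noise comes from.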

I would highly recommend you to rebuild your dataset with higher-resolution images, and maybe try to preprocess the images with unusual aspect ratios using padding, cropping or other strategies.
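
For the padding route, something like the following sketch could work (again assuming PIL; pad_to_square is a name I made up, not an API function). It centers the image on a square canvas so the resizer no longer has to distort it:

from PIL import Image

def pad_to_square(img, fill=(0, 0, 0)):
    # Paste the original image onto a square canvas sized to its long side.
    w, h = img.size
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas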

If you want to stick with the small images you'd have to at least change the min and max dimensions of the image_resizer, but, in my experience, the biggest problem here is the dataset, and I would invest the time in trying to fix that.
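
For example (the exact values below are placeholders; pick them from the size distribution of your own images):

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 300
    max_dimension: 512
  }
}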

P.S.

I don't see the house false positive as a big problem if we consider that it comes from a totally different domain than your dataset.

You could probably adjust the minimum confidence required to consider a detection a true positive and remove it.
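
As a sketch (the 0.6 threshold is an arbitrary starting point, not a recommendation; tune it against your validation set), filtering the detection arrays the API returns looks like this:

import numpy as np

def filter_detections(boxes, scores, classes, min_score=0.6):
    # Keep only detections whose confidence clears the threshold.
    keep = scores >= min_score
    return boxes[keep], scores[keep], classes[keep]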

If you take the current winner of the COCO challenge and feed it images from a very different domain, like frames from a cartoon, you will see that it generates a lot of false positives.

So it's more a problem with current object detection approaches, which are not robust to domain changes.


A lot of people I see online have been running into the same issue using the TensorFlow Object Detection API. I think there are some inherent problems with the idea/process of fine-tuning the pretrained models with custom classes at home. For example, people want to use SSD MobileNet or Faster R-CNN Inception to detect objects like "person with helmet," "pistol," or "tool box." The general process is to feed in images of that object, but most of the time, no matter how many images you use (200 to 2000), you still end up with false positives when you actually run the model at your desk.

The detector works great when you show it the object in its own context, but you end up getting 99% matches on everyday items like your bedroom window, your desk, your computer monitor, keyboard, etc. People have mentioned the strategy of introducing negative images or soft images. I think the problem has to do with the limited context in the images most people use.

The pretrained models were trained with over a dozen classes in a wide variety of environments. One example could be a car on the street: the CNN sees the car, and everything in that image that is not a car acts as a negative, which includes the street, buildings, sky, etc. In another image it can see a bottle, and everything else, which includes desks, tables, windows, etc., acts as a negative. I think the problem with training custom classes is that it is a negative-image problem. Even if you have enough images of the object itself, there isn't enough data of that same object in different contexts and backgrounds. So in a sense there are not enough negative images, even if conceptually you shouldn't need negative images. When you run the model at home, you get false positives all over the place as it identifies objects around your own room.

I think the idea of transfer learning in this way is flawed. We just end up seeing a lot of great tutorials online of people identifying playing cards, Millennium Falcons, etc., but none of those models are deployable in the real world, as they would all generate a bunch of false positives when they see anything outside of their image pool. The best strategy would be to retrain the CNN from scratch with multiple classes, adding the desired ones as well. I suggest re-introducing a previous dataset such as ImageNet or Pascal VOC with 10-20 pre-existing classes, adding your own, and retraining.
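
As a rough illustration of that last suggestion (the class names and IDs below are placeholders; only tool_box stands in for your custom class), the label map for such a retraining run would mix the pre-existing classes with your own:

item {
  id: 1
  name: 'person'
}
item {
  id: 2
  name: 'car'
}
item {
  id: 3
  name: 'bottle'
}
item {
  id: 4
  name: 'tool_box'  # your custom class, appended after the pre-existing ones
}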