What is the best way to programmatically detect porn images?

  • Bag-of-Visual-Words Models for Adult Image Classification and Filtering
  • A Brief Survey of Porn-Detection/Porn-Removal Software
  • Detection of Pornographic Digital Images (2011!)

This was written in 2000. I'm not sure whether the state of the art in porn detection has advanced since then, but I doubt it.

http://www.dansdata.com/pornsweeper.htm

PORNsweeper seems to have some ability to distinguish pictures of people from pictures of things that aren't people, as long as the pictures are in colour. It is less successful at distinguishing dirty pictures of people from clean ones.

With the default, medium sensitivity, if Human Resources sends around a picture of the new chap in Accounts, you've got about a 50% chance of getting it. If your sister sends you a picture of her six-month-old, it's similarly likely to be detained.

It's only fair to point out amusing errors, like calling the Mona Lisa porn, if they're representative of the behaviour of the software. If the makers admit that their algorithmic image recogniser will drop the ball 15% of the time, then making fun of it when it does exactly that is silly.

But PORNsweeper only seems to live up to its stated specifications in one department - detection of actual porn. It's half-way decent at detecting porn, but it's bad at detecting clean pictures. And I wouldn't be surprised if no major leaps were made in this area in the near future.


I would rather let users report bad images. Developing image recognition can take too much time and effort, and it won't be as accurate as human eyes. It's much cheaper to outsource the moderation job.

Take a look at: Amazon Mechanical Turk

"The Amazon Mechanical Turk (MTurk) is one of the suite of Amazon Web Services, a crowdsourcing marketplace that enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do."


This is actually reasonably easy. You can programmatically detect skin tones, and porn images tend to have a lot of skin. This will create false positives, but if that is a problem you can pass the flagged images through actual moderation. This not only greatly reduces the work for moderators but also gives you lots of free porn. It's win-win.

#!python
import glob
import os
from PIL import Image

def is_skin(rgb):
    # Heuristic: red above 60, with green and blue inside fixed bands below red
    r, g, b = rgb
    return r > 60 and r * 0.4 < g < r * 0.85 and r * 0.2 < b < r * 0.7

def get_skin_ratio(im):
    # Force RGB so getcolors() always yields (r, g, b) tuples
    im = im.convert("RGB")
    w, h = im.size
    # Keep only the central region, trimming 20% from every edge
    im = im.crop((int(w * 0.2), int(h * 0.2), w - int(w * 0.2), h - int(h * 0.2)))
    total = im.size[0] * im.size[1]
    skin = sum(count for count, rgb in im.getcolors(total) if is_skin(rgb))
    return skin / float(total)

for image_dir in ('porn', 'clean'):
    for image_file in glob.glob(os.path.join(image_dir, "*.jpg")):
        skin_percent = get_skin_ratio(Image.open(image_file)) * 100
        # More than 30% skin-toned pixels in the center counts as a hit
        label = "PORN" if skin_percent > 30 else "CLEAN"
        print("{0} {1} has {2:.0f}% skin".format(label, image_file, skin_percent))

This code measures skin tones in the center of the image. I've tested it on 20 relatively tame "porn" images and 20 completely innocent images. It flags 100% of the "porn" images and 4 of the 20 clean images. That's a pretty high false positive rate (20%), but the script aims to be fairly cautious and could be tuned further. It works on light, dark and Asian skin tones.
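To make the tuning less of a guessing game, something like the following could turn the test counts into rates and try several thresholds. This is only a rough sketch: `rates` and `sweep` are names made up for this example, and the score lists passed to `sweep` are assumed to be skin percentages you have measured yourself on labelled images. The only real numbers used are the counts quoted above.

```python
def rates(flagged_porn, total_porn, flagged_clean, total_clean):
    """Return (detection rate, false positive rate, precision) from raw counts."""
    detection = flagged_porn / float(total_porn)
    false_positive = flagged_clean / float(total_clean)
    flagged = flagged_porn + flagged_clean
    # Precision: the share of flagged images that really were porn
    precision = flagged_porn / float(flagged) if flagged else 0.0
    return detection, false_positive, precision

def sweep(porn_scores, clean_scores, thresholds):
    """Yield (threshold, detection, false_positive, precision) per cut-off.

    porn_scores and clean_scores are hypothetical inputs: skin percentages
    measured on images you have already labelled by hand.
    """
    for t in thresholds:
        hits = sum(1 for s in porn_scores if s > t)
        false_alarms = sum(1 for s in clean_scores if s > t)
        yield (t,) + rates(hits, len(porn_scores), false_alarms, len(clean_scores))

# Counts from the test described above: 20 of 20 "porn" and 4 of 20 clean flagged
detection, false_positive, precision = rates(20, 20, 4, 20)
print("detection {0:.0%}, false positives {1:.0%}, precision {2:.0%}".format(
    detection, false_positive, precision))
```

With 40 images the figures are very coarse, so a sweep like this is probably only worth trusting on a much larger labelled set.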

Its main weaknesses with false positives are brown objects like sand and wood, and of course it doesn't know the difference between "naughty" and "nice" flesh (like face shots).

Its weaknesses with false negatives would be images without much exposed flesh (like leather bondage), painted or tattooed skin, B&W images, etc.
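One way to live with these blind spots is to send anything the heuristic cannot judge to a human as well. The sketch below is an assumption on my part, not part of the script: `is_grayscale`, `triage`, and the tolerance of 8 are invented for illustration, and `pixels` is assumed to be a sequence of (r, g, b) tuples taken from the image.

```python
def is_grayscale(pixels, tolerance=8):
    """True if every (r, g, b) pixel has nearly equal channels."""
    return all(max(p) - min(p) <= tolerance for p in pixels)

def triage(skin_percent, pixels, threshold=30):
    """Return 'review' for images a moderator should look at, else 'pass'."""
    if is_grayscale(pixels):
        # A colour-based skin test says nothing about B&W images,
        # so (as an assumed policy) a human always sees them
        return "review"
    if skin_percent > threshold:
        return "review"
    return "pass"

print(triage(5, [(128, 128, 128), (40, 42, 41)]))   # B&W image
print(triage(45, [(200, 120, 90), (30, 80, 200)]))  # colour, lots of skin tone
print(triage(5, [(20, 90, 200), (30, 160, 60)]))    # colour, little skin tone
```

This does nothing for the other cases (clothed or painted subjects), which still depend on user reports or manual moderation.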

source code and sample images