Meaningful thumbnails for a Video using FFmpeg

How about looking for, ideally, the first frame with more than 40% change within each of 5 time spans, where the time spans are the 1st, 2nd, 3rd, 4th, and 5th 20% of the video?

You could also split it into 6 time spans and disregard the 1st one to avoid credits.
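The span arithmetic is simple enough to sketch. This is only an illustration: the function name `time_spans` and its parameters are made up here, and the duration is assumed to be known in seconds.

```python
def time_spans(duration, spans=5, skip_first=False):
    """Return (start, end) pairs in seconds for equal-width spans.

    With skip_first=True the video is cut into spans + 1 pieces and the
    first piece (where credits usually sit) is dropped.
    """
    total = spans + 1 if skip_first else spans
    width = duration / total
    bounds = [(i * width, (i + 1) * width) for i in range(total)]
    return bounds[1:] if skip_first else bounds


if __name__ == "__main__":
    # A 10-minute video, 6 spans with the first one thrown away.
    for start, end in time_spans(600, spans=5, skip_first=True):
        print(f"{start:.0f}s .. {end:.0f}s")
```

Either way you end up with 5 spans; skipping the first just makes each one a little shorter.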

In practice, this would mean setting fps to a low value, applying your scene change check, and keeping your -ss argument to throw out the first bit of the video.

...something like:

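ffmpeg -ss 3 -i input.mp4 -vf "select=gt(scene\,0.4),fps=fps=1/600" -frames:v 5 -vsync vfr out%02d.jpg

Both filters go into a single -vf chain here. If -vf is given twice, ffmpeg only applies the last one, so the scene check would be silently dropped.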
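If you want exactly one scene-change frame per time span, another option is to run one ffmpeg command per span instead of relying on fps. Here is a rough sketch that only builds the argument list; the function name `scene_change_command` and the file names are hypothetical, and `start` and `length` are assumed to be given in seconds.

```python
def scene_change_command(movie, start, length, out_path, threshold=0.4):
    """Build an ffmpeg command that saves the first scene change in one span."""
    return [
        "ffmpeg",
        "-ss", str(start),   # input option: seek to the start of the span
        "-t", str(length),   # input option: read only this many seconds
        "-i", movie,
        # The single quotes keep the comma from being read as a filter separator.
        "-vf", f"select='gt(scene,{threshold})'",
        "-frames:v", "1",    # stop after the first frame that passes the check
        out_path,
    ]


if __name__ == "__main__":
    print(" ".join(scene_change_command("input.mp4", 100, 100, "out01.jpg")))
```

The list can be passed to `subprocess.run(cmd, check=True)`. A span with no frame above the threshold will most likely produce no image at all, so you would want a fallback for that case.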

Defining meaningful is hard, but if you want to generate N thumbnails efficiently, spread across the whole video file, this is what I use in production for user-uploaded content.

Pseudo-code

for X in 1..N
  T = integer( (X - 0.5) * D / N )  
  run `ffmpeg -ss <T> -i <movie>
              -vf select="eq(pict_type\,I)" -vframes 1 image<X>.jpg`

Where:

  • D - video duration, read from the output of ffmpeg -i <movie> alone or from ffprobe, which has a nice JSON output writer btw
  • N - total number of thumbnails you want
  • X - thumbnail number, from 1 to N
  • T - time point for the thumbnail

Put simply, the above saves a key frame from the center of each partition of the movie. E.g. if the movie is 300s long and you want 3 thumbnails, it takes the first key frame after 50s, 150s and 250s. For 5 thumbnails it would be 30s, 90s, 150s, 210s and 270s. You can adjust N depending on the movie duration D, so that e.g. a 5-minute movie gets 3 thumbnails but one over 1 hour gets 20.
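For reference, the pseudo-code could look roughly like this in Python. Treat it as a sketch, not the exact production code: the function names are made up for illustration, and it assumes ffmpeg and ffprobe are on the PATH.

```python
import json
import subprocess


def probe_duration(movie):
    """Read the duration D in seconds from ffprobe's JSON output."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", movie],
        capture_output=True, text=True, check=True,
    )
    return float(json.loads(result.stdout)["format"]["duration"])


def thumbnail_times(duration, n):
    """T = integer((X - 0.5) * D / N) for X in 1..N."""
    return [int((x - 0.5) * duration / n) for x in range(1, n + 1)]


def thumbnail_command(movie, t, index):
    """Build the ffmpeg call that grabs the first key frame after time t."""
    return [
        "ffmpeg", "-ss", str(t), "-i", movie,
        "-vf", "select='eq(pict_type,I)'", "-vframes", "1",
        f"image{index}.jpg",
    ]


def make_thumbnails(movie, n):
    """Run one ffmpeg invocation per thumbnail."""
    duration = probe_duration(movie)
    for index, t in enumerate(thumbnail_times(duration, n), start=1):
        subprocess.run(thumbnail_command(movie, t, index), check=True)
```

Calling `make_thumbnails("movie.mp4", 5)` would write image1.jpg to image5.jpg into the current directory.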

Performance

Each invocation of the above ffmpeg command takes a fraction of a second (!) for a ~1GB H.264 file. That is because it jumps straight to the <T> position (mind -ss before -i) and takes the first key frame, which is practically a complete JPEG. No time is wasted decoding the movie to reach the exact time position.
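The position of -ss matters because ffmpeg treats it as an input option before -i (seek inside the file first) and as an output option after -i (decode from the start and discard frames until the time is reached). A tiny sketch of the two orderings, with a hypothetical helper name:

```python
def seek_args(movie, t, input_seek=True):
    """Return the start of an ffmpeg command with -ss before or after -i."""
    if input_seek:
        # Fast: seek in the input before any decoding happens.
        return ["ffmpeg", "-ss", str(t), "-i", movie]
    # Slow: everything before t is decoded and thrown away.
    return ["ffmpeg", "-i", movie, "-ss", str(t)]


if __name__ == "__main__":
    print(seek_args("movie.mp4", 150))
    print(seek_args("movie.mp4", 150, input_seek=False))
```

For thumbnails on large files you want the first form.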

Post-processing

You can combine the above with the scale filter or any other resize method. You can also remove solid-color frames or try mixing it with other filters such as thumbnail.
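As an example of combining filters, resizing can simply be appended to the same -vf chain with a comma. A minimal sketch, assuming you want a fixed output width; the helper name `thumbnail_filter` is made up here:

```python
def thumbnail_filter(width):
    """Return a -vf chain that keeps only I-frames and then resizes them."""
    parts = [
        "select='eq(pict_type,I)'",  # same key-frame check as above
        f"scale={width}:-1",         # -1 lets ffmpeg keep the aspect ratio
    ]
    return ",".join(parts)


if __name__ == "__main__":
    print(thumbnail_filter(320))
```

The resulting string replaces the plain select expression after -vf.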


I once did something similar, but I exported all frames of the video (at 1 fps) and compared them with a Perl utility I found that computes the difference between images. I compared each frame to the previously chosen thumbnails, and if it differed from all of them, I added it to the thumbnail collection. The advantage here is that if your video moves from scene A to B and then returns to A, ffmpeg will export 2 frames of A, whereas this approach keeps only one.
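The selection logic is roughly the following. This is not the Perl utility itself, just a sketch of the idea: frames are assumed to be already loaded as equal-length flat lists of grayscale pixel values, and the mean absolute difference and the threshold of 30 are stand-ins for whatever image comparison you actually use.

```python
def mean_abs_diff(a, b):
    """Average per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_distinct(frames, threshold=30.0):
    """Return indices of frames that differ from every frame kept so far."""
    kept = []
    for index, frame in enumerate(frames):
        if all(mean_abs_diff(frame, frames[k]) > threshold for k in kept):
            kept.append(index)
    return kept


if __name__ == "__main__":
    scene_a = [0] * 16
    scene_b = [255] * 16
    # A, then B, then back to A: the second A is not kept.
    print(pick_distinct([scene_a, scene_b, scene_a]))
```

Because every candidate is checked against all kept thumbnails and not only the previous frame, a scene that comes back later is not picked a second time.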