Is there an elegant way to split a file by chapter using ffmpeg?

ffmpeg -i "$SOURCE.$EXT" 2>&1 \ # get metadata about file
| grep Chapter \ # search for Chapter in metadata and pass the results
| sed -E "s/ *Chapter #([0-9]+.[0-9]+): start ([0-9]+.[0-9]+), end ([0-9]+.[0-9]+)/-i \"$SOURCE.$EXT\" -vcodec copy -acodec copy -ss \2 -to \3 \"$SOURCE-\1.$EXT\"/" \ # filter the results, explicitly defining the timecode markers for each chapter
| xargs -n 11 ffmpeg # construct argument list with maximum of 11 arguments and execute ffmpeg

Your command parses through the files metadata and reads out the timecode markers for each chapter. You could do this manually for each chapter..

ffmpeg -i ORIGINALFILE.mp4 -acodec copy -vcodec copy -ss 0 -t 00:15:00 OUTFILE-1.mp4

or you can write out the chapter markers and run through them with this bash script which is just a little easier to read..

#!/bin/bash
# Author: http://crunchbang.org/forums/viewtopic.php?id=38748#p414992
# m4bronto

#     Chapter #0:0: start 0.000000, end 1290.013333
#       first   _     _     start    _     end

while [ $# -gt 0 ]; do

ffmpeg -i "$1" 2> tmp.txt

while read -r first _ _ start _ end; do
  if [[ $first = Chapter ]]; then
    read  # discard line with Metadata:
    read _ _ chapter

    ffmpeg -vsync 2 -i "$1" -ss "${start%?}" -to "$end" -vn -ar 44100 -ac 2 -ab 128  -f mp3 "$chapter.mp3" </dev/null

  fi
done <tmp.txt

rm tmp.txt

shift
done

or you can use HandbrakeCLI, as originally mentioned in this post, this example extracts chapter 3 to 3.mkv

HandBrakeCLI -c 3 -i originalfile.mkv -o 3.mkv

or another tool is mentioned in this post

mkvmerge -o output.mkv --split chapters:all input.mkv

(Edit: This tip came from https://github.com/phiresky via this issue: https://github.com/harryjackson/ffmpeg_split/issues/2)

You can get chapters using:

ffprobe -i fname -print_format json -show_chapters -loglevel error

If I was writing this again I'd use ffprobe's json options

(Original answer follows)

This is a working python script. I tested it on several videos and it worked well. Python isn't my first language but I noticed you use it so I figure writing it in Python might make more sense. I've added it to Github. If you want to improve please submit pull requests.

#!/usr/bin/env python
import os
import re
import subprocess as sp
from subprocess import *
from optparse import OptionParser

def parseChapters(filename):
  chapters = []
  command = [ "ffmpeg", '-i', filename]
  output = ""
  try:
    # ffmpeg requires an output file and so it errors 
    # when it does not get one so we need to capture stderr, 
    # not stdout.
    output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
  except CalledProcessError, e:
    output = e.output 
   
  for line in iter(output.splitlines()):
    m = re.match(r".*Chapter #(\d+:\d+): start (\d+\.\d+), end (\d+\.\d+).*", line)
    num = 0 
    if m != None:
      chapters.append({ "name": m.group(1), "start": m.group(2), "end": m.group(3)})
      num += 1
  return chapters

def getChapters():
  parser = OptionParser(usage="usage: %prog [options] filename", version="%prog 1.0")
  parser.add_option("-f", "--file",dest="infile", help="Input File", metavar="FILE")
  (options, args) = parser.parse_args()
  if not options.infile:
    parser.error('Filename required')
  chapters = parseChapters(options.infile)
  fbase, fext = os.path.splitext(options.infile)
  for chap in chapters:
    print "start:" +  chap['start']
    chap['outfile'] = fbase + "-ch-"+ chap['name'] + fext
    chap['origfile'] = options.infile
    print chap['outfile']
  return chapters

def convertChapters(chapters):
  for chap in chapters:
    print "start:" +  chap['start']
    print chap
    command = [
        "ffmpeg", '-i', chap['origfile'],
        '-vcodec', 'copy',
        '-acodec', 'copy',
        '-ss', chap['start'],
        '-to', chap['end'],
        chap['outfile']]
    output = ""
    try:
      # ffmpeg requires an output file and so it errors 
      # when it does not get one
      output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
    except CalledProcessError, e:
      output = e.output
      raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))

if __name__ == '__main__':
  chapters = getChapters()
  convertChapters(chapters)

A version of the original shell code with:

  • improved efficiency by
    • using ffprobe instead of ffmpeg
    • splitting the input rather than the output
  • improved reliability by avoiding xargs and sed
  • improved readability by using multiple lines
  • carrying over of multiple audio or subtitle streams
  • remove chapters from output files (as they would be invalid timecodes)
  • simplified command-line arguments
#!/bin/sh -efu

input="$1"
ffprobe \
    -print_format csv \
    -show_chapters \
    "$input" |
cut -d ',' -f '5,7,8' |
while IFS=, read start end chapter
do
    ffmpeg \
        -nostdin \
        -ss "$start" -to "$end" \
        -i "$input" \
        -c copy \
        -map 0 \
        -map_chapters -1 \
        "${input%.*}-$chapter.${input##*.}"
done

To prevent it from interfering with the loop, ffmpeg is instructed not to read from stdin.

Tags:

Ffmpeg