Extract audio from video file

The following Swift 5 / iOS 12.3 code shows how to extract audio from a movie file (.mov) and convert it to an audio file (.m4a) by using AVURLAsset, AVMutableComposition and AVAssetExportSession:

import UIKit
import AVFoundation

class ViewController: UIViewController {

    @IBAction func extractAudioAndExport(_ sender: UIButton) {
        // Create a composition
        let composition = AVMutableComposition()
        do {
            let sourceUrl = Bundle.main.url(forResource: "Movie", withExtension: "mov")!
            let asset = AVURLAsset(url: sourceUrl)
            guard let audioAssetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else { return }
            guard let audioCompositionTrack = composition.addMutableTrack(withMediaType: AVMediaType.audio, preferredTrackID: kCMPersistentTrackID_Invalid) else { return }
            try audioCompositionTrack.insertTimeRange(audioAssetTrack.timeRange, of: audioAssetTrack, at: CMTime.zero)
        } catch {
            // Without an audio track in the composition there is nothing to export
            print(error)
            return
        }

        // Get url for output, removing any file left over from a previous export
        let outputUrl = FileManager.default.temporaryDirectory.appendingPathComponent("out.m4a")
        try? FileManager.default.removeItem(at: outputUrl)

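        // Create an export session.
        // Passthrough copies the audio without re-encoding, so the source codec must be one
        // that .m4a can hold (e.g. AAC); AVAssetExportPresetAppleM4A re-encodes instead.
        guard let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetPassthrough) else { return }
        exportSession.outputFileType = AVFileType.m4a
        exportSession.outputURL = outputUrl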

        // Export file
        exportSession.exportAsynchronously {
            guard exportSession.status == .completed else { return }

            DispatchQueue.main.async {
                // Present a UIActivityViewController to share audio file
                guard let outputURL = exportSession.outputURL else { return }
                let activityViewController = UIActivityViewController(activityItems: [outputURL], applicationActivities: [])
                self.present(activityViewController, animated: true, completion: nil)
            }
        }
    }

}

In virtually all multimedia container formats, audio and video are encoded as separate streams whose frames are interleaved in the file. Removing the video therefore does not require any encoders or decoders: a parser for the file format can drop the video track and copy the audio untouched, without using the multimedia APIs on the phone.

To do this without a third-party library, you need to write the parser from scratch, which can be simple or difficult depending on the file format. FLV, for example, is very simple, so stripping a track out of it is easy: walk the stream tag by tag and drop every tag whose type is 0x09 (video). MP4 is a bit more complex. Its header (the moov atom) has a hierarchical structure that contains one trak atom per track. You need to drop the video trak, then copy the interleaved media data atom (mdat), skipping all the video chunks as you copy. Because this moves the remaining audio data, the chunk offsets stored in the audio track's sample table (stco or co64) have to be rewritten to match.
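For the FLV case described above, the tag walk can be sketched in plain Swift. This is a minimal sketch, not a production parser: the names `stripVideoTags(fromFLV:)`, `FLVError` and `readBigEndian` are hypothetical, and it assumes the standard FLV layout (a header, then tags with an 11-byte tag header, each followed by a 4-byte PreviousTagSize field) with the whole file already loaded into memory.

```swift
/// Errors thrown by the hypothetical FLV helper below.
enum FLVError: Error {
    case notFLV
    case truncated
}

/// Reads `count` bytes starting at `offset` as a big-endian unsigned integer.
func readBigEndian(_ bytes: [UInt8], at offset: Int, count: Int) -> Int {
    var value = 0
    for i in 0..<count {
        value = (value << 8) | Int(bytes[offset + i])
    }
    return value
}

/// Returns a copy of `flv` with every video tag (tag type 0x09) removed.
/// Audio and script-data tags are copied byte for byte.
func stripVideoTags(fromFLV flv: [UInt8]) throws -> [UInt8] {
    // Header: "FLV", version, flags, 4-byte header size, then PreviousTagSize0.
    guard flv.count >= 13, flv[0] == 0x46, flv[1] == 0x4C, flv[2] == 0x56 else {
        throw FLVError.notFLV
    }
    let headerSize = readBigEndian(flv, at: 5, count: 4)
    guard headerSize >= 9, headerSize + 4 <= flv.count else {
        throw FLVError.truncated
    }

    // Copy the header and PreviousTagSize0, then clear the "has video" flag bit.
    var output = Array(flv[0..<headerSize + 4])
    output[4] &= ~UInt8(0x01)

    var offset = headerSize + 4
    while offset < flv.count {
        guard offset + 11 <= flv.count else { throw FLVError.truncated }
        let tagType = flv[offset] & 0x1F                        // low 5 bits hold the type
        let dataSize = readBigEndian(flv, at: offset + 1, count: 3)
        let tagEnd = offset + 11 + dataSize + 4                 // tag + PreviousTagSize
        guard tagEnd <= flv.count else { throw FLVError.truncated }

        // The trailing PreviousTagSize describes the tag it follows,
        // so it stays valid when the tag is copied unchanged.
        if tagType != 0x09 {
            output.append(contentsOf: flv[offset..<tagEnd])
        }
        offset = tagEnd
    }
    return output
}
```

On iOS the input could come from `[UInt8](try Data(contentsOf: url))`; for large files a streaming version that reads one tag at a time would avoid holding the whole file in memory.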

There are third-party libraries you can use, aside from ffmpeg. One that comes to mind is GPAC's MP4Box (LGPL license). If the LGPL is a problem, there are plenty of commercial SDKs that you can use.