How to read Ogg or MP3 audio files in a TensorFlow graph?

There are dedicated decoders in the tensorflow.contrib.ffmpeg package (TensorFlow 1.x only). To use them, you need to install ffmpeg first.

Example:

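import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

audio_binary = tf.read_file('song.mp3')
# Returns a float tensor with time along dimension 0 and channels along dimension 1
waveform = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='mp3', samples_per_second=44100, channel_count=2)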

Unfortunately, the answer from @sygi no longer works in TensorFlow 2.x, because tf.contrib (including tf.contrib.ffmpeg) was removed. An alternative is to use an external library (e.g. pydub or librosa) for the MP3 decoding step and integrate it into the input pipeline with tf.py_function. For example, you can do something along these lines:

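import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from pydub import AudioSegment

dataset = tf.data.Dataset.list_files('path/to/mp3s/*')

def decode_mp3(mp3_path):
    # py_function passes an eager tensor; convert it to a Python string
    mp3_path = mp3_path.numpy().decode("utf-8")
    mp3_audio = AudioSegment.from_file(mp3_path, format="mp3")
    # get_array_of_samples() returns integer PCM samples (interleaved for
    # stereo); cast them to float32 so they match Tout=tf.float32 below
    return np.array(mp3_audio.get_array_of_samples(), dtype=np.float32)

dataset = dataset.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))

# Plot the waveforms of the first three files
for features in dataset.take(3):
    data = features.numpy()
    plt.plot(data)
    plt.show()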
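
Note that pydub's get_array_of_samples() yields raw integer PCM samples, interleaved across channels for stereo files ([L0, R0, L1, R1, ...]), so the plot above shows values on an integer scale with both channels mixed together. Below is a minimal NumPy sketch that scales the samples to [-1, 1) and reshapes them to (frames, channels). The helper name pcm_to_float is hypothetical (not part of pydub), and it assumes signed integer samples plus the sample width in bytes and the channel count, which an AudioSegment exposes as sample_width and channels.

```python
import array

import numpy as np


def pcm_to_float(samples, sample_width, channels):
    """Convert interleaved signed integer PCM samples to float32 in [-1, 1).

    samples: sequence of ints, e.g. AudioSegment.get_array_of_samples()
    sample_width: bytes per sample, e.g. AudioSegment.sample_width
    channels: number of interleaved channels, e.g. AudioSegment.channels
    """
    # Full-scale magnitude for signed PCM of this width (2 bytes -> 32768).
    full_scale = np.float32(1 << (8 * sample_width - 1))
    # Builds a new float32 array, so the caller's buffer is never modified.
    data = np.asarray(samples, dtype=np.float32) / full_scale
    # Interleaved [L0, R0, L1, R1, ...] becomes rows of (L, R) frames.
    return data.reshape(-1, channels)


# Synthetic 16-bit stereo input (two frames), not real decoder output.
stereo = array.array("h", [0, 16384, -32768, 32767])
print(pcm_to_float(stereo, sample_width=2, channels=2))
```

Inside decode_mp3 you could then return pcm_to_float(mp3_audio.get_array_of_samples(), mp3_audio.sample_width, mp3_audio.channels); the result is still float32, so Tout=tf.float32 continues to match.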

An Ogg decoder has since been added to tensorflow_io (here). You can use it like this:

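import tensorflow as tf
import tensorflow_io as tfio

path = 'song.ogg'  # hypothetical path to your .ogg file
content = tf.io.read_file(path)
audio = tfio.experimental.audio.decode_ogg(content)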