Remove silence at the beginning and at the end of wave files with PyDub

You can use this code:

from pydub import AudioSegment
from pydub.silence import detect_nonsilent

def remove_sil(path_in, path_out, format="wav"):
    sound = AudioSegment.from_file(path_in, format=format)
    # dBFS is negative, so multiplying by 1.5 puts the threshold below the average level
    non_sil_times = detect_nonsilent(sound, min_silence_len=50, silence_thresh=sound.dBFS * 1.5)
    if len(non_sil_times) > 0:
        # merge non-silent ranges that are less than 200 ms apart
        non_sil_times_concat = [non_sil_times[0]]
        for t in non_sil_times[1:]:
            if t[0] - non_sil_times_concat[-1][-1] < 200:
                non_sil_times_concat[-1][-1] = t[1]
            else:
                non_sil_times_concat.append(t)
        # keep only ranges longer than 350 ms
        non_sil_times = [t for t in non_sil_times_concat if t[1] - t[0] > 350]
        if non_sil_times:  # every range may have been filtered out
            sound[non_sil_times[0][0]: non_sil_times[-1][1]].export(path_out, format=format)
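
To see what the merging and filtering steps in the function above do, here is a standalone sketch on plain lists of [start, end] times in milliseconds, the shape that detect_nonsilent returns. The helper name merge_and_filter and the sample ranges are made up for illustration; the 200 ms and 350 ms values are the ones used above.

```python
def merge_and_filter(intervals, max_gap=200, min_len=350):
    """Merge [start, end] ranges (ms) that are less than max_gap apart,
    then drop merged ranges that are not longer than min_len."""
    merged = []
    for start, end in intervals:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = end          # extend the previous range
        else:
            merged.append([start, end])  # start a new range
    return [r for r in merged if r[1] - r[0] > min_len]

# Hypothetical non-silent ranges, in ms
ranges = [[0, 100], [150, 600], [1000, 1100], [2000, 2500]]
print(merge_and_filter(ranges))  # → [[0, 600], [2000, 2500]]
```

The first two ranges are merged because they are only 50 ms apart, and the 100 ms range is dropped as too short, so the exported audio would run from 0 ms to 2500 ms.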

I would advise that you cycle in chunks of at least 10 ms, both to do it a little more quickly (fewer iterations) and because individual samples don't really have a "loudness".

Sound is vibration, so at a minimum it would take 2 samples to detect whether there was actually any sound (but that would only tell you about high frequencies).

Anyway… something like this could work:

from pydub import AudioSegment

def detect_leading_silence(sound, silence_threshold=-50.0, chunk_size=10):
    '''
    sound is a pydub.AudioSegment
    silence_threshold in dB
    chunk_size in ms

    iterate over chunks until you find the first one with sound
    '''
    trim_ms = 0 # ms

    assert chunk_size > 0 # to avoid infinite loop
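    # check the bounds first so we never measure an empty slice past the end
    while trim_ms < len(sound) and sound[trim_ms:trim_ms+chunk_size].dBFS < silence_threshold:
        trim_ms += chunk_size

    # never report more silence than the clip contains
    return min(trim_ms, len(sound))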

sound = AudioSegment.from_file("/path/to/file.wav", format="wav")

start_trim = detect_leading_silence(sound)
end_trim = detect_leading_silence(sound.reverse())

duration = len(sound)    
trimmed_sound = sound[start_trim:duration-end_trim]
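
To illustrate why loudness is measured over a chunk rather than a single sample, here is a rough standard-library sketch that computes an RMS level in dBFS. The helper chunk_dbfs, the 8000 Hz sample rate, and the 16-bit full-scale value of 32768 are assumptions for this example; it only approximates the kind of measurement a dBFS value represents and is not pydub's own code.

```python
import math

def chunk_dbfs(samples, max_amplitude=32768):
    """RMS level of a chunk of 16-bit samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / max_amplitude)

sample_rate = 8000          # assumed sample rate in Hz
chunk = sample_rate // 100  # number of samples in 10 ms

# 10 ms of a 440 Hz tone at half of full scale, and 10 ms of digital silence
tone = [int(16384 * math.sin(2 * math.pi * 440 * n / sample_rate)) for n in range(chunk)]
silence = [0] * chunk

print(chunk_dbfs(silence))   # digital silence has no finite level
print(chunk_dbfs(tone))      # well above a -50 dBFS threshold
print(chunk_dbfs(tone[:1]))  # a single sample at a zero crossing looks silent
```

The last line is the point: the first sample of the tone happens to be 0, so judged on its own it is indistinguishable from silence even though the tone is loud.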

pydub has probably been updated since this question was first asked, but here is the code I used to trim trailing and leading silence:

from pydub import AudioSegment
from pydub.silence import detect_leading_silence

def trim_leading_silence(x: AudioSegment) -> AudioSegment:
    return x[detect_leading_silence(x):]

def trim_trailing_silence(x: AudioSegment) -> AudioSegment:
    return trim_leading_silence(x.reverse()).reverse()

def strip_silence(x: AudioSegment) -> AudioSegment:
    return trim_trailing_silence(trim_leading_silence(x))

sound = AudioSegment.from_file(file_path_here)
stripped = strip_silence(sound)

detect_leading_silence from pydub.silence returns the length of the leading silence in milliseconds, which you can use to slice the loaded AudioSegment. To trim trailing silence, reverse the AudioSegment, trim its leading silence, and reverse it again. Stripping silence from both ends is then just trimming the leading and the trailing silence in turn.

Note that strip_silence should raise an IndexError if the loaded AudioSegment is silent or becomes silent after a trim operation.

The last time I looked, the default chunk size was 10 ms and the default silence threshold was -50 dBFS.

My version of pydub is 0.25.1 and my version of ffmpeg is 4.3.1.
