Mixing 16-bit linear PCM streams and avoiding clipping/overflow

There's a discussion at https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Buried in one of the comments on that discussion is the suggestion to sum the samples and divide by the square root of the number of signals. An additional check for clipping is still worthwhile, because N full-scale samples sum to sqrt(N) times full scale after the division. This seems like a reasonable (simple and fast) middle ground.
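As a rough illustration of the sum-and-divide idea, here is a minimal sketch. The function name `mix_sqrt_n` and its signature are my own choices for the example, not something taken from the linked discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mix `count` simultaneous samples into one 16-bit sample
// by summing them and dividing by sqrt(count).
std::int16_t mix_sqrt_n(const std::int16_t* samples, std::size_t count) {
    assert(count > 0);
    std::int64_t sum = 0;  // wide accumulator so the sum itself cannot overflow
    for (std::size_t i = 0; i < count; ++i)
        sum += samples[i];
    const double scaled =
        static_cast<double>(sum) / std::sqrt(static_cast<double>(count));
    // Loud, correlated inputs can still exceed the 16-bit range, so clamp.
    const double clamped = std::max(-32768.0, std::min(32767.0, scaled));
    return static_cast<std::int16_t>(std::lround(clamped));
}
```

With two inputs of 1000 each, this returns 1414 rather than 2000, and two full-scale inputs are clamped to 32767.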


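Here's a descriptive implementation:

#include <cstdint>
#include <limits>

// Mix two 16-bit samples: add in 32 bits, then clip to the 16-bit range.
short int mix_sample(short int sample1, short int sample2) {
    // Widen before adding so the sum itself cannot overflow.
    const int32_t result(static_cast<int32_t>(sample1) + static_cast<int32_t>(sample2));
    typedef std::numeric_limits<short int> Range;
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}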

To mix, it's just add and clip!

To avoid clipping artifacts, you will want to use saturation or a limiter. Ideally, you will have a small int32_t buffer with a small amount of lookahead. The lookahead introduces latency, because output has to wait until the upcoming samples have been inspected.
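To make the lookahead idea concrete, here is a minimal offline sketch of a peak limiter working on a 32-bit mix buffer. The function name `limit_to_int16` and the `lookahead` parameter are assumptions for the example, and the simple O(N * lookahead) loops are written for readability rather than speed. A real-time version would delay its output by `lookahead` samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical offline limiter: maps a 32-bit mix bus into the 16-bit range.
std::vector<std::int16_t> limit_to_int16(const std::vector<std::int32_t>& bus,
                                         std::size_t lookahead) {
    const double ceiling = 32767.0;
    const std::size_t n = bus.size();

    // Gain each sample would need on its own to stay under the ceiling.
    std::vector<double> required(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double mag = std::abs(static_cast<double>(bus[i]));
        required[i] = mag > ceiling ? ceiling / mag : 1.0;
    }

    // Look ahead: smallest required gain within [i, i + lookahead].
    std::vector<double> ahead(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t end = std::min(n, i + lookahead + 1);
        ahead[i] = *std::min_element(required.begin() + i, required.begin() + end);
    }

    // Smooth by averaging `ahead` over [i - lookahead, i]. Every term in that
    // window already accounts for sample i, so the average cannot exceed
    // required[i] (up to floating-point rounding).
    std::vector<std::int16_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = i > lookahead ? i - lookahead : 0;
        double gain = 0.0;
        for (std::size_t k = begin; k <= i; ++k)
            gain += ahead[k];
        gain /= static_cast<double>(i - begin + 1);
        assert(gain <= required[i] + 1e-12);
        // Final clamp guards against rounding at the ceiling.
        const double limited = std::max(-ceiling, std::min(ceiling, bus[i] * gain));
        out[i] = static_cast<std::int16_t>(std::lround(limited));
    }
    return out;
}
```

Signals that never exceed the 16-bit range pass through unchanged. When a peak is coming, the gain starts to fall up to `lookahead` samples before it arrives, which avoids the hard edge that plain clipping would produce.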

More common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal.
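One way to picture headroom is to attenuate each input by a fixed number of bits before summing. The sketch below does this for two streams. The name `mix_with_headroom` and the `headroom_bits` parameter are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reserve `headroom_bits` bits of headroom by attenuating
// each input before summing. With 1 bit, two full-scale streams still fit in
// 16 bits; in general k bits leave room for 2^k streams.
std::int16_t mix_with_headroom(std::int16_t a, std::int16_t b, unsigned headroom_bits) {
    assert(headroom_bits >= 1 && headroom_bits < 15);
    const std::int32_t divisor = std::int32_t{1} << headroom_bits;
    // Division (not >>) so negative samples truncate toward zero predictably.
    const std::int32_t sum = a / divisor + b / divisor;
    return static_cast<std::int16_t>(sum);
}
```

The trade-off is that each stream loses `headroom_bits` bits of resolution and plays back quieter, even when the other stream is silent.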


The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; adapting it to 16-bit signed PCM produces this:

int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here

// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;

// Pick the equation
if ((a < 32768) && (b < 32768)) {
    // Viktor's first equation when both sources are "quiet"
    // (i.e. less than middle of the dynamic range)
    m = a * b / 32768;
} else {
    // Viktor's second equation when one or both sources are loud.
    // a * b can exceed 32 bits here, so do the arithmetic in 64 bits.
    m = (int)(2LL * (a + b) - ((long long)a * b) / 32768 - 65536);
}

// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;

Using this algorithm means there is almost no need to clip the output, as it can exceed the valid range by at most one value (65536 instead of 65535). Unlike straight averaging, the volume of one source is not reduced when the other source is silent.
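To check the claim about a silent source, the snippet can be wrapped in a function. This is a sketch only: the name `mix_toth` is my own. It assumes the first equation applies only when both inputs are below the midpoint, as the code comment describes, and it uses 64-bit arithmetic so that a * b cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the mixing equations for 16-bit signed samples.
std::int16_t mix_toth(std::int16_t s1, std::int16_t s2) {
    // Shift to unsigned range (0..65535); 64-bit so a * b cannot overflow.
    const std::int64_t a = std::int64_t{s1} + 32768;
    const std::int64_t b = std::int64_t{s2} + 32768;
    std::int64_t m;
    if (a < 32768 && b < 32768) {
        m = a * b / 32768;                          // both sources "quiet"
    } else {
        m = 2 * (a + b) - a * b / 32768 - 65536;    // one or both sources loud
    }
    if (m == 65536) m = 65535;                      // the single out-of-range value
    assert(m >= 0 && m <= 65535);
    return static_cast<std::int16_t>(m - 32768);    // back to signed
}
```

When one input is 0 (silence in signed PCM, 32768 after the shift), the second equation reduces to m = b, so the other input comes back unchanged.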


Here is what I did on my recent synthesizer project.

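#include <limits.h>
#include <stdlib.h>

// Sum both streams into a wider int buffer so nothing overflows.
int *unfiltered = (int *)malloc(lengthOfLongPcmInShorts * sizeof(int));
int i;
for(i = 0; i < lengthOfShortPcmInShorts; i++){
    unfiltered[i] = shortPcm[i] + longPcm[i];
}
// Past the end of the shorter stream, copy the longer one as-is.
for(; i < lengthOfLongPcmInShorts; i++){
    unfiltered[i] = longPcm[i];
}

// Find the largest absolute value in the mixed data.
int max = 0;
for(i = 0; i < lengthOfLongPcmInShorts; i++){
    int val = abs(unfiltered[i]);
    if(val > max)
        max = val;
}
if(max == 0)
    max = 1; // all-silent input: avoid dividing by zero

// Rescale so the peak lands on SHRT_MAX.
short int *newPcm = (short int *)malloc(lengthOfLongPcmInShorts * sizeof(short int));
for(i = 0; i < lengthOfLongPcmInShorts; i++){
    // Multiply before dividing (in 64 bits); dividing first would truncate to 0 or +/-1.
    newPcm[i] = (short int)((long long)unfiltered[i] * SHRT_MAX / max);
}
free(unfiltered);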

I added all the PCM data into an integer array, so that the sums are kept unfiltered (nothing is clipped or scaled yet).

After doing that, I looked for the maximum absolute value in the integer array.

Finally, I copied the integer array into a short int array, scaling each element by the ratio of the max short int value to that max value.

This way you get the minimum amount of 'headroom' needed to fit the data.

You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed, the minimum amount of headroom was good enough.
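One possible reading of "some statistics plus some clipping" is to scale to a high quantile of the absolute values instead of the single largest peak, and clip whatever still sticks out. The sketch below is only an illustration of that idea, not what the synthesizer project did. The name `normalize_with_clipping` and the `fraction` parameter are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical variant: scale so the `fraction` quantile of |sample| maps to
// full scale (fraction = 1.0 is plain peak normalization), then clip the rest.
std::vector<std::int16_t> normalize_with_clipping(const std::vector<std::int32_t>& mixed,
                                                  double fraction) {
    assert(fraction > 0.0 && fraction <= 1.0);
    std::vector<std::int16_t> out(mixed.size());
    if (mixed.empty())
        return out;

    // Collect magnitudes and pick the requested quantile.
    std::vector<double> mags(mixed.size());
    for (std::size_t i = 0; i < mixed.size(); ++i)
        mags[i] = std::abs(static_cast<double>(mixed[i]));
    const std::size_t idx =
        static_cast<std::size_t>(fraction * static_cast<double>(mags.size() - 1));
    std::nth_element(mags.begin(), mags.begin() + idx, mags.end());

    // Reference level; at least 1 so silent input does not divide by zero.
    const double ref = std::max(1.0, mags[idx]);
    const double scale = 32767.0 / ref;

    for (std::size_t i = 0; i < mixed.size(); ++i) {
        const double v = std::max(-32767.0, std::min(32767.0, mixed[i] * scale));
        out[i] = static_cast<std::int16_t>(std::lround(v));
    }
    return out;
}
```

With `fraction` below 1.0, a few isolated peaks get clipped in exchange for a louder result overall. Whether that trade is acceptable depends on the material.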