How to convert the sample format from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16?

I found two resampling functions in FFmpeg. Maybe their performance is better:

  1. avresample_convert() http://libav.org/doxygen/master/group__lavr.html
  2. swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c

Thanks Reuben for a solution to this. I did find that some of the sample values were slightly off when compared with a straight ffmpeg -i file.wav conversion. It seems that FFmpeg uses round() on the value when it does the conversion.
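
In other words, the small differences come down to whether the float-to-int16 step truncates or rounds. A quick illustration (the sample value here is made up purely to show the half-step difference):

    #include <math.h>

    float sample = 0.49997f;                                 /* hypothetical value */
    int16_t truncated = (int16_t)(sample * 32767.0f);        /* -> 16382 */
    int16_t rounded   = (int16_t)round(sample * 32767.0f);   /* -> 16383 */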

To do the conversion, I did what you did with a bit of modification so it works for any number of channels:

if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
    int nb_samples = decoded_frame->nb_samples;
    int channels = decoded_frame->channels;
    int outputBufferLen = nb_samples * channels * 2;   // bytes needed for interleaved S16
    short* outputBuffer = new short[outputBufferLen / 2];

    for (int i = 0; i < nb_samples; i++)
    {
         for (int c = 0; c < channels; c++)
         {
             float* extended_data = (float*)decoded_frame->extended_data[c];
             float sample = extended_data[i];
             if (sample < -1.0f) sample = -1.0f;
             else if (sample > 1.0f) sample = 1.0f;
             outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
         }
    }

    // Do what you want with the data etc.

}

I went from FFmpeg 0.11.1 to 1.1.3 and found the change of sample format annoying. I looked at setting request_sample_fmt to AV_SAMPLE_FMT_S16, but it seems the AAC decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.
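
For reference, this is roughly what I mean by setting request_sample_fmt (a sketch, not taken from my working code; the fallback check at the end is hypothetical). Decoders that can output several formats will honour it, but the AAC decoder only produces AV_SAMPLE_FMT_FLTP:

    // Ask the decoder for interleaved 16-bit output before opening it.
    audioCodecContext->request_sample_fmt = AV_SAMPLE_FMT_S16;
    if (avcodec_open2(audioCodecContext, codec, NULL) < 0) {
        // handle error
    }
    // The AAC decoder ignores the request, so check what you actually got:
    if (audioCodecContext->sample_fmt != AV_SAMPLE_FMT_S16) {
        // fall back to converting AV_SAMPLE_FMT_FLTP yourself (see below)
    }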


EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!

At some point in the last 2-3 years FFmpeg's AAC decoder's output format changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample value is a 32-bit floating point value scaled from -1.0 to +1.0.

With AV_SAMPLE_FMT_S16, by contrast, the data is in a single buffer, with the samples interleaved, and each sample is a signed 16-bit integer from -32768 to +32767.
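
For concreteness, here's roughly how the two layouts are addressed (a sketch, assuming a decoded AVFrame called frame and a sample index i):

    // Planar float (AV_SAMPLE_FMT_FLTP): one buffer per channel
    float *left  = (float *)frame->extended_data[0];
    float *right = (float *)frame->extended_data[1];   // only present if stereo
    float sampleL = left[i];                           // i-th sample of channel 0

    // Interleaved S16 (AV_SAMPLE_FMT_S16): one buffer, channels alternate
    int16_t *pcm = (int16_t *)frame->extended_data[0];
    int16_t sL = pcm[i * frame->channels + 0];
    int16_t sR = pcm[i * frame->channels + 1];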

And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:

1. Use libswresample (recommended)

#include "libswresample/swresample.h"

...

SwrContext *swr;

...

// Set up SWR context once you've got codec information
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout",  audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout,  0);
av_opt_set_int(swr, "in_sample_rate",     audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate",    audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16,  0);
swr_init(swr);

...

// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...;
swr_convert(swr, (uint8_t **)&outputBuffer, audioFrame->nb_samples,
            (const uint8_t **)audioFrame->extended_data, audioFrame->nb_samples);

And that's all you have to do!
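
One thing the snippet above glosses over is allocating the output buffer and cleaning up. A minimal sketch of how that could look (names like outBuf are just for illustration):

    // Allocate an interleaved S16 buffer big enough for this frame
    uint8_t *outBuf = NULL;
    int outLinesize = 0;
    av_samples_alloc(&outBuf, &outLinesize, audioCodec->channels,
                     audioFrame->nb_samples, AV_SAMPLE_FMT_S16, 0);

    int converted = swr_convert(swr, &outBuf, audioFrame->nb_samples,
                                (const uint8_t **)audioFrame->extended_data,
                                audioFrame->nb_samples);
    // 'converted' is the number of samples per channel actually written

    // ... use outBuf ...

    av_freep(&outBuf);

    // And when you're finished with the whole stream:
    swr_free(&swr);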

2. Do it by hand in C (original answer, not recommended)

So in your decode loop, when you've got an audio packet you decode it like this:

AVCodecContext *audioCodec;   // init'd elsewhere
AVFrame *audioFrame;          // init'd elsewhere
AVPacket packet;              // init'd elsewhere
int16_t* outputBuffer;        // init'd elsewhere
int got_frame = 0;            // set by the decoder when a full frame is ready
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &got_frame, &packet);

And then, if you've got a full frame of audio, you can convert it fairly easily:

    // Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
    int in_samples = audioFrame->nb_samples;
    int in_linesize = audioFrame->linesize[0];
    int i=0;
    float* inputChannel0 = (float*)audioFrame->extended_data[0];
    // Mono
    if (audioFrame->channels==1) {
        for (i=0 ; i<in_samples ; i++) {
            float sample = *inputChannel0++;
            if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
            outputBuffer[i] = (int16_t) (sample * 32767.0f);
        }
    }
    // Stereo
    else {
        float* inputChannel1 = (float*)audioFrame->extended_data[1];
        for (i=0 ; i<in_samples ; i++) {
             outputBuffer[i*2] = (int16_t) ((*inputChannel0++) * 32767.0f);
             outputBuffer[i*2+1] = (int16_t) ((*inputChannel1++) * 32767.0f);
        }
    }
    // outputBuffer now contains 16-bit PCM!

I've left a couple of things out for clarity... the clamping in the mono path should ideally be duplicated in the stereo path. And the code can be easily optimized.
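
For completeness, the clamp-and-convert step can be pulled out into a small helper so both paths behave the same way (a sketch; float_to_s16 is just an illustrative name):

    static inline int16_t float_to_s16(float sample)
    {
        if (sample < -1.0f) sample = -1.0f;
        else if (sample > 1.0f) sample = 1.0f;
        return (int16_t)(sample * 32767.0f);
    }

    // Stereo path, now with clamping:
    outputBuffer[i*2]   = float_to_s16(*inputChannel0++);
    outputBuffer[i*2+1] = float_to_s16(*inputChannel1++);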