Receiving RTP stream - AudioStream, AudioGroup

Apologies if the following is dumb:

The ffmpeg command line appears to generate a test sound and send it as a PCM data stream over RTP.

RTP by itself does not guarantee reliable delivery of streamed data. It merely gives the receiver enough information (a sequence number in every packet header) to tell whether it has received all the data, and exactly which packets were lost in transit. On top of that, it is normally carried over UDP, which makes no delivery guarantees either.
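To make the sequence-number point concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header from RFC 3550 and lists the sequence numbers skipped between two received packets. It is an illustration, not what AudioStream does internally, and it assumes packets normally arrive in order: a step backwards is treated as a late or duplicate packet rather than as a loss.

```python
import struct


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte RTP fixed header")
    # V/P/X/CC byte, M/PT byte, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def missing_sequence_numbers(prev_seq: int, seq: int) -> list:
    """Sequence numbers skipped between two consecutively received packets.

    Sequence numbers are 16-bit and wrap from 65535 back to 0. A backwards
    step (late, reordered or duplicate packet) is reported as no loss here.
    """
    gap = (seq - prev_seq) & 0xFFFF
    if gap == 0 or gap >= 0x8000:
        return []
    return [(prev_seq + i) & 0xFFFF for i in range(1, gap)]
```

For example, receiving sequence number 1 right after 65534 means packets 65535 and 0 were lost.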

Hence with RTP the onus is on the sender to encode the data in such a way (e.g. with error-correction coding or redundancy in the data) that the receiver can reconstruct enough of the original to meet the application's needs. So for an audio stream you'd need an encoding format that suits.
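One simple form of redundancy, used by generic RTP forward error correction schemes such as RFC 5109, is to send an XOR parity payload for each group of packets. The sketch below assumes every payload in a group has the same length (real schemes also protect lengths and header fields) and shows how a single lost payload can be rebuilt from the surviving ones plus the parity.

```python
def xor_parity(payloads: list) -> bytes:
    """XOR equal-length payloads together into one parity payload."""
    if not payloads:
        raise ValueError("need at least one payload")
    parity = bytearray(len(payloads[0]))
    for payload in payloads:
        if len(payload) != len(parity):
            raise ValueError("this sketch assumes equal-length payloads")
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover_single_loss(received: list, parity: bytes) -> list:
    """Rebuild at most one missing payload (marked None) in a group."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    if not missing:
        return list(received)
    # XOR of all surviving payloads and the parity equals the lost payload
    present = [p for p in received if p is not None]
    repaired = list(received)
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired
```

The cost is one extra packet per group; losing two packets from the same group still leaves a gap.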

pcm_u8 is FFmpeg's name for unsigned 8-bit PCM, i.e. a straightforward pulse-code-modulated stream with one byte per sample. It has no error-correction coding or data redundancy built in. Losing a packet means losing every sample it carried, and there's nothing the receiving end can do to fill them in.
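Because pcm_u8 stores each sample in a single byte, the audio lost with one packet is easy to estimate: sample rate × channels × packet duration. The 8 kHz mono figure below is an assumption (the actual ffmpeg command isn't shown here); the 60 ms interval is the one reported in the self-answer.

```python
def pcm_u8_packet_bytes(sample_rate_hz: int, channels: int, packet_ms: int) -> int:
    """Bytes carried by one pcm_u8 packet (one byte per sample per channel)."""
    return sample_rate_hz * channels * packet_ms // 1000
```

With those assumptions, `pcm_u8_packet_bytes(8000, 1, 60)` is 480, so every dropped packet removes 480 samples of audio.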

So I think what's happening is that something in your network is dropping UDP packets, RTP is telling the AudioStream which data is missing, and the result is gaps, because the pcm_u8 stream has no error correction or redundancy that would let the AudioStream reconstruct the lost data.
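A receiver with no redundancy to draw on can only substitute something for a lost packet. The hypothetical sketch below, which assumes sequence numbers have already been unwrapped and that every packet carries the same number of samples, fills each hole with pcm_u8 silence; that silence is what is heard as a gap.

```python
U8_SILENCE = 0x80  # unsigned 8-bit PCM is centred on 128, so 0x80 is silence


def play_out_with_gaps(received: dict, first_seq: int, last_seq: int,
                       samples_per_packet: int) -> bytes:
    """Concatenate pcm_u8 payloads in sequence order, inserting silence for losses.

    `received` maps (already unwrapped) sequence numbers to payload bytes.
    """
    out = bytearray()
    for seq in range(first_seq, last_seq + 1):
        payload = received.get(seq)
        if payload is None:
            # Nothing in the stream to rebuild the samples from
            out += bytes([U8_SILENCE]) * samples_per_packet
        else:
            out += payload
    return bytes(out)
```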

I've seen things like VMware deliberately drop UDP packets on a virtual network to preserve performance, the justification being that UDP doesn't guarantee delivery anyway, so "it doesn't matter". That badly stung a colleague who was using RTP and expected guaranteed delivery but didn't get it. He had a closed network segment with a server at each end of the wire, one of them hosting a single VM.

So it might simply be a case of changing the codec you're using. I'm not able to recommend one, but for a start it's worth examining what broadcast digital media uses: DVB-T carries an MPEG Transport Stream (wrapping MPEG-2, AFAIK) and adds forward error correction on top of it.


Answering my own question: the problem was in Android's RTP packet management.

A comment in the AudioGroup source file says "... assume packet interval is 50ms or less."

However, my RTP packets are sent at 60ms intervals.

So the 50ms assumption does not hold, and this leads to the problem illustrated below:

Incoming: X X X X X X Y Y Y Y Y Y X X X X X X Y Y Y Y Y Y X X X X X X
Reading : X X X X X Y Y Y Y Y X X X X X Y Y Y Y Y X X X X X Y Y Y Y Y
          ^ ^ ^ ^ ^ - - - - - - - - - - - - - - - - - - - - ^ ^ ^ ^ ^ 
          ^                                                 ^
          |                                                 |
          |---- only these overlapping packets are valid ---|
          |---- the other packets are discarded due to -----|
          |---- invalid RTP headers. -----------------------|

X, Y: packets

As a result, only one packet in every 300ms interval is accepted, which produces jittery sound.
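The 300 ms figure is consistent with simple arithmetic: a sender on a 60 ms cycle and a reader on a 50 ms cycle only line up every lcm(60, 50) = 300 ms. The toy model below assumes both clocks start at time zero and ignores jitter; it models the timing mismatch, not AudioGroup's actual buffering logic.

```python
from math import gcd


def realignment_period_ms(send_interval_ms: int, read_interval_ms: int) -> int:
    """How often the send and read schedules coincide (their least common multiple)."""
    return send_interval_ms * read_interval_ms // gcd(send_interval_ms, read_interval_ms)


def coinciding_times(send_interval_ms: int, read_interval_ms: int, horizon_ms: int) -> list:
    """Instants (ms) before `horizon_ms` at which a send and a read happen together."""
    sends = set(range(0, horizon_ms, send_interval_ms))
    reads = set(range(0, horizon_ms, read_interval_ms))
    return sorted(sends & reads)
```

`realignment_period_ms(60, 50)` is 300, and `coinciding_times(60, 50, 900)` gives `[0, 300, 600]`: one aligned packet per 300 ms.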

I'll file a bug report for this. I hope it helps someone.

For those who really want to listen to a raw RTP stream, I suggest reading the packets manually, decoding them to 16-bit PCM (the only PCM encoding Android guarantees every device supports) and writing the result to an AudioTrack.
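A minimal sketch of the decoding half of that approach, assuming the sender uses pcm_u8 as in the question: strip the RTP header (fixed part, CSRC list, optional extension and padding), then map each unsigned 8-bit sample to signed 16-bit. The `receive_loop` function, its port and the `handle_pcm16` callback are hypothetical. On Android the equivalent code would be written in Java, where it is natural to build a `short[]` and pass it to the `AudioTrack.write(short[], int, int)` overload; this sketch emits little-endian bytes, which is an assumption you should check against your playback API.

```python
import socket
import struct


def rtp_payload(packet: bytes) -> bytes:
    """Strip the RTP header (fixed part, CSRC list, optional extension) and padding."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (packet[0] & 0x0F)  # fixed header + CSRC identifiers
    if packet[0] & 0x10:  # X bit: extension header (profile, length in 32-bit words)
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    end = len(packet)
    if packet[0] & 0x20:  # P bit: last byte holds the padding length
        end -= packet[-1]
    return packet[offset:end]


def u8_to_s16le(payload: bytes) -> bytes:
    """Convert unsigned 8-bit PCM (silence = 128) to signed 16-bit little-endian PCM."""
    samples = [(b - 128) << 8 for b in payload]
    return struct.pack("<%dh" % len(samples), *samples)


def receive_loop(port: int, handle_pcm16) -> None:
    """Hypothetical receiver: read RTP over UDP and hand 16-bit PCM to a callback."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        packet, _addr = sock.recvfrom(2048)
        handle_pcm16(u8_to_s16le(rtp_payload(packet)))
```

Note that this does nothing about lost packets; it only avoids AudioGroup's 50ms timing assumption.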