Synchronizing audio over a network

Hard problem, but possible.

Use NTP or tictoc to get a synchronised clock whose rate is known relative to your system's time source.

Also keep an estimator running for the rate of your sound clock. The usual way of doing this is to record with the same sound device that is playing, recording into a buffer preloaded with a magic number, and see how far the sound card gets in a time measured by the synchronised clock (or vice versa: see how long a known number of samples takes on the synchronised clock). You need to keep doing this, because the sound card's clock will drift relative to network time.
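A minimal sketch of such an estimator, assuming you can periodically read a pair of (synchronised clock time, total samples the card has processed). How you obtain the sample position depends on your audio API and is not shown. The class name `RateEstimator` and its methods are made up for illustration. It fits a least-squares line through the recent observations and reports the slope as samples per synchronised second.

```python
from collections import deque


class RateEstimator:
    """Estimate a sound card's real sample rate against a synchronised clock.

    Hypothetical helper: feed it (sync_time_s, samples_done) pairs and it
    returns the least-squares slope over a sliding window of observations.
    """

    def __init__(self, window=64):
        # Keep only recent observations so the estimate follows slow drift.
        self.obs = deque(maxlen=window)

    def add(self, sync_time_s, samples_done):
        self.obs.append((float(sync_time_s), float(samples_done)))

    def rate(self):
        """Return samples per synchronised second, or None if undetermined."""
        n = len(self.obs)
        if n < 2:
            return None
        mean_t = sum(t for t, _ in self.obs) / n
        mean_s = sum(s for _, s in self.obs) / n
        num = sum((t - mean_t) * (s - mean_s) for t, s in self.obs)
        den = sum((t - mean_t) ** 2 for t, _ in self.obs)
        if den == 0.0:
            return None
        return num / den
```

A sliding window is one possible design choice here: a longer window averages out measurement jitter, while a shorter one reacts faster when the card's clock drifts.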

So now you know how many samples per second, by your sound card's clock, you need to output to match the rate of the synchronised clock. You then interpolate the samples received from the network at that rate, plus or minus a correction if you need to catch up or fall back a bit from where you got to on the last buffer. You will need to be extremely careful to do this interpolation in a way that does not introduce audio artifacts; there is example code here for the algorithms you will need, but it's going to be quite a bit of reading before you get up to speed on that.
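To make the idea concrete, here is a rough sketch using plain linear interpolation. This is the simplest possible interpolator and is not artifact-free; it is swapped in here only to show the mechanics and is not the algorithm the linked example code uses. The function names, the `correction` parameter, and the convention of carrying a fractional `phase` between buffers are all assumptions made for this example.

```python
def playback_ratio(nominal_rate, measured_rate, correction=0.0):
    """Input samples to consume per output sample.

    nominal_rate:  sample rate of the network stream (in synchronised time)
    measured_rate: the card's estimated rate per synchronised second
    correction:    small fraction, > 0 to catch up, < 0 to fall back
    """
    return (nominal_rate / measured_rate) * (1.0 + correction)


def resample_linear(samples, ratio, phase=0.0):
    """Linearly interpolate `samples`, stepping `ratio` inputs per output.

    Returns (output, next_phase). To keep consecutive buffers continuous,
    prepend the last sample of this buffer to the next buffer and pass
    next_phase as its starting phase.
    """
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    if len(samples) < 2:
        return [], phase
    out = []
    pos = phase
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        # Weighted average of the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out, pos - last
```

For example, if the stream is nominally 44100 Hz and the card is measured at 44105 samples per synchronised second, `playback_ratio(44100, 44105)` is slightly below 1, so each buffer yields slightly more output samples than it consumed.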

If your source is a live recording, of course, you're going to have to measure the sample rate of that soundcard and interpolate into network time samples before sending it.


Ryan Barrett wrote up his findings on his blog.

His solution involved using NTP to keep all the clocks in sync:

Seriously, though, there's only one trick to p4sync, and that is how it uses NTP. One host acts as the p4sync server. The other p4sync clients synchronize their system clocks to the server's clock, using SNTP. When the server starts playing a song, it records the time, to the millisecond. The clients then retrieve that timestamp, calculate the difference between current time from that timestamp, and seek forward that far into the song.
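The client-side calculation described in that quote can be sketched as follows. This is an illustration only, not p4sync's actual code: the function name, the millisecond integer arguments, and the optional `track_length_ms` check are assumptions, and it presumes the client clock has already been synchronised to the server.

```python
def seek_offset_ms(server_start_ms, now_ms, track_length_ms=None):
    """Return how far into the song a client should seek, in milliseconds.

    server_start_ms: timestamp the server recorded when playback began
    now_ms:          current time on the (already synchronised) client clock
    track_length_ms: optional song length; None is returned once it is over
    """
    offset = now_ms - server_start_ms
    if offset < 0:
        # The client clock is slightly behind the timestamp: start at zero.
        return 0
    if track_length_ms is not None and offset >= track_length_ms:
        return None
    return offset
```

Note that this only aligns the starting position. It does nothing about the sound cards drifting apart afterwards, which is the problem the first answer addresses.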