Precise seek in MP3 files on Android

For those who might come across this issue in the future, I ended up simply converting mp3 to m4a. This was the simplest solution in my specific case.


MP3 files are not inherently seekable. They don't contain any timestamps; a file is just a series of MPEG frames, one after the other. That makes this tricky. There are two methods for seeking in an MP3, each with its own tradeoffs.

The most common (and fastest) method is to read the bitrate from the first frame header (or the average bitrate from the first few frame headers), say 128 kbps. Take the byte length of the entire file, multiply it by 8, and divide by this bitrate to estimate the duration in seconds. Then, when the user seeks, map the target time to a proportional byte offset: if they seek to 1:00 in a file estimated at 2:00, jump to the 50% mark of the file and "needle drop" into the stream. Read forward until the sync word of the next frame header comes by, and begin decoding there.
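To make the arithmetic concrete, here is a minimal sketch of the needle-drop estimate. It is written in Python for brevity (the same math carries over to Kotlin or Java on Android), and the function names and the `audio_start` parameter are my own, not from any library. It assumes a constant bitrate.

```python
def estimate_seek_offset(file_size, bitrate_bps, target_seconds, audio_start=0):
    """Estimate the byte offset for target_seconds, assuming constant bitrate.

    audio_start is a hypothetical parameter: the number of bytes to skip
    at the start of the file (for example, a leading tag).
    """
    audio_bytes = file_size - audio_start
    duration = audio_bytes * 8 / bitrate_bps  # estimated length in seconds
    fraction = min(max(target_seconds / duration, 0.0), 1.0)
    return audio_start + int(audio_bytes * fraction)


def find_next_sync(data, start):
    """Return the index of the next candidate frame header at or after start, or -1."""
    for i in range(start, len(data) - 1):
        # A frame header begins with 11 set bits: 0xFF, then the top 3 bits of the next byte.
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1
```

Note that the 11-bit sync pattern can also occur by chance inside audio data, so a real player would validate the rest of the header (and ideally check that another header follows at the expected distance) before trusting a match.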

As you can imagine, this method isn't accurate. At best, you'll land within half a frame of the target on average. With frames holding 1152 samples each (576 for the lower-sample-rate MPEG-2 and MPEG-2.5 variants), a frame is about 26 ms at 44.1 kHz, so that alone would be pretty accurate. However, there are problems with calculating the needle drop point in the first place. The most common issue is that ID3 tags and similar metadata add size to the file, throwing off the size calculations. A more severe issue is a variable bitrate (VBR) file. If you have music encoded with VBR, and the beginning of the track is silent-ish or otherwise easy to encode, the beginning might be 32 kbps whereas one second in might be 320 kbps. That's a 10x error in calculating the time length of the file!
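The tag problem, at least, can be reduced by measuring a leading ID3v2 tag and excluding it from the size math. A minimal sketch in Python follows; the function name is my own. It relies on the ID3v2 header layout: the bytes `ID3`, two version bytes, one flags byte, then a 4-byte "synchsafe" size with 7 useful bits per byte.

```python
def id3v2_tag_size(header):
    """Return the total size in bytes of a leading ID3v2 tag, or 0 if absent.

    header is the first 10 (or more) bytes of the file.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    flags = header[5]
    # The size field is a synchsafe integer: 4 bytes, top bit of each unused.
    size = 0
    for b in header[6:10]:
        size = (size << 7) | (b & 0x7F)
    footer = 10 if flags & 0x10 else 0  # optional 10-byte footer (ID3v2.4)
    return 10 + size + footer  # 10-byte header + tag body + footer
```

Subtract the result from the file size before estimating the duration, and add it back to any byte offset you compute. This does nothing for the VBR problem, which comes from the bitrate assumption itself.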

The second method is to decode the whole file to raw PCM samples. This means you can guarantee sample-accurate seeking, but you must decode at least up to the seek point. If you want a proper time length for the full track, you must decode the whole file. Some 20 years ago, this was painfully slow. Seeking into a track would take almost as long as listening to the track to the point you were seeking to! These days, for short files, you can probably decode them so fast that it doesn't matter so much.
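Once the audio is decoded, seeking is plain arithmetic on the PCM buffer. A minimal sketch in Python, assuming interleaved PCM with a fixed sample size (16-bit by default); the function names are my own, and the decoding step itself is left to whatever decoder you use.

```python
def pcm_byte_offset(target_seconds, sample_rate, channels, bytes_per_sample=2):
    """Byte offset into interleaved PCM for a target time, aligned to a whole sample frame."""
    frame_index = round(target_seconds * sample_rate)
    return frame_index * channels * bytes_per_sample


def pcm_duration_seconds(pcm_byte_length, sample_rate, channels, bytes_per_sample=2):
    """Exact duration of a fully decoded PCM buffer, in seconds."""
    return pcm_byte_length / (channels * bytes_per_sample) / sample_rate
```

Keep the memory cost in mind: 16-bit stereo at 44.1 kHz is 176,400 bytes per second, or roughly 10.6 MB per minute of audio.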

TL;DR: If you must have sample-accurate seeking, decode the files before putting them in your player, but understand the performance penalty before committing to this tradeoff.