mp4 atom - How to discriminate the audio codec? Is it AAC or MP3?

In the 'esds' atom there are a few fields relevant to determining the codec. The first byte of content of the esds atom is the objectTypeIndication (that's the 11th byte from your solution). This field is supposed to indicate the codec used, but a few of its values are shared by multiple codecs. MP4RA has a full list of codec values. Here are a few that are relevant in this case:

  • 0x40 - MPEG-4 Audio
  • 0x6B - MPEG-1 Audio (MPEG-1 Layers 1, 2, and 3)
  • 0x69 - MPEG-2 Backward Compatible Audio (MPEG-2 Layers 1, 2, and 3)
  • 0x67 - MPEG-2 AAC LC

0x6B and 0x69 denote MPEG-1 and MPEG-2 audio respectively, each covering Layers 1, 2, and 3. 0x67 denotes MPEG-2 AAC LC but is generally unused in favor of 0x40 (0x66 and 0x68 are also MPEG-2 AAC profiles and are seen even less frequently). 0x40 denotes MPEG-4 Audio. MPEG-4 Audio is generally thought of as AAC, but there is a whole framework of audio codecs that can go in MPEG-4 Audio, including AAC, BSAC, ALS, CELP, and something called MP3On4. MP3On4 is an MP3 variant with some new header information for multichannel.
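As a rough sketch of that first-pass check, the objectTypeIndication values listed above can be put in a small lookup. The function name `classify_object_type` and the group labels it returns are made up for illustration; only the numeric values come from the list.

```python
# Values come from the list above; the names and labels are this sketch's own.
MPEG_LAYER_AUDIO = {
    0x6B: "MPEG-1 Audio (Layer 1, 2, or 3)",
    0x69: "MPEG-2 Audio (Layer 1, 2, or 3)",
}
MPEG2_AAC = {0x66, 0x67, 0x68}  # MPEG-2 AAC profiles; 0x67 is AAC LC
MPEG4_AUDIO = 0x40


def classify_object_type(oti: int) -> str:
    """Return a coarse codec family for an objectTypeIndication byte."""
    if oti in MPEG_LAYER_AUDIO:
        # Could still be Layer 1 or 2 rather than Layer 3 (MP3).
        return "mp3-family"
    if oti in MPEG2_AAC:
        return "aac"
    if oti == MPEG4_AUDIO:
        # Usually AAC, but the AudioSpecificConfig has the final word.
        return "mpeg4-audio"
    return "unknown"


print(classify_object_type(0x6B))
print(classify_object_type(0x40))
```

A result of "mpeg4-audio" means the AudioObjectType still has to be checked to be sure what the stream really is.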

We can figure out what audio format is actually in the MPEG-4 Audio by looking at the AudioSpecificConfig. This is the global header for the decoder that exists at byte 13 of the content of the 'esds' atom. At the beginning of the AudioSpecificConfig there is a 5-bit AudioObjectType. A full list can be found on the multimedia wiki (that was linked in your post) under the 'MPEG-4 Audio' article: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio but here are the useful values:

  • 00 - NULL
  • 01 - AAC Main (a deprecated AAC profile from MPEG-2)
  • 02 - AAC LC or backwards compatible HE-AAC (most real-world AAC falls in one of these cases)
  • 03 - AAC Scalable Sample Rate (rarely used)
  • 04 - AAC LTP (a replacement for AAC Main, rarely used)
  • 05 - HE-AAC explicitly signaled (Non-backward compatible)
  • 22 - ER BSAC (A Korean broadcast codec)
  • 23 - Low Delay AAC
  • 29 - HE-AACv2 explicitly signaled (In one draft this was MP3On4 instead)
  • 31 - ESCAPE (read 6 more bits, add 32)
  • 32 - MP3on4 Layer 1
  • 33 - MP3on4 Layer 2
  • 34 - MP3on4 Layer 3
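Reading the AudioObjectType, including the ESCAPE case, might look roughly like this. It is a minimal sketch that assumes you already have the raw AudioSpecificConfig bytes in hand; `audio_object_type` is a hypothetical helper name, not part of any library.

```python
def audio_object_type(asc: bytes) -> int:
    """Extract the AudioObjectType from the start of an AudioSpecificConfig."""
    if len(asc) < 2:
        raise ValueError("need at least 2 bytes of AudioSpecificConfig")
    bits = int.from_bytes(asc[:2], "big")  # first 16 bits, MSB first
    aot = bits >> 11                       # top 5 bits
    if aot == 31:
        # ESCAPE: the next 6 bits hold (AudioObjectType - 32).
        aot = 32 + ((bits >> 5) & 0x3F)
    return aot


def is_mp3on4(aot: int) -> bool:
    """True for the MP3on4 Layer 1/2/3 object types listed above."""
    return aot in (32, 33, 34)


print(audio_object_type(bytes.fromhex("1208")))  # 0x1208 is a common AAC LC config
```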

If you aren't worried about the 'MP3On4' MP3 variant or the other unusual MPEG-4 Audio codecs, then just use the objectTypeIndication.

In the MPEG specifications these details are spread across 14496-1, -12, -14, and -3. Of these only 14496-12 is freely available: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html


The format of the esds atom [1] is defined as:

Size 32-bit
Type 32-bit 'esds'
Version: 8-bit, zero.
Flags: 24-bit field, zero.
Elementary Stream Descriptor

The Elementary Stream Descriptor is defined in the relevant MPEG4 documents [2].

Looking at a typical ESDS from an MP4A file:

00000033 65736473 00000000 03808080  
22000100 04808080 14401500 00000001
FC170001 FC170580 80800212 08068080
800102

Interpret as:

00000033 65736473 = ISO Atom "esds" of length 0x33
00000000 = Version/Flags field (0), meaning tagged Elementary Stream Descriptor follows
03808080 = TAG(3) = ES_Descriptor ([2]); the 0x80 bytes are length continuation bytes (7 bits of length per byte)
22       = length of this descriptor (covers everything through TAG(6))
  0001   = ES_ID = 1
      00 = flags etc = 0
04808080 = TAG(4) = DecoderConfigDescriptor ([2]) embedded in the TAG(3) descriptor
14       = length of this descriptor (includes TAG(5))
  40     = objectTypeIndication = MPEG-4 Audio
    15   = stream type(6bits)=5 audio, flags(2bits)=1
000000   = 24bit buffer size
0001FC17 = max bitrate (130,071 bps)
0001FC17 = avg bitrate
05808080 = TAG(5) = DecoderSpecificInfo carrying the ASC ([2],[3]), embedded in the TAG(4) descriptor
02       = length
1208     = ASC (AOT=2 AAC-LC, freq=4 => 44100 Hz, chan=1 => single channel, flen0 => 1024 samples)
06808080 = TAG(6) = SLConfigDescriptor ([2])
01       = length
02       = data
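The walk-through above can be reproduced with a short script. This is only a sketch: it assumes the ES descriptor has no optional fields (flags byte of 0, as in this sample), and `read_descriptor` and `parse_esds` are names invented here rather than part of any library.

```python
# The sample atom from above (bytes.fromhex ignores the spaces).
ESDS = bytes.fromhex(
    "00000033 65736473 00000000 03808080"
    "22000100 04808080 14401500 00000001"
    "FC170001 FC170580 80800212 08068080"
    "800102"
)


def read_descriptor(buf: bytes, pos: int):
    """Read a tag and its length; return (tag, payload_start, length)."""
    tag = buf[pos]
    pos += 1
    length = 0
    for _ in range(4):  # length uses up to 4 bytes, 7 bits each
        byte = buf[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last length byte
            break
    return tag, pos, length


def parse_esds(atom: bytes) -> dict:
    if atom[4:8] != b"esds":
        raise ValueError("not an esds atom")
    pos = 12  # skip size (4), type (4), version/flags (4)
    tag, pos, _ = read_descriptor(atom, pos)
    if tag != 0x03:
        raise ValueError("expected TAG(3)")
    if atom[pos + 2] & 0xE0:
        raise ValueError("optional ES descriptor fields not handled here")
    pos += 3  # ES_ID (2 bytes) + flags (1 byte)
    tag, pos, _ = read_descriptor(atom, pos)
    if tag != 0x04:
        raise ValueError("expected TAG(4)")
    oti = atom[pos]
    stream_type = atom[pos + 1] >> 2
    max_bitrate = int.from_bytes(atom[pos + 5:pos + 9], "big")
    avg_bitrate = int.from_bytes(atom[pos + 9:pos + 13], "big")
    pos += 13  # type, stream type, buffer size (3), two bitrates (4 each)
    tag, pos, length = read_descriptor(atom, pos)
    if tag != 0x05:
        raise ValueError("expected TAG(5)")
    bits = int.from_bytes(atom[pos:pos + 2], "big")  # start of the ASC
    return {
        "object_type_indication": oti,
        "stream_type": stream_type,
        "max_bitrate": max_bitrate,
        "avg_bitrate": avg_bitrate,
        "asc": atom[pos:pos + length],
        "audio_object_type": bits >> 11,         # ESCAPE (31) not handled
        "sample_rate_index": (bits >> 7) & 0xF,
        "channel_config": (bits >> 3) & 0xF,
    }


info = parse_esds(ESDS)
print(info)
```

For the sample this should give objectTypeIndication 0x40 and AudioObjectType 2, matching the manual decoding.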

Refs:

  • [1] https://wikileaks.org/sony/docs/05/docs/Apple/qtff.pdf
  • [2] Tags defined in MPEG4-part1 Systems, I believe.
  • [3] ASC is AudioSpecificConfig, see https://wiki.multimedia.cx/index.php/MPEG-4_Audio
