Why Is Conversion From An MP3 To WAV (PCM) Lossless?

PCM (pulse-code modulation) is one of the most widely used ways of representing digital audio; most computers, and many other devices, use it by default for audio input and output. All audio you hear on such a device, whether it comes from a file or a microphone, ends up as PCM, and that PCM is what gets converted back into an approximation of the original analog sound.
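To make "PCM" a bit more concrete: a PCM stream is just a sequence of amplitude measurements (samples) taken at a fixed rate, each rounded to an integer. Here's a minimal Python sketch of that, assuming CD-style parameters (44.1 kHz, signed 16-bit) and a synthetic sine wave in place of a real microphone; `sample_to_pcm16` and `tone` are hypothetical names for illustration, not part of any audio API.

```python
import math

def sample_to_pcm16(signal, sample_rate, n_samples):
    """Sample `signal` (a function of time in seconds returning an amplitude
    in [-1.0, 1.0]) and quantize each sample to a signed 16-bit integer."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                     # time of the n-th sample
        value = max(-1.0, min(1.0, signal(t)))  # clip to the representable range
        samples.append(round(value * 32767))    # quantize: rounding loses a little precision
    return samples

# A 440 Hz sine wave standing in for the analog sound a microphone would pick up.
def tone(t):
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = sample_to_pcm16(tone, sample_rate=44100, n_samples=441)  # 10 ms of audio
print(len(pcm), pcm[:5])
```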

When you make a digital audio recording with a microphone on a system that uses PCM for audio input and output, the resulting audio stream is PCM. When you save that recording losslessly, as WAV, AIFF, FLAC, ALAC, WMA Lossless, etc., nothing is discarded: for WAV and AIFF the PCM stream is simply placed into the container file, and the compressed formats can reproduce it exactly. Either way, it is still the PCM stream your system produced when it digitized the analog, real-world sound.
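To illustrate that saving as WAV doesn't change the audio data, here's a sketch using Python's standard-library `wave` module. It assumes mono 16-bit samples at 44.1 kHz (the sample values are arbitrary) and writes to an in-memory buffer rather than a real file: the module writes a header and then the PCM bytes as-is, and reading the frames back gives exactly the bytes that went in.

```python
import io
import struct
import wave

# Some 16-bit PCM samples (e.g. from a recording), packed as little-endian bytes.
samples = [0, 1000, -1000, 32767, -32768, 42]
pcm_bytes = struct.pack("<%dh" % len(samples), *samples)

# "Save as WAV": write a header, then copy the PCM bytes in unchanged.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 2 bytes = 16-bit samples
    w.setframerate(44100)  # 44.1 kHz
    w.writeframes(pcm_bytes)

wav_file = buf.getvalue()

# Read it back: the audio data is byte-for-byte the PCM we put in.
with wave.open(io.BytesIO(wav_file), "rb") as r:
    restored = r.readframes(r.getnframes())

print(restored == pcm_bytes)           # True
print(len(wav_file) - len(pcm_bytes))  # size of the WAV header
```

The only size difference is the header Python's `wave` module writes (44 bytes for plain PCM).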

Lossless formats can be divided into two types: uncompressed and compressed. Uncompressed formats like WAV and AIFF simply store the PCM audio stream. Compressed formats like FLAC, ALAC, and WMA Lossless run the stream through a compression algorithm to save space, much like a zip archive. Nothing is thrown away; the data is just stored more efficiently, and decoding it gives back exactly the original PCM samples.
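The zip comparison can be checked directly. This sketch uses Python's `zlib` (DEFLATE) as a stand-in for a lossless audio codec; that's an assumption for illustration only, since FLAC, ALAC and WMA Lossless use their own algorithms tuned for audio. The test signal is a synthetic tone, not a real recording.

```python
import math
import struct
import zlib

# One second of a quiet 440 Hz tone as 16-bit PCM (a stand-in for real audio).
samples = [round(3000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)]
pcm_bytes = struct.pack("<%dh" % len(samples), *samples)

# Compress, as a lossless format would (zlib here, NOT the actual FLAC algorithm).
compressed = zlib.compress(pcm_bytes, 9)

# Decompress: every byte comes back exactly.
restored = zlib.decompress(compressed)

print(len(pcm_bytes), len(compressed))
print(restored == pcm_bytes)  # True
```

A pure tone repeats exactly, so it compresses far better than real music would; the point is only that decompression returns every byte unchanged.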

If you then take that lossless export and convert it to a 128 kbps MP3, encoding is involved. Encoding, in the narrow sense, just means organizing the audio stream data in a new way, and on its own that is a lossless process. However, no codec that I know of does only that, because it would be pointless: the file would sound the same, playback would require decoding (and therefore more system resources), and the file would be the same size as the original. So codecs like MP3, WMA (lossy), AAC, and Vorbis (usually in an Ogg container) also perform additional operations. The key one is discarding data judged to be less important, and discarding data is what makes the file smaller.
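To put numbers on "smaller": uncompressed PCM needs sample rate × bit depth × channels bits per second. A rough sketch, assuming CD-quality stereo (44.1 kHz, 16-bit, 2 channels), a 3-minute track, and ignoring container headers and metadata; the helper names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate, bits_per_sample, channels):
    """Bits per second of an uncompressed PCM stream."""
    return sample_rate * bits_per_sample * channels

def size_bytes(bitrate_bps, seconds):
    """Approximate stream size in bytes, ignoring headers and metadata."""
    return bitrate_bps * seconds // 8

cd_pcm = pcm_bitrate_bps(44100, 16, 2)  # 1,411,200 bit/s
mp3 = 128_000                           # 128 kbps

song = 180  # a 3-minute track, in seconds
print(size_bytes(cd_pcm, song))  # 31752000 bytes (~31.8 MB)
print(size_bytes(mp3, song))     # 2880000 bytes (~2.9 MB)
print(cd_pcm / mp3)              # 11.025
```

The 128 kbps stream carries roughly 1/11 of the bits of the PCM stream, and a reduction that large can't come from rearranging the data alone.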

The analogy below isn't perfect, but I think it gets the point across.

Let's say the message "I hate you Sarah!" written on a piece of paper represents the PCM audio stream in the WAV file you losslessly exported after making a recording.

Encoding that data in a different format would be like jumbling the letters up into "h Iyae oSr! haatu". The decoding software knows how this codec arranges the data, so it can unjumble the message.
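The jumbling step can be written as a reversible shuffle. In this sketch (plain Python; `make_key`, `jumble` and `unjumble` are hypothetical names, and a seeded shuffle stands in for whatever arrangement a real codec uses), the encoder and decoder share the same key, so decoding restores the message exactly. The encoded text is also exactly as long as the original, which is the "pointless" rearrange-only codec described earlier.

```python
import random

MESSAGE = "I hate you Sarah!"

def make_key(length, seed=1234):
    """A fixed shuffle order that both the 'encoder' and 'decoder' agree on."""
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order

def jumble(text, key):
    # Output position i takes the character from input position key[i].
    return "".join(text[k] for k in key)

def unjumble(text, key):
    # Put each character back where it came from.
    out = [""] * len(text)
    for i, k in enumerate(key):
        out[k] = text[i]
    return "".join(out)

key = make_key(len(MESSAGE))
encoded = jumble(MESSAGE, key)
decoded = unjumble(encoded, key)
print(encoded)
print(decoded == MESSAGE)            # True: nothing was lost
print(len(encoded) == len(MESSAGE))  # True: but nothing was saved either
```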

However, as discussed, formats like MP3 also discard data during encoding. So the MP3 message would be more like "h I Sr! h tu", and when decoded (unjumbled) it would read "I h t u S r h!". If you read this back you can still get the message, but excluding letters does change how it sounds a bit. The more you exclude, the worse it gets, until you reach a point where the original message is no longer understandable.
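Dropping letters can be sketched the same way. This assumes a deliberately crude, hypothetical rule (discard every n-th letter; real encoders choose what to discard far more carefully), and it leaves a blank wherever a letter was discarded so the decoded text keeps its original length. Lowering `drop_every` discards more letters, and the message degrades accordingly.

```python
MESSAGE = "I hate you Sarah!"

def lossy_round_trip(text, drop_every):
    """Encode by discarding every `drop_every`-th letter, then decode.
    Discarded letters can't be restored, so they come back as blanks."""
    out = []
    letters_seen = 0
    for ch in text:
        if ch.isalpha():
            letters_seen += 1
            if letters_seen % drop_every == 0:
                out.append(" ")  # discarded: only a gap survives decoding
                continue
        out.append(ch)
    return "".join(out)

for drop_every in (4, 3, 2):
    print(drop_every, repr(lossy_round_trip(MESSAGE, drop_every)))
```

With `drop_every=2` the decoded text is "I  a e  o  S r h!", which is already hard to read.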

The MP3 audio stream represents the original PCM audio stream. When you play the MP3, it is decoded back to PCM, but the discarded data obviously doesn't come back. In this example it seems plausible to add the missing letters back in, but remember that computers aren't as smart as we are, and this is a very, very simple example.
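One way to see why the missing letters can't reliably be added back: a lossy encoder can map different inputs to the same output. The hypothetical `drop_vowels` encoder below discards lowercase vowels, and two different messages come out identical, so a decoder given only the encoded text has no way to know which one was written.

```python
def drop_vowels(text):
    """Toy lossy encoder: discard every lowercase vowel."""
    return "".join(ch for ch in text if ch not in "aeiou")

original = "I hate you Sarah!"
other = "I hit you Sarah!"

print(drop_vowels(original))                        # I ht y Srh!
print(drop_vowels(original) == drop_vowels(other))  # True
```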

If you convert the MP3 to a WAV file, you are technically decoding it to PCM and then saving that PCM stream, which is why the WAV is bigger than the MP3. No further data is lost, because the MP3 stream was always just a representation of a PCM stream: the codec specification tells the decoder how to turn the MP3 back into PCM.

If you convert the MP3 to a 128 kbps AAC, what actually happens is that the MP3 is decoded to PCM, and that PCM stream is then encoded as AAC. This second encoding causes further data loss, because the AAC encoder treats "I h t u S r h!" as the original message. Notice that the remaining letters aren't squished together. When the AAC encoder works out what is safest to discard, it can't tell that the gaps left by the excluded letters aren't part of the message, which is why re-encoding lossy audio, even at the same bitrate, degrades quality.
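A toy numeric version of the MP3-to-AAC case, showing the same effect with numbers instead of letters. Assumptions: the samples are plain integers, each "codec" just rounds samples onto a grid with spacing 4 (the same "bitrate"), and the two codecs differ only in where their grid points sit. Real MP3 and AAC encoders work very differently (in the frequency domain); `quantize` and `total_error` are hypothetical helpers.

```python
def quantize(samples, step, offset=0):
    """Toy lossy codec: snap each sample to the nearest point on the grid
    {offset + k*step} (ties round up). Different offsets stand in for
    different codecs making different choices at the same 'bitrate'."""
    return [offset + step * ((x - offset + step // 2) // step) for x in samples]

def total_error(a, b):
    """Sum of absolute differences between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

original = list(range(24))  # stand-in for the source PCM samples

mp3_like = quantize(original, 4)                # toy "MP3": grid 0, 4, 8, ...
aac_direct = quantize(original, 4, offset=2)    # toy "AAC" from the original: grid 2, 6, 10, ...
aac_from_mp3 = quantize(mp3_like, 4, offset=2)  # toy "AAC" from the decoded "MP3"

print(total_error(original, mp3_like))      # 24
print(total_error(original, aac_direct))    # 24
print(total_error(original, aac_from_mp3))  # 60
```

Each toy codec on its own has the same total error (24), but running the decoded "MP3" through the "AAC" codec produces 60, because the second codec treats the first one's rounding errors as part of the signal.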