What exactly happens when you use the 'copy /b' command?

The /b flag of the copy command treats the files as binary (i.e., a raw stream of meaningless bytes) and copies them byte for byte. This is in contrast to the behavior you get by default when combining files (or explicitly with /a), which treats them as text and pays attention to things like the end-of-file character (Ctrl+Z).
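To make the difference concrete, here is a rough Python model of the two modes. It is only a sketch: the function names are made up for illustration, and the text-mode function assumes the commonly documented behavior (each source is read up to its first Ctrl+Z, and one Ctrl+Z is appended to the result) rather than reproducing everything the real command does.

```python
EOF_MARK = b"\x1a"  # Ctrl+Z, the legacy DOS end-of-file character


def concat_binary(*sources: bytes) -> bytes:
    """Model of copy /b: join every byte of every source, unchanged."""
    return b"".join(sources)


def concat_ascii(*sources: bytes) -> bytes:
    """Simplified model of copy /a (an assumption, not the real implementation):
    keep each source only up to its first Ctrl+Z, then end the result
    with a single Ctrl+Z."""
    kept = [src.split(EOF_MARK, 1)[0] for src in sources]
    return b"".join(kept) + EOF_MARK


first = b"hello\r\n"
second = b"\x00\x01\x1a\x02\x03"  # binary data that happens to contain 0x1A

print(concat_binary(first, second))  # every byte of both inputs survives
print(concat_ascii(first, second))   # the bytes after 0x1A are dropped
```

In this model, text mode silently drops everything after a stray 0x1A byte, which is why /b is the safer choice whenever the data is not plain text.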

You can merge text files with either the default text behavior or the binary switch, but this will not work for pretty much any binary format. You cannot simply copy the bytes from two binary files and expect the result to work, because binary files usually have headers, metadata, data structures, etc. that define the format of the file. If you do a binary copy, you simply copy all the bytes as is, which puts these structures in places where they should not be. When a program opens the result, its parser runs into what is essentially corrupt data. Some programs will ignore the parts that don’t make sense and simply show what they can (which is what allows steganography to work), but some will throw an error and complain that the file is corrupt. The ability to detect corruption depends on the file type.

As an example, let’s invent a simplified PDF format:

Byte(s)    Meaning
---------------------

File header:
0-1        # of Pages
2-3        Language
4-5        Font
6-EOF      Data (each page encoded separately)

Page data:
0-1        Page number
2-3        # of characters on page
4-#chars   Letters contained on the page

As you can see, each file will contain a file-level header with some general information, followed by data blocks for each page containing the page data. If you then take two files, each containing one page and merge them as binary files, you will not be creating one two-page file, but instead one corrupt file that starts out with one page, then has a bunch of junk (the file header makes no sense when the program tries to read page two).
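Here is a small Python sketch of that invented format, to show what a reader actually sees after a byte-for-byte merge. Everything in it is an assumption made for illustration: the function names, the little-endian byte order, and the ASCII encoding of the letters are not part of any real format.

```python
import struct

FILE_HEADER = struct.Struct("<HHH")  # number of pages, language, font
PAGE_HEADER = struct.Struct("<HH")   # page number, number of characters


def build_file(pages, language=1, font=1):
    """Serialize a list of page texts into the invented format."""
    out = FILE_HEADER.pack(len(pages), language, font)
    for number, text in enumerate(pages, start=1):
        body = text.encode("ascii")
        out += PAGE_HEADER.pack(number, len(body)) + body
    return out


def parse_file(data):
    """Return (pages, leftover): the parsed pages and any unread bytes."""
    page_count, _language, _font = FILE_HEADER.unpack_from(data, 0)
    offset = FILE_HEADER.size
    pages = []
    for _ in range(page_count):
        number, length = PAGE_HEADER.unpack_from(data, offset)
        offset += PAGE_HEADER.size
        pages.append((number, data[offset:offset + length].decode("ascii")))
        offset += length
    return pages, data[offset:]


a = build_file(["Hello"])
b = build_file(["World"])
merged = a + b  # what a byte-for-byte copy produces

# The header still says "1 page", so the second file is just trailing junk.
pages, leftover = parse_file(merged)
print(pages, len(leftover))

# Forcing the page count to 2 does not help: the parser now reads the
# second file's header as if it were a page header.
patched = struct.pack("<H", 2) + merged[2:]
print(parse_file(patched)[0])
```

With the original header, the parser reports only the "Hello" page and leaves the whole second file unread. With the patched page count, the "second page" is decoded from the second file's header bytes, so it comes out as a one-character page of garbage instead of "World".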

The same thing happens with your MP3s. When you combine them like that, the ID3 tags at the start and/or end of the second file are retained. When the player tries to read the next frame, it is expecting audio data but finds the header of the second file, which does not match the expected format for audio data, so it doesn’t know what to do. Some players will play the header as audio data (which will likely come out as static/noise/pops/etc.), some will cut the sound until the next correct frame, some may stop playing the song altogether, and some may even crash.
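As a sketch of what a format-aware merge has to do at a minimum, the Python below strips a leading ID3v2 tag and a trailing ID3v1 tag from each file before joining the audio bytes. The function names are hypothetical, and this is deliberately incomplete: it ignores other tag types and any per-file length information stored inside the audio stream, so treat it as an illustration rather than a replacement for a real MP3 tool.

```python
def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag or a trailing ID3v1 tag."""
    # ID3v2: 10-byte header = "ID3", 2 version bytes, 1 flag byte, and a
    # 4-byte "synchsafe" size (7 useful bits per byte) that excludes the header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # optional ID3v2.4 footer
        data = data[10 + size + footer:]
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    # Note: audio that happens to contain "TAG" at that offset would be
    # misdetected; a real tool would validate more carefully.
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


def join_mp3(*files: bytes) -> bytes:
    """Concatenate the audio portions of several MP3 byte strings."""
    return b"".join(strip_id3(f) for f in files)


# Synthetic stand-ins for real files: a 5-byte ID3v2 tag, fake audio, ID3v1.
id3v2 = b"ID3" + b"\x03\x00" + b"\x00" + b"\x00\x00\x00\x05" + b"x" * 5
audio = b"\xff\xfb" + b"A" * 200
id3v1 = b"TAG" + b"\x00" * 125

tagged = id3v2 + audio + id3v1
print(join_mp3(tagged, tagged) == audio + audio)
```

A plain binary copy of the two synthetic files would instead leave an ID3v1 block and an ID3v2 block sitting between the two runs of audio, which is exactly the situation described above.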

The copy command knows nothing about file-types other than plain-text (and even then, only ASCII text), so only plain-text can be combined correctly with it. Binary files must be combined using an editor that knows how to parse and interpret the contents correctly.