Why is the size of my email about a third bigger than the size of its attached files?

Your data was 17 MiB. There are 1024 KiB in an MiB. There are 1024 B in a KiB. There are 8 bits in a byte. So that's 142,606,336 bits.

Base64 encoding stores every six bits of the input in a separate output byte. So we need about 23,767,722 bytes. Dividing by 1024 twice gets us 22.67 MiB. So that's where the 22 MiB comes from.
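
If you want to check the arithmetic yourself, here is a minimal sketch. The helper name `base64_size` is made up for illustration, and the 76-character line length with a two-byte CRLF is an assumption about how a typical mail client wraps its output.

```python
import base64

MIB = 1024 * 1024

def base64_size(n_bytes: int, line_len: int = 0) -> int:
    """Size in bytes of n_bytes of data after base64 encoding.

    line_len=0 means no line breaks; otherwise a two-byte CRLF is
    counted after every line_len encoded characters (an assumption
    about how the mail client wraps its output).
    """
    # Every group of up to 3 input bytes becomes 4 output bytes (padded).
    encoded = 4 * ((n_bytes + 2) // 3)
    if line_len:
        n_lines = (encoded + line_len - 1) // line_len
        encoded += 2 * n_lines
    return encoded

# Sanity check against the standard library on a small input.
assert base64_size(100) == len(base64.b64encode(bytes(100)))

original = 17 * MIB
unwrapped = base64_size(original)
wrapped = base64_size(original, line_len=76)
print(f"{unwrapped:,} bytes = {unwrapped / MIB:.2f} MiB")  # 23,767,724 bytes = 22.67 MiB
print(f"{wrapped:,} bytes = {wrapped / MIB:.2f} MiB")      # 24,393,192 bytes = 23.26 MiB
```

The unwrapped figure is two bytes more than the rough estimate because the last partial group is padded out to four characters. Line breaks add a bit on top of that.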

Email is a pretty old technology and doesn't assume an 8-bit clean pipe.


Why is the email bigger?

Because the data is encoded in base64, which encodes each group of up to three bytes as a group of four printable ASCII characters. Typically, the encoded text is then split into lines.

The result is that the encoded data is just over 1⅓ times the size of the original data.
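
To see the three-bytes-to-four-characters grouping and the effect of line splitting, here is a small sketch using Python's standard `base64` module. The sample data is arbitrary, and `encodebytes` ends its lines with a bare `\n`, whereas a real mail message would use CRLF, so treat the exact overhead as approximate.

```python
import base64

# Groups of 3, 2 and 1 input bytes all become 4 output characters.
print(base64.b64encode(b"Man"))  # b'TWFu'
print(base64.b64encode(b"Ma"))   # b'TWE='
print(base64.b64encode(b"M"))    # b'TQ=='

# encodebytes() also splits the output into lines of at most 76 characters.
data = bytes(range(256)) * 3          # 768 bytes of arbitrary sample data
wrapped = base64.encodebytes(data)
lines = wrapped.splitlines()

print(len(data), len(wrapped))        # 768 1038
print(max(len(line) for line in lines))  # 76
print(round(len(wrapped) / len(data), 3))  # 1.352
```

The ratio comes out slightly above 4/3 because of the line breaks.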

Why is base64 used?

Email has a long history and was originally designed to carry text. Only byte values representing ASCII printable characters could reliably pass through the wide variety of email systems on the planet, and even some of those could be problematic.

So MIME devised two schemes for encoding other data as ASCII text - "quoted-printable", designed for mostly ASCII text with a few non-ASCII bytes, and "BASE64" for arbitrary binary data.
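
A quick way to see why there are two schemes is to run both over the same input. This is only a sketch using Python's standard `quopri` and `base64` modules; the sample text and the ten `0xFF` bytes are made up for illustration.

```python
import base64
import quopri

text = "café".encode("latin-1")   # mostly ASCII, one high byte
blob = bytes([0xFF]) * 10          # stand-in for arbitrary binary data

# Quoted-printable leaves ASCII alone and escapes only the odd byte.
qp_text = quopri.encodestring(text)
print(qp_text)                     # b'caf=E9'

# On binary data every byte is escaped as three characters,
# so base64 is the more compact choice there.
qp_blob = quopri.encodestring(blob)
b64_blob = base64.b64encode(blob)
print(len(blob), len(qp_blob), len(b64_blob))  # 10 30 16
```

Quoted-printable keeps text readable with almost no overhead, but it can triple the size of binary data, whereas base64 grows everything by about a third.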

There have been extensions to the SMTP protocol to try to remove these restrictions. The first, 8BITMIME in 1994, allowed higher octet values but unfortunately didn't remove the limits on line lengths and line endings, so it was not suitable for arbitrary binary data. The second, BINARYMIME in 1995, allowed transfer of messages containing arbitrary binary data.

However, these standards have not seen widespread adoption. One problem is what happens if one hop in the mail chain supports them but the next hop doesn't. The mail server then can't send the mail on as-is; it must either reject it as undeliverable and bounce it (which is unlikely to be acceptable to users) or convert it (which requires significant extra code in the mail server). Conversion is made especially painful by the MIME rules against using content transfer encodings on multipart types.