Is it a coincidence that the first 4 bytes of a PGP/GPG file are ellipsis, smile, female sign and a heart?

Yes, it's a coincidence that the first bytes appear to you as these symbols. They are part of the OpenPGP message format specification (RFC 4880) and vary depending on the packet properties.

Let's create a file containing only those bytes and try to read it as a GPG message:

$ printf '\x85\x02\x0c\x03\n' > foo.gpg && gpg --list-packets foo.gpg
# off=0 ctb=85 tag=1 hlen=3 plen=524
:pubkey enc packet: version 3, algo 255, keyid 0AFFFFFFFFFFFFFF
    unsupported algorithm 255

  • The first byte (0x85 = 0b10000101) is the cipher type byte (CTB) that describes the packet type. We can break it up as follows:
    1: CTB indicator bit
    0: old packet format (see RFC 1991)
    0001: public-key-encrypted packet
    01: packet-length field is 2 bytes long

  • The second and third bytes denote the packet length (0x020c = 524).

  • The fourth byte (0x03) is the first byte of the packet body. It is the packet's version number, so this is a version 3 packet.
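
The breakdown above can be reproduced with a few bit operations. This is a minimal Python sketch, not GnuPG's actual parser: the function name `parse_old_header` is made up for illustration, and it only handles old-format headers with a definite length.

```python
def parse_old_header(data: bytes) -> dict:
    """Decode an old-format OpenPGP packet header (illustrative sketch only)."""
    ctb = data[0]
    if not ctb & 0x80:
        raise ValueError("bit 7 is not set: not an OpenPGP packet header")
    if ctb & 0x40:
        raise ValueError("new-format header: not handled by this sketch")

    tag = (ctb >> 2) & 0x0F        # bits 5-2: packet type (1 = public-key-encrypted)
    length_type = ctb & 0x03       # bits 1-0: size of the length field
    if length_type == 3:
        raise ValueError("indeterminate length: not handled by this sketch")

    length_size = 1 << length_type  # 0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes
    if len(data) < 1 + length_size:
        raise ValueError("input is too short to contain the length field")

    body_length = int.from_bytes(data[1:1 + length_size], "big")
    return {
        "tag": tag,
        "header_length": 1 + length_size,
        "body_length": body_length,
    }


header = parse_old_header(b"\x85\x02\x0c\x03")
print(header)  # {'tag': 1, 'header_length': 3, 'body_length': 524}
```

The three values correspond to tag=1, hlen=3 and plen=524 in the gpg --list-packets output.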

As you can see, these bytes are meaningful and not magic number constants that you can remove without losing information. If you cut them off, you are corrupting the GPG packet and it will require some guesswork to reconstruct it.


The bytes are shown as smileys and hearts because that's how your (probably DOS) terminal displays non-printable control characters. In character sets that originate from code page 437, low bytes outside the printable ASCII range are traditionally represented as icons. Here's the original CP437 on an IBM PC:

[Image: the CP437 character set as displayed on an IBM PC]
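
To see how the four bytes turn into those symbols, here is a small Python sketch. It rests on two assumptions about the asker's display: bytes 0x01 to 0x1F are drawn with the CP437 graphical glyphs, and 0x85 is drawn as in Windows-1252 (in CP437 proper, 0x85 is "à", not an ellipsis). The names `CP437_LOW_GLYPHS` and `display_byte` are made up for illustration. Python's built-in cp437 codec maps the low bytes to control characters, so the glyph table is spelled out by hand.

```python
# Glyphs the IBM PC drew for bytes 0x01-0x1F (0x00 was shown as a blank).
CP437_LOW_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def display_byte(value: int) -> str:
    """Return the character an (assumed) DOS-style display shows for one byte."""
    if 0x01 <= value <= 0x1F:
        return CP437_LOW_GLYPHS[value - 1]
    # Assumption: everything else is rendered as Windows-1252.
    return bytes([value]).decode("cp1252", errors="replace")


shown = "".join(display_byte(b) for b in b"\x85\x02\x0c\x03")
print(shown)  # …☻♀♥
```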


As a general principle, well-designed binary file formats¹ will have their first few bytes be a magic number identifying the format. ELF executables' first four bytes are always 7f 45 4c 46, PNG files' first eight bytes are always 89 50 4e 47 0d 0a 1a 0a, and so on. Well-designed encrypted file formats will always follow that magic number with an unencrypted "header" that reveals the encryption algorithm, the length of the encrypted data, things like that.
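
As a rough illustration of how a magic number is used, the following Python sketch matches the start of a byte string against the two signatures mentioned above. The table and the `identify` function are hypothetical and far simpler than what a real tool such as file(1) does.

```python
# Magic numbers quoted above, keyed by their raw bytes.
MAGIC_NUMBERS = {
    bytes.fromhex("7f454c46"): "ELF executable",
    bytes.fromhex("89504e470d0a1a0a"): "PNG image",
}


def identify(data: bytes) -> str:
    """Return the name of the first format whose magic number prefixes data."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(b"\x7fELF\x02\x01\x01"))         # ELF executable
print(identify(b"\x89PNG\r\n\x1a\n\x00\x00"))   # PNG image
print(identify(b"\x85\x02\x0c\x03"))            # unknown
```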

This is not normally considered a security vulnerability, because of Kerckhoffs' principle, which says that a cryptosystem needs to be secure even if the attacker knows everything about it except the key. That includes everything the file header can tell them (such as the algorithm).

It's possible to design a file format, or a protocol, all of whose bytes are indistinguishable from randomness unless you already know the decryption key, but it's surprisingly difficult (did you know that encrypting the expected length of encrypted data can introduce a vulnerability?) and it doesn't actually gain you anything. A file that's completely indistinguishable from the output of cat /dev/random will be just as suspicious to the secret police as an obviously GPG-encrypted file. Perhaps more suspicious, even, since there are all kinds of innocuous reasons to encrypt files.

If you are worried about an attacker merely learning that you are using encryption to communicate with someone, you need steganography, which conceals secret information within ordinary-looking, unencrypted files. Be aware that the state of the art in steganography is not nearly as sophisticated as the state of the art in cryptography; last I checked, all known approaches were breakable by a determined adversary. (If the secret police's first impression is "oh, this is a memory card full of vacation photos", they might not bother digging any deeper…unless they already have a reason to suspect you.)


¹ I have no opinion about whether the GPG file format is well-designed.

Tags:

Gnupg

Pgp