How to interpret an octal or hex dump of a binary file?

One of the first things I had to memorise for computer science was Data + Interpretation = Useful Information. A corollary of this is that if you're missing either Data or Interpretation, you have nothing. The data itself can't tell you how to interpret it. (You can have metadata that tells you this, but then you need to know how to interpret the metadata too.)

Under the circumstances, I suggest trying this:

file filename

If it comes up with something like:

filename: data

and you have absolutely no idea what the format is, what program it's from, what its use is, or anything about the contents of filename, then you should probably give up.
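For a feel of what file is doing, here is a minimal Python sketch of signature ("magic number") matching. The guess_format helper and its tiny table are hypothetical, made up for illustration; the real file tool consults a far larger database and applies much more elaborate tests.

```python
# Hypothetical helper: a tiny subset of the kind of signature checks
# a tool like `file` performs on the first bytes of a file.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def guess_format(first_bytes: bytes) -> str:
    """Return a description if a known signature matches, else 'data'."""
    for signature, description in MAGIC_SIGNATURES:
        if first_bytes.startswith(signature):
            return description
    # No signature matched: the bytes alone do not say how to read them.
    return "data"


if __name__ == "__main__":
    print(guess_format(b"%PDF-1.7\n"))
    print(guess_format(b"\x00\x01\x02\x03"))
```

When nothing matches, the only honest answer is "data", which is exactly the situation described above.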

Octal Dump Output

od (octal dump), in its character mode (od -c), produces a hybrid text-and-octal dump. Each byte is shown as a printable character such as o, s, f, etc.; as a C-style escape for certain non-printable characters, such as \0 (ASCII 0, NUL) or \a (ASCII 7, BEL); or, failing both, as a number in base 8 with the standard C prefix 0 (e.g. 032 = 26 in decimal). Your file is interpreted as a stream of 8-bit bytes.
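The per-byte rule can be sketched in Python roughly as follows. This is an approximation for illustration, not a reimplementation of od: the names od_c_token and od_c_line are invented here, and the column spacing is not meant to match od's output exactly.

```python
# C-style escapes used for some non-printable bytes.
C_ESCAPES = {
    0: r"\0", 7: r"\a", 8: r"\b", 9: r"\t",
    10: r"\n", 11: r"\v", 12: r"\f", 13: r"\r",
}


def od_c_token(byte: int) -> str:
    """Render one byte as an escape, a printable character, or 3-digit octal."""
    if byte in C_ESCAPES:
        return C_ESCAPES[byte]
    if 0x20 <= byte <= 0x7E:
        return chr(byte)
    return f"{byte:03o}"


def od_c_line(data: bytes) -> str:
    """Render a run of bytes, right-aligning each token in a 3-wide field."""
    return " ".join(od_c_token(b).rjust(3) for b in data)


if __name__ == "__main__":
    print(od_c_line(b"of\x00\x07\x1a"))
```

Byte 26 has no escape and is not printable, so it falls through to the octal branch and comes out as 032.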

Hex Dump Output

hexdump produces a traditional hex dump: one column lists the 8-bit bytes in hexadecimal, and the other shows the ASCII characters those bytes correspond to, if any. If a byte value is a non-printable ASCII character, or not an ASCII character at all, . is shown at that position. Again, your file is interpreted as a stream of 8-bit bytes.
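A rough Python sketch of that two-column layout, assuming 16 bytes per row and an offset column at the left. The function names are made up for this example, and the exact spacing of the real tool may differ.

```python
def hex_dump_line(offset: int, chunk: bytes, width: int = 16) -> str:
    """Format one row: offset, hex bytes, then the ASCII view."""
    hex_column = " ".join(f"{b:02x}" for b in chunk)
    # Printable ASCII is shown as-is; everything else becomes '.'.
    ascii_column = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    # Pad the hex column so the ASCII column lines up on short rows.
    return f"{offset:08x}  {hex_column:<{width * 3 - 1}}  |{ascii_column}|"


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format all of data, width bytes per row."""
    return "\n".join(
        hex_dump_line(i, data[i:i + width], width)
        for i in range(0, len(data), width)
    )


if __name__ == "__main__":
    print(hex_dump(b"Hi\x00\xff and some more bytes"))
```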

Integers

If your file comprises 100% binary integers (i.e. is a headerless, uniform, one-dimensional array of some sort of integer representation), then you have to answer all of these questions for yourself:

  • Are they ‘proper’ binary, or binary-coded decimal (BCD)? (probably binary)
  • How wide are they in bits?
  • If their width isn't a multiple of 8, are they bit-packed like SMS messages or Base64, or byte-aligned?
  • If their width is more than 8 bits, what is the byte order? Is it Big Endian, Little Endian, or one of the other, rarer sorts?
  • Are the integers signed, or unsigned?
  • If they're signed, are they represented in two's complement (more likely), or one's complement, or something rare and weird?

There are probably more I'm forgetting right now.
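To see how much these answers matter, here is a small Python sketch that reads the same four bytes under different assumptions about width, byte order, and signedness (two's complement only). The sample bytes and the interpretations helper are invented for illustration, and the helper assumes the data length is a multiple of the chosen width.

```python
def interpretations(raw: bytes, width_bytes: int) -> dict:
    """Decode raw as integers of width_bytes under each byte order and signedness."""
    results = {}
    for byteorder in ("big", "little"):
        for signed in (False, True):
            results[(byteorder, signed)] = [
                int.from_bytes(raw[i:i + width_bytes], byteorder, signed=signed)
                for i in range(0, len(raw), width_bytes)
            ]
    return results


if __name__ == "__main__":
    raw = bytes([0xFF, 0xFE, 0x00, 0x01])  # made-up sample data
    for width in (1, 2, 4):
        for (byteorder, signed), values in interpretations(raw, width).items():
            kind = "signed" if signed else "unsigned"
            print(f"{width * 8:2d}-bit {byteorder:6s} {kind:8s} {values}")
```

Read as 16-bit values, those four bytes are [65534, 1], [-2, 1], [65279, 256] or [-257, 256], depending on the answers, and nothing in the bytes says which reading is right.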

And this is just for a one-dimensional uniform array of integers, coming from a common, modern computer architecture. If your data has any sort of complexity, things are going to get so hairy it'll quickly become easier to win the lottery than to just guess the format. And you have to guess (an educated guess, but a guess), unless you know the format.


There are lots of ways of storing numbers: ASCII text (which can have locale-specific variants, such as using ',' either as the decimal separator or for thousands grouping), binary integers (of varying width), floats and doubles (all of which may vary with the endianness of the architecture and with whether the software producing the file formalises the representation), BCD (uncompressed, packed, fixed-point and other variants), bi-quinary coded decimal ...
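As a small illustration, the Python sketch below stores the single value 1234 in a few of these ways and prints the resulting bytes. The choice of value, widths, and encodings is arbitrary, and the packed BCD line relies on the value having an even number of decimal digits.

```python
import struct


def encode_all(value: int) -> dict:
    """Encode one non-negative integer in several common representations."""
    return {
        "ASCII decimal": str(value).encode("ascii"),
        "16-bit big-endian integer": struct.pack(">H", value),
        "16-bit little-endian integer": struct.pack("<H", value),
        "32-bit little-endian float": struct.pack("<f", float(value)),
        # Packed BCD: one decimal digit per 4-bit nibble.
        # Shortcut that only works for an even number of digits.
        "packed BCD": bytes.fromhex(str(value)),
    }


if __name__ == "__main__":
    for name, encoded in encode_all(1234).items():
        print(f"{name:30s} {encoded.hex(' ')}")
```

Five encodings give five different byte sequences for the same number, and a dump shows only the bytes.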

There is no standard.

Tags:

Binary

Hexdump

Od