Why don't you see binary code when you open a binary file with a text editor?

Binary and text data aren't two separate things: both are simply data. The interpretation is what makes them one or the other. If you open binary data (such as an image file) in a text editor, much of it won't make sense, because it does not fit your chosen interpretation (as text).

What you call text is a subset of the possible file contents: Data that in a given character set translates to readable characters.

For example, of the 128 values that ASCII defines, only about half (62) are letters and digits, 32 are punctuation, one is the space, and the remaining 33 are control characters. That last group isn't used much in text files, and its members have no really good textual representation. Tab and Newline are among them, and text editors already need to get creative in displaying those.
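As a rough check of that breakdown, here is a small Python sketch that counts the ASCII categories using the standard `string` module. The grouping (space counted on its own, DEL counted as a control character) is my own choice for illustration.

```python
import string

# 52 letters + 10 digits
letters_and_digits = len(string.ascii_letters) + len(string.digits)

# Printable symbols such as ! " # $ % & ...
punctuation = len(string.punctuation)

# Codes 0-31 plus 127 (DEL) are control characters
control = [code for code in range(128) if code < 32 or code == 127]

# The space character (code 32) is printable but fits neither group above
space = 1

total = letters_and_digits + punctuation + len(control) + space
print(letters_and_digits, punctuation, len(control), space, total)
```

Tab (9) and Newline (10) both fall in the control group, which is why an editor has to decide for itself how to draw them.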

Some text editors have options to explicitly display whitespace. Then they'll actually be drawn as characters, in addition to their regular formatting behavior (which is also just the interpretation of these characters).

Pure ASCII only defines 128 values. The bytes used to store this information have 256 possible values each, so half of the possible byte values aren't allowed in ASCII. Those values are used, for example, in region-specific character sets such as Latin-1, but in ASCII they're undefined. They have no useful representation in a text viewer that can only handle ASCII.
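To make that concrete, here is a minimal Python sketch that decodes the same four bytes twice, once as ASCII and once as Latin-1. The sample bytes are made up for the example.

```python
# "caf" followed by byte 0xE9, which is outside the ASCII range
data = bytes([0x63, 0x61, 0x66, 0xE9])

try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # ASCII has no meaning for any byte value above 127
    ascii_ok = False

# Latin-1 assigns a character to every one of the 256 byte values
latin1_text = data.decode("latin-1")

print(ascii_ok, latin1_text)
```

The bytes are identical in both cases; only the chosen character set decides whether the last one is an error or a letter.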


Binary data is not usually interpreted as text. So in these files, all possible byte values are commonly found. Everything else would be wasteful (and that's a reason you can compress text very well). Image file formats are complicated, and you don't usually view them as text, so they don't need to be readable.
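The compression remark can be tried out with a short Python sketch. It compares data restricted to 27 byte values (lowercase letters and space, picked at random so that there are no repeated words to help the compressor) against data that uses all 256 values. The sample sizes and the seed are arbitrary assumptions.

```python
import random
import string
import zlib

rng = random.Random(0)  # fixed seed so repeated runs behave the same
n = 10_000

# "Text-like" data: only 27 distinct byte values are ever used
alphabet = string.ascii_lowercase + " "
text_like = "".join(rng.choice(alphabet) for _ in range(n)).encode("ascii")

# "Binary-like" data: every byte value is equally likely
any_bytes = bytes(rng.randrange(256) for _ in range(n))

text_ratio = len(zlib.compress(text_like)) / n
any_ratio = len(zlib.compress(any_bytes)) / n

print(f"text-like:   {text_ratio:.2f} of original size")
print(f"binary-like: {any_ratio:.2f} of original size")
```

Real text compresses even better than this, because it also contains repeated words and uneven letter frequencies.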

As there is no common data interpretation (character set) that maps all possible values to readable characters, and since that wouldn't make a lot of sense anyway (as it's not readable text), major parts are displayed as gibberish.


A hex editor chooses a different representation for the data: It displays each byte as two hexadecimal digits. It's just a different representation, and one with an easily human-readable character set: All 256 possible byte values can be represented as two hex digits.
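A minimal sketch of that representation in Python follows. The `hex_dump` function and its layout (offset, hex digits, printable characters) are my own simplification, not the output format of any particular hex editor. The sample bytes are the eight-byte signature that starts every PNG file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, two hex digits per byte, and printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." digits
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
dump = hex_dump(png_signature)
print(dump)
```

Every byte gets a readable form on the left, while the text column on the right shows how little of it a text interpretation can display.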

Since there's an easy mapping of binary data to hex and vice versa (4 binary digits to/from one hexadecimal digit), and binary contains very little information per digit, hexadecimal is generally the preferred way for humans to read binary, unless there are specific reasons to prefer a different representation.
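The four-bits-per-hex-digit mapping can be shown in a few lines of Python. The byte value is an arbitrary example.

```python
value = 0b10101111

bits = f"{value:08b}"        # eight binary digits
hex_digits = f"{value:02x}"  # two hexadecimal digits

# Split the eight bits into two groups of four and convert each on its own
groups = [bits[:4], bits[4:]]
per_group = "".join(f"{int(group, 2):x}" for group in groups)

print(bits, groups, hex_digits, per_group)
```

Converting each group of four bits separately gives the same two digits as converting the whole byte, which is what makes the translation easy to do by eye.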


Some text editors might have a hex editor mode and some heuristic that tries to determine whether a file is text or binary, and automatically select one mode or the other. But this is difficult to get right, because no specific property of the file says whether it's one kind or the other.
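To illustrate why such a heuristic is hard to get right, here is a hypothetical one in Python. The function name, the sample size, and the 30% threshold are all assumptions made up for this sketch and are not taken from any specific editor.

```python
def looks_like_text(data: bytes, sample_size: int = 8000,
                    threshold: float = 0.30) -> bool:
    """Guess whether data is text by inspecting its first bytes."""
    sample = data[:sample_size]
    if not sample:
        return True  # an empty file is treated as text here
    if b"\x00" in sample:
        return False  # NUL bytes are rare in text files
    # Printable ASCII plus tab, line feed and carriage return
    allowed = set(range(32, 127)) | {9, 10, 13}
    suspicious = sum(1 for b in sample if b not in allowed)
    return suspicious / len(sample) < threshold


plain = b"hello, world\n"
png_start = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
japanese = "日本語".encode("utf-8")

print(looks_like_text(plain))      # guessed as text
print(looks_like_text(png_start))  # guessed as binary
print(looks_like_text(japanese))   # guessed as binary, which is wrong
```

The last case is perfectly good text, but because the sketch only knows ASCII, it guesses wrong. Any such rule encodes an assumption about the interpretation, and that assumption can fail.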


Some FTP clients ask you to specify which file extensions are used for text data. For those files, the client will then change the file contents during transfer to match the OS of the machine you're connected to, because Windows uses a different line-ending sequence (CR LF) than Linux and Unix, including Mac OS X (LF only). Applying that conversion to a binary file would corrupt it.
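A short Python sketch shows what such a conversion does to text and to binary data. The function `to_unix_line_endings` is a simplified stand-in for what a transfer in text mode might do, not the code of any real FTP client.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


text = b"first line\r\nsecond line\r\n"
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

converted_text = to_unix_line_endings(text)
damaged = to_unix_line_endings(png_signature)

print(converted_text)
print(png_signature.hex(" "))
print(damaged.hex(" "))
```

The text survives with its meaning intact, but the PNG signature loses a byte, because the bytes 0D 0A inside it were never a line ending in the first place.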


Because you've opened it in a text editor, not a binary editor.


It's all to do with context and interpretation. What's in your computer is patterns of high and low voltage, or magnetised regions of a disk, that only gain meaning when we decide how we want to interpret them.

Under different circumstances, the pattern low-high-low-low-low-low-low-high might mean the number 65, a capital letter 'A', a sky-blue colour, that a customer ordered coffee, the date 'March 6th' or anything at all, really.
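Here is the same idea as a small Python sketch: one bit pattern, read in several ways. The `ORDER_CODES` table is a made-up stand-in for an application-specific meaning.

```python
# low-high-low-low-low-low-low-high written as bits
raw = bytes([0b01000001])

as_number = raw[0]               # read as an unsigned integer
as_letter = raw.decode("ascii")  # read as an ASCII character

# Hypothetical lookup table that some ordering system might use
ORDER_CODES = {65: "coffee"}
as_order = ORDER_CODES[as_number]

print(as_number, as_letter, as_order)
```

Nothing in the byte itself says which of these readings is the right one; that knowledge lives in the program doing the reading.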

When you open your image file in a graphics program, it knows to interpret it as an image, knows which patterns indicate the image format, which patterns indicate the image size and so on.

When you open your image file in a text editor, it gets treated as text. This is a very simple format, much closer to what's really going on in the computer, but there is still some interpretation going on. Specifically, nearly every pattern gets interpreted as a particular character, some normal like A-Z, but also some weird characters. A few patterns don't show up as characters but instead are treated as basic formatting: newline, tab.

(The situation is slightly complicated by things such as Unicode and text encodings such as UTF-8 but I won't deal with those here for the sake of simplicity.)

When you have a binary file open in a text editor, take care not to make changes, because almost any change you make will completely disrupt the normal interpretation of the file's contents. In other words, it will ruin the file and make it unusable.
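As one illustration of how that damage can happen, the following Python sketch mimics an editor that decodes the file as UTF-8, substitutes a placeholder for anything it cannot decode, and then saves the result. This is an assumed behavior for the sake of the example; real editors differ in how they handle undecodable bytes.

```python
original = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

# "Open": invalid bytes are replaced with U+FFFD, the replacement character
shown_in_editor = original.decode("utf-8", errors="replace")

# "Save": the replacement character is written back as its own UTF-8 bytes
saved = shown_in_editor.encode("utf-8")

print(original.hex(" "))
print(saved.hex(" "))
```

Even without typing anything, the saved bytes differ from the original, so an image program would no longer recognize the file.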

Tags:

Hexdump