How to grep a text file which contains some binary data?

grep -a

It can't get simpler than that.


You can use "strings" to extract strings from a binary file, for example

strings binary.file | grep foo

One way is to simply treat binary files as text anyway, with grep --text but this may well result in binary information being sent to your terminal. That's not really a good idea if you're running a terminal that interprets the output stream (such as VT/DEC or many others).

Alternatively, you can send your file through tr with the following command:

tr '[\000-\011\013-\037\177-\377]' '.' <test.log | grep whatever

This will change anything less than a space character (except newline) and anything greater than 126, into a . character, leaving only the printables.


If you want every "illegal" character replaced by a different one, you can use something like the following C program, a classic standard input filter:

#include<stdio.h>
int main (void) {
    int ch;
    while ((ch = getchar()) != EOF) {
        if ((ch == '\n') || ((ch >= ' ') && (ch <= '~'))) {
            putchar (ch);
        } else {
            printf ("{{%02x}}", ch);
        }
    }
    return 0;
}

This will give you {{NN}}, where NN is the hex code for the character. You can simply adjust the printf for whatever style of output you want.

You can see that program in action here, where it:

pax$ printf 'Hello,\tBob\nGoodbye, Bob\n' | ./filterProg
Hello,{{09}}Bob
Goodbye, Bob

You could run the data file through cat -v, e.g

$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M

which could be then further post-processed to remove the junk; this is most analogous to your query about using tr for the task.

-v simply tells cat to display non-printing characters.

Tags:

Shell