Trying to find files that contain only NULs, but getting some others

In short, what is happening here is that grep is trying to interpret your file as Unicode data. The sequence 0xFF, 0xFE is the byte order mark (BOM) for little-endian UTF-16.

(In my testing, other sequences such as two 0xFF bytes or two 0xFE bytes still did not match the '[^\x00]' regex either, because 0xFE and 0xFF can never appear in valid UTF-8, so grep does not treat them as characters at all.)
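To see why these byte pairs cause trouble in a UTF-8 locale, you can check them with Python's own decoder. This only illustrates the encoding rules and says nothing about grep's internals; is_valid_utf8 is a helper made up for the example:

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# 0xFF 0xFE is the byte order mark for little-endian UTF-16.
print(codecs.BOM_UTF16_LE == b'\xff\xfe')  # True

# 0xFE and 0xFF are never valid in UTF-8, whatever bytes surround them.
for data in (b'\xff\xfe', b'\xff\xff', b'\xfe\xfe'):
    print(data, is_valid_utf8(data))  # False for each of them
```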

Using a locale that doesn't use Unicode for character types should fix this. You can do that by setting the LC_CTYPE environment variable to the C locale, which forces plain ASCII handling (no Unicode):

LC_CTYPE=C grep -RLP '[^\x00]' .

UPDATE: As @steeldriver pointed out, grep still works line by line, so a file that contains newlines in addition to NUL bytes will still be listed: the newlines act as line separators, and every line between them consists only of NULs.
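The line-by-line problem can be modelled in Python by splitting the data on newlines, as grep does by default, and searching each line byte-wise (as in the C locale) for a non-NUL byte. This is a simplified, hypothetical model of grep -L with the '[^\x00]' pattern, not grep itself, and the function name is made up:

```python
import re

# Byte-wise pattern: any byte other than NUL (like grep in the C locale).
NON_NUL = re.compile(rb'[^\x00]')


def listed_by_grep_L(data: bytes) -> bool:
    """Hypothetical model of grep -L '[^\\x00]': True if no line matches."""
    # grep splits its input on newlines and never matches the newline itself.
    return not any(NON_NUL.search(line) for line in data.split(b'\n'))


print(listed_by_grep_L(b'\0\0\0'))    # True: only NULs
print(listed_by_grep_L(b'\0\n\0\n'))  # True, although the file contains newlines
print(listed_by_grep_L(b'\0a\0'))     # False
```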

@DavidFoerster's solution using grep's -z option solves this problem nicely: using the NUL bytes as line separators does the trick.

Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether the file's contents are all zeroes:

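#!/usr/bin/python3
# Exit with status 0 if the file contains only NUL bytes (or is empty), 1 otherwise.
import sys

if len(sys.argv) != 2:
    sys.exit('usage: allzeroes.py FILE')
with open(sys.argv[1], 'rb') as f:
    # Read 4 KiB at a time; any() is true if the block has a non-zero byte.
    for block in iter(lambda: f.read(4096), b''):
        if any(block):
            sys.exit(1)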

You can then use it with find to locate all matching files recursively (allzeroes.py must be executable and in your PATH, otherwise give its full path):

$ find . -type f -exec allzeroes.py {} \; -print
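If starting one Python process per file is too slow for a large tree, the same check can be done in a single process with os.walk. This is a sketch, not part of the original answer: the script name find_zero_files.py and its function names are hypothetical, it skips anything that is not a regular file (mirroring find -type f), and it does not handle permission errors:

```python
#!/usr/bin/python3
import os
import stat
import sys


def all_zeroes(path, block_size=4096):
    """Return True if the file contains only NUL bytes (an empty file counts)."""
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            if any(block):
                return False
    return True


def find_all_zero_files(top):
    """Yield the regular files under top that contain only NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: skip symlinks, FIFOs, devices and so on.
            if stat.S_ISREG(os.lstat(path).st_mode) and all_zeroes(path):
                yield path


if __name__ == '__main__':
    # Each command-line argument is a directory tree to scan.
    for top in sys.argv[1:]:
        for path in find_all_zero_files(top):
            print(path)
```

Usage: python3 find_zero_files.py .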

I hope that helps.


You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines, i.e. nothing but NUL bytes:

grep -L -z -e . ...

Replace ... with the set of files you want to scan (here: -R . to scan the current directory recursively).

Explanation

  • -z, --null-data – Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.¹
  • -e . – Use . as the search pattern, i.e. match any character.
  • -L, --files-without-match – Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.¹
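The combination of these three options can be modelled in Python: -z splits the data into NUL-terminated records, -e . matches any record containing at least one character, and -L lists the file only if nothing matched. This is a simplified sketch with a made-up function name; it treats every byte as a character, as grep does in the C locale, so it ignores locale and encoding effects:

```python
def listed_by_grep_Lz(data: bytes) -> bool:
    """Model of 'grep -L -z -e .': True if every NUL-terminated record is empty."""
    # -z: NUL bytes act as record separators, so they never belong to a record.
    records = data.split(b'\0')
    # -e .: a record matches if it contains at least one byte.
    # -L: the file is listed only if no record matched.
    return not any(records)


print(listed_by_grep_Lz(b''))            # True  (like 'empty')
print(listed_by_grep_Lz(b'\0' * 100))    # True  (like 'zero')
print(listed_by_grep_Lz(b'foo\0bar\0'))  # False (like 'foobar')
```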

Test case

Set-up:

: > empty
truncate -s 100 zero
printf '%s\0' foo bar > foobar

Run test:

$ grep -L -z -e . empty zero foobar
empty
zero
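The same test case can be reproduced in Python as a self-check. This sketch creates the files in a temporary directory and checks them with a hypothetical all_zeroes function that follows the same logic as allzeroes.py; os.truncate plays the role of truncate -s:

```python
import os
import tempfile


def all_zeroes(path, block_size=4096):
    """Return True if the file contains only NUL bytes (an empty file counts)."""
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            if any(block):
                return False
    return True


names = ['empty', 'zero', 'foobar']
with tempfile.TemporaryDirectory() as d:
    empty, zero, foobar = (os.path.join(d, n) for n in names)
    open(empty, 'wb').close()         # : > empty
    open(zero, 'wb').close()
    os.truncate(zero, 100)            # truncate -s 100 zero (pads with NULs)
    with open(foobar, 'wb') as f:     # printf '%s\0' foo bar > foobar
        f.write(b'foo\0bar\0')
    matches = [n for n in names if all_zeroes(os.path.join(d, n))]

print(matches)  # ['empty', 'zero']
```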

¹ From the grep(1) manual page.