Find recursively all archive files of diverse archive formats and search them for file name patterns

If you want something simpler than the AVFS solution, I wrote a Python script called arkfind to do it. You can just run

$ arkfind /path/to/search/ -g "*vacation*jpg"

It'll do this recursively, so you can look at archives inside archives to an arbitrary depth.


(Adapted from How do I recursively grep through compressed archives?)

Install AVFS, a filesystem that provides transparent access inside archives. First run this command once to set up a view of your machine's filesystem in which you can access archives as if they were directories:

mountavfs

After this, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of the archive.
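To make the path mapping concrete, here is a minimal sketch of a helper that turns an absolute archive path into its AVFS directory view. The name avfs_view is made up for illustration, and it assumes the default ~/.avfs mount point that mountavfs sets up.

```shell
# Hypothetical helper (not part of AVFS): map an absolute archive path
# to its AVFS directory view, assuming the default mount point ~/.avfs.
avfs_view() {
    printf '%s/.avfs%s#\n' "$HOME" "$1"
}

# Prints the view path for the example archive.
avfs_view /path/to/archive.zip
```

Once AVFS is mounted, something like ls "$(avfs_view /path/to/archive.zip)" should list the archive's contents as if it were a directory.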

find ~/.avfs"$PWD" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
     -exec sh -c '
                  find "$0#" -name "$1"
                 ' {} '*vacation*.jpg' \;

Explanations:

  • Mount the AVFS filesystem.
  • Look for archive files in ~/.avfs$PWD, which is the AVFS view of the current directory.
  • For each archive, execute the specified shell snippet (with $0 = archive name and $1 = pattern to search).
  • $0# is the directory view of the archive $0.
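  • The archive name is passed to sh as a separate argument ({} on its own) rather than embedded in the shell snippet, because find implementations differ on whether they substitute {} inside a longer -exec argument (some do it, some don't).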
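If the $0 and $1 business is unclear, this small sketch shows how sh -c assigns the words that follow the script. The function name show_args and the archive path are made up for the demonstration; no AVFS is needed to run it.

```shell
# sh -c SCRIPT ARG0 ARG1: the first word after the script becomes $0,
# the next one becomes $1. show_args is a hypothetical demo wrapper.
show_args() {
    sh -c 'printf "archive=%s pattern=%s\n" "$0" "$1"' "$@"
}

# The quoted pattern reaches the inner shell unexpanded.
show_args /tmp/photos.zip '*vacation*.jpg'
# prints: archive=/tmp/photos.zip pattern=*vacation*.jpg
```

In the find command, {} takes the place of the archive path, so each archive that find matches ends up in $0 of the inner shell.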

Or in zsh ≥4.3:

mountavfs
ls -l ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip)(e\''
     reply=($REPLY\#/**/*vacation*.jpg(.N))
'\')

Explanations:

  • ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip) matches archives in the AVFS view of the current directory and its subdirectories.
  • PATTERN(e\''CODE'\') applies CODE to each match of PATTERN. The name of the matched file is in $REPLY. Setting the reply array turns the match into a list of names.
  • $REPLY\# is the directory view of the archive.
  • $REPLY\#/**/*vacation*.jpg matches *vacation*.jpg files in the archive.
  • The N glob qualifier makes the pattern expand to an empty list if there is no match.

My usual solution:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|DESIRED_FILE_TO_SEARCH'

Example:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|characterize.txt'

Results look like:

foozip1.zip:
foozip2.zip:
foozip3.zip:
    DESIRED_FILE_TO_SEARCH
foozip4.zip:
...

If you want only the zip files that contain hits:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|FILENAME' | grep -B1 'FILENAME'

FILENAME here is used twice, so you can use a variable.
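As a rough sketch of the variable idea, the pipeline can be wrapped in a shell function. The name find_in_zips and the optional directory argument are my own additions for illustration. The file name is still treated as a grep basic regular expression, as in the one-liner, and -B1 assumes a grep that supports it (GNU and BSD grep do).

```shell
# Hypothetical wrapper around the one-liner above.
# $1: file name to look for (interpreted by grep as a basic regex)
# $2: directory to search, defaults to the current directory
find_in_zips() {
    find "${2:-.}" -iname '*.zip' -exec unzip -l {} \; 2>/dev/null |
        grep -e "\.zip\|$1" |
        grep -B1 -e "$1"
}

# Usage, with an assumed file name and path:
#   find_in_zips characterize.txt /path/to/search
```

The same caveat as the one-liner applies: when an archive contains several hits, -B1 shows only the line before each hit, so the archive name appears once, before the first of them.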

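These commands search the current directory. To search somewhere else, give find the path as its first argument: find PATH/TO/SEARCH -iname '*.zip' followed by the rest of the command.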

Tags: Find, Zip, Tar, Rar, 7Z