Convert glob to `find`

If the problem is an "argument list too long" error, use a loop or a shell builtin. Running an external command as `cmd glob-that-matches-too-much` can fail with that error, but `for f in glob-that-matches-too-much` does not, because the expanded list is never passed to execve(). So you can just do:

for f in foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg
do
    something "$f"
done

The loop might be excruciatingly slow, but it should work.

Or:

printf "%s\0" foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg |
  xargs -r0 something

(Since printf is a builtin in most shells, the above works around the argument-size limit of the execve() system call.)

$ cat /usr/share/**/* > /dev/null
zsh: argument list too long: cat
$ printf "%s\n" /usr/share/**/* | wc -l
165606

Also works with bash. I'm not sure exactly where this is documented though.
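To make the batching visible, here is a minimal sketch using a scratch directory and made-up file names. The -n 2 option is only there to force tiny batches for the demonstration; normally xargs sizes the batches from the system limits on its own. It assumes an xargs that supports -0.

```shell
# Scratch directory with five hypothetical files.
tmp=$(mktemp -d) || exit 1
touch "$tmp/pic1.jpg" "$tmp/pic2.jpg" "$tmp/pic3.jpg" "$tmp/pic4.jpg" "$tmp/pic5.jpg"

# printf is a builtin, so the expanded glob is written to the pipe
# without ever going through execve().
# The inline sh script prints how many file names each run received.
runs=$(printf '%s\0' "$tmp"/pic*.jpg | xargs -0 -n 2 sh -c 'echo "$#"' sh)

# One line per invocation of the command.
echo "$runs"

rm -rf "$tmp"
```

Each output line corresponds to one run of the command, so five files with batches of two give three runs.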


Both Vim's glob2regpat() and Python's fnmatch.translate() can convert globs to regexes, but both also use .* for *, matching across /.
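A quick way to see why that matters for paths is to compare the two translations of * with grep -E. This is only a sketch; the two sample paths are made up for illustration.

```shell
# Two hypothetical paths: in the first, foo...bar is a single path component;
# in the second, a match would have to cross "/" characters.
paths='foobar/x
foo/deep/bar/x'

# Naive translation of the glob foo*bar/x: "*" becomes ".*", which also matches "/".
naive=$(printf '%s\n' "$paths" | grep -cE '^foo.*bar/x$')

# Slash-aware translation: "*" becomes "[^/]*", like a shell glob.
strict=$(printf '%s\n' "$paths" | grep -cE '^foo[^/]*bar/x$')

echo "naive: $naive, slash-aware: $strict"
```

The naive pattern accepts both paths, while the slash-aware one only accepts the path where foo...bar is a single component.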


find (for the standard -name/-path predicates) uses wildcard patterns just like globs (note that {a,b} is not a glob operator; after brace expansion, you get two globs). The main difference is the handling of slashes (and the fact that find does not treat dot files and dot directories specially). * in a glob won't span several directories: */*/* causes directories down to 2 levels deep to be listed, and no deeper. In find, by contrast, * in -path also matches /, so adding -path './*/*/*' matches any file at least 3 levels deep and doesn't stop find from listing the contents of every directory at any depth.
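The difference in slash handling can be checked on a small scratch tree. This is a sketch with hypothetical directory and file names (a, b, c, f1, f2, f3), not part of the original question.

```shell
# Scratch tree: files at depth 2, 3 and 4.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/a/b/c"
touch "$tmp/a/f1" "$tmp/a/b/f2" "$tmp/a/b/c/f3"

# Shell glob: each * stays within one path component,
# so only the depth-2 entries (a/b and a/f1) match.
glob_count=$(cd "$tmp" && set -- */* && echo "$#")

# find -path: * also matches "/", so every entry at depth 2 or deeper matches
# (a/b, a/f1, a/b/c, a/b/f2, a/b/c/f3).
find_count=$(cd "$tmp" && find . -path './*/*' | wc -l | tr -d ' ')

echo "glob: $glob_count, find -path: $find_count"

rm -rf "$tmp"
```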

For that particular

./foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg

pair of globs, the translation is easy: you want files at depth 3, so you can use:

find . -mindepth 3 -maxdepth 3 \
       \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
          -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
       -exec cmd {} +

(or -depth 3 with some find implementations). Or POSIXly:

find . -path './*/*/*' -prune \
       \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
          -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
       -exec cmd {} +

Which would guarantee that those * and ? could not match / characters.
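The effect of the depth restriction can be seen on a scratch tree. In this sketch the directory and file names are made up, and -print stands in for -exec cmd {} +. The file under foo/x/bar sits at depth 5 and only matches if * is allowed to swallow "/x/".

```shell
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/foo1bar/quuxA" "$tmp/foo1bar/quuxB.bak" "$tmp/foo/x/bar/quuxA"
touch "$tmp/foo1bar/quuxA/pic12345.jpg" \
      "$tmp/foo1bar/quuxB.bak/pic00001.jpg" \
      "$tmp/foo/x/bar/quuxA/pic12345.jpg"

pat1='./foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg'
pat2='./foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg'

# No depth limit: the * in foo*bar crosses "/" and the depth-5 file matches too.
unrestricted=$(cd "$tmp" && find . \( -path "$pat1" -o -path "$pat2" \) -print |
  wc -l | tr -d ' ')

# Pruning at depth 3: only depth-3 entries are ever tested against the
# patterns, so no * or ? can end up matching a "/".
restricted=$(cd "$tmp" && find . -path './*/*/*' -prune \
  \( -path "$pat1" -o -path "$pat2" \) -print | wc -l | tr -d ' ')

echo "unrestricted: $unrestricted, restricted: $restricted"

rm -rf "$tmp"
```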

(Contrary to globs, find would read the contents of directories other than the foo*bar ones in the current directory¹, and would not sort the list of files. But leaving aside the fact that what [A-Z] matches, and how * and ? behave with regard to invalid characters, is unspecified, you'd get the same list of files.)

But in any case, as @muru has shown, there's no need to resort to find if it's just for splitting the list of files into several runs to work around the limit of the execve() system call. Some shells like zsh (with zargs) or ksh93 (with command -x) even have builtin support for that.

With zsh (whose globs also have the equivalent of -type f and most other find predicates), for instance:

autoload zargs # if not already in ~/.zshrc
zargs ./foo*bar/quux[A-Z](|.bak)/pic[0-9][0-9][0-9][0-9]?.jpg(.) -- cmd

(Contrary to {,.bak}, (|.bak) is a glob operator. The (.) glob qualifier is the equivalent of find's -type f. Add oN in there to skip the sorting, as find does, or D to include dot files (which doesn't apply to this glob).)


¹ For find to crawl the directory tree like globs would, you'd need something like:

find . ! -name . \( \
  \( -path './*/*' -o -name 'foo*bar' -o -prune \) \
  -path './*/*/*' -prune -name 'pic[0-9][0-9][0-9][0-9]?.jpg' -exec cmd {} + -o \
  \( ! -path './*/*' -o -name 'quux[A-Z]' -o -name 'quux[A-Z].bak' -o -prune \) \)

That is, prune all directories at level 1 except the foo*bar ones and all directories at level 2 except the quux[A-Z] or quux[A-Z].bak ones, then select the pic... files at level 3 (and prune all directories at that level).


With GNU find, you could write a regex matching your requirements (a glob ? matches exactly one character, hence [^/] rather than [^/]?):

find . -regextype egrep -regex '\./foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]\.jpg'
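Such a translation can be sanity-checked against sample path strings with grep -E, without touching the file system. This is a sketch: the sample paths are hypothetical, LC_ALL=C is set so that [A-Z] means ASCII uppercase only, and the ^ and $ anchors are added because grep, unlike find -regex, does not match the whole string implicitly. Here a glob ? is translated as [^/], that is, exactly one non-slash character.

```shell
# ERE for ./foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg
#   *        -> [^/]*      any run of non-slash characters
#   ?        -> [^/]       exactly one non-slash character
#   {.bak,}  -> (\.bak)?   optional literal suffix
re='^\./foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]\.jpg$'

# Only the first two sample paths should be accepted: the third lacks the
# character required by "?", the fourth needs "*" to cross a "/".
matched=$(printf '%s\n' \
  './foo1bar/quuxA/pic12345.jpg' \
  './foobar/quuxZ.bak/pic0000x.jpg' \
  './foo1bar/quuxA/pic1234.jpg' \
  './foo/x/bar/quuxA/pic12345.jpg' |
  LC_ALL=C grep -E "$re")

echo "$matched"
```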
