How do regular expressions differ from wildcards used to filter files?

Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.

In file name globbing:

  • * means "zero or more characters"

  • ? means "any single character"

But in regexes, you have to use .* to mean "zero or more characters", and . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element.
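To make the contrast concrete, here is a small sketch in Python. It assumes that Python's fnmatch module is a fair stand-in for shell globbing (it implements shell-style wildcards, but unlike a real shell it does not treat leading dots specially), and the file names are made up for illustration.

```python
import fnmatch
import re

# Hypothetical file names, chosen only to illustrate the differences.
names = ["a.txt", "ab.txt", "abc.txt", "a.text"]

# Glob: "?" is exactly one character, "*" is zero or more characters.
glob_one = [n for n in names if fnmatch.fnmatchcase(n, "?.txt")]
glob_any = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regex equivalents: "." is one character, ".*" is zero or more characters.
# The literal dot before "txt" has to be escaped as "\.".
re_one = [n for n in names if re.fullmatch(r".\.txt", n)]
re_any = [n for n in names if re.fullmatch(r".*\.txt", n)]

# Regex "?" makes the preceding element optional: "b?" is zero or one "b".
re_optional = [n for n in names if re.fullmatch(r"ab?\.txt", n)]

print(glob_one, re_one)    # both: ['a.txt']
print(glob_any, re_any)    # both: ['a.txt', 'ab.txt', 'abc.txt']
print(re_optional)         # ['a.txt', 'ab.txt']
```

Note that re.fullmatch is used so the regex has to cover the whole name, which is how a glob always behaves.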

Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [[:alpha:]]). That said, if you need your commands to work on many different types of system, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
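A quick way to check the bracket behaviour is the sketch below, again using Python's fnmatch as an assumed stand-in for the shell. It is limited to character lists and ranges because neither fnmatch nor Python's re module implements POSIX classes such as [[:alpha:]], which rather supports the portability warning. One thing not mentioned above: negation is spelled differently, [!...] in a glob and [^...] in a regex.

```python
import fnmatch
import re

# Hypothetical names: one letter followed by the digit 1.
names = ["a1", "b1", "e1", "q1", "z1"]

def glob_filter(pattern):
    """Names matching a shell-style wildcard pattern (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]

def regex_filter(pattern):
    """Names matched in full by a regular expression."""
    return [n for n in names if re.fullmatch(pattern, n)]

# Character lists and ranges are written identically in both systems.
list_glob, list_regex = glob_filter("[abeq]1"), regex_filter(r"[abeq]1")
range_glob, range_regex = glob_filter("[a-c]1"), regex_filter(r"[a-c]1")

# Negation differs: "!" in the glob, "^" in the regex.
not_glob, not_regex = glob_filter("[!a-c]1"), regex_filter(r"[^a-c]1")

print(list_glob, range_glob, not_glob)
```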

These differences mean the two systems are only directly interchangeable for simple cases. If you need regex matching of file names, you need to do it another way. find -regex is one option. (Notice that there is also find -name, by the way, which uses glob syntax.)
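If a script is doing the filtering rather than the shell, the same split shows up there too. The sketch below is only an illustration: list_by_glob and list_by_regex are hypothetical helper names, the files are created in a throwaway temporary directory, and the regex is applied to the bare file name (find -regex, by contrast, matches against the whole path).

```python
import glob
import os
import re
import tempfile

def list_by_glob(directory, pattern):
    """Sorted file names in directory that match a glob pattern."""
    # glob.escape protects any special characters in the directory path.
    full = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full))

def list_by_regex(directory, pattern):
    """Sorted file names in directory matched in full by a regex."""
    compiled = re.compile(pattern)
    return sorted(n for n in os.listdir(directory) if compiled.fullmatch(n))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created empty just for the demonstration.
    for name in ("log1.txt", "log22.txt", "logX.txt", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    # The glob cannot say "one or more digits"; the regex can.
    by_glob = list_by_glob(tmp, "log*.txt")
    by_regex = list_by_regex(tmp, r"log[0-9]+\.txt")

print(by_glob)   # ['log1.txt', 'log22.txt', 'logX.txt']
print(by_regex)  # ['log1.txt', 'log22.txt']
```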


Answering the question expressed in the original title:

Why do regular expressions differ from that used to filter files?

File name expansion with wildcard (joker) characters already existed in most operating systems, and it is much simpler and more intuitive than regular expressions.

While *.txt is easily understood by casual users, the analogous .*\.txt is aimed more at experienced users and programmers, not to mention ^.*\.txt$ ...
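The anchored form matters because a glob always has to match the whole name, while a regex search is happy with a match anywhere in it. A minimal sketch, assuming Python's fnmatch and re modules as stand-ins for the shell and for grep-style searching, with made-up file names:

```python
import fnmatch
import re

# Hypothetical file names.
names = ["notes.txt", "notes.txt.bak", "txt"]

# The glob must cover the entire name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# An unanchored regex search also accepts "notes.txt.bak",
# because ".txt" occurs somewhere inside it.
unanchored = [n for n in names if re.search(r".*\.txt", n)]

# Adding ^ and $ restores the whole-name behaviour of the glob.
anchored = [n for n in names if re.search(r"^.*\.txt$", n)]

# fnmatch.translate converts a glob into an equivalent anchored regex.
translated = re.compile(fnmatch.translate("*.txt"))
via_translate = [n for n in names if translated.match(n)]

print(glob_hits, unanchored, anchored, via_translate)
```

The exact regex text returned by fnmatch.translate varies between Python versions, so it is better to compile and use it than to rely on its spelling.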
