Non-greedy match with SED regex (emulate perl's .*?)

Sed regexes are greedy: they always match the longest possible string. Standard sed has no non-greedy quantifier like perl's .*?.
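
To make the problem concrete, here is a minimal sketch of what greedy matching does to the sample string used further down. The function name greedy_sed is made up for illustration.

```shell
# Greedy: .* runs to the LAST "AC" on the line, not the first one.
greedy_sed() {
    printf '%s\n' "$1" | sed 's/AB.*AC/XXX/'
}

greedy_sed 'ssABteAstACABnnACss'   # prints ssXXXss
# A non-greedy match would have stopped at the first AC: ssXXXABnnACss
```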

What we want to do is match

  1. AB,
    followed by
  2. any amount of anything other than AC,
    followed by
  3. AC

Unfortunately, sed can’t do #2 — at least not for a multi-character regular expression.  Of course, for a single-character regular expression such as @ (or even [123]), we can do [^@]* or [^123]*.  And so we can work around sed’s limitations by changing all occurrences of AC to @ and then searching for

  1. AB,
    followed by
  2. any number of anything other than @,
    followed by
  3. @

like this:

sed 's/AC/@/g; s/AB[^@]*@/XXX/; s/@/AC/g'

The last part changes unmatched instances of @ back to AC.
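
Wrapped in a small function (lazy_at is a made-up name), the three substitutions can be tried on a couple of inputs. The second call shows what happens to an @ that was already in the input:

```shell
# Hypothetical helper: emulate s/AB.*?AC/XXX/ with "@" standing in for "AC".
lazy_at() {
    printf '%s\n' "$1" | sed 's/AC/@/g; s/AB[^@]*@/XXX/; s/@/AC/g'
}

lazy_at 'ssABteAstACABnnACss'   # prints ssXXXABnnACss (stops at the first AC)
lazy_at 'user@host'             # prints userAChost (the original @ is lost)
```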

But this is a reckless approach because the input could already contain @ characters. They would be treated as end markers, giving false positives, and the last step would turn any that remain into AC.  However, since no shell variable will ever have a NUL (\x00) character in it, NUL is likely a good character to use in the above work-around instead of @:

$ echo 'ssABteAstACABnnACss' | sed 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g'
ssXXXABnnACss

The use of NUL requires GNU sed. (To make sure that GNU features are enabled, the environment variable POSIXLY_CORRECT must not be set.)

If you are using sed with GNU's -z flag to handle NUL-separated input, such as the output of find ... -print0, then NUL will not be in the pattern space and NUL is a good choice for the substitution here.
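
A minimal sketch of the -z case, assuming GNU sed. The name lazy_z and the two sample records are invented for illustration; tr is only there to make the NUL-terminated output readable.

```shell
# Assumes GNU sed: -z and \x00 are GNU extensions.
# With -z each record ends at a NUL, so NUL itself never appears in the
# pattern space and is safe to use as the stand-in for "AC".
lazy_z() {
    sed -z 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g'
}

# Two NUL-terminated records, processed independently.
printf '%s\0' 'ssABteAstACABnnACss' 'xABxACyACz' | lazy_z | tr '\0' '\n'
# prints:
# ssXXXABnnACss
# xXXXyACz
```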

Although NUL cannot be stored in a bash variable, a printf command can produce it. If your input string can contain any character at all, including NUL, then see Stéphane Chazelas' answer, which adds a clever escaping method.


Some sed implementations do support non-greedy matching. ssed has a PCRE mode:

ssed -R 's/AB.*?AC/XXX/'

AT&T ast sed supports the *? operator as a non-greedy version of * in its extended (with -E) and augmented (with -A) regexps.

sed -E 's/AB.*?AC/XXX/'
sed -A 's/AB.*?AC/XXX/'

More generally, in that implementation's -E and -A modes, perl-like regexps can be used inside (?P:perl-like regexp here), though as seen above, that's not necessary for the *? operator.

Its augmented regexps also have conjunction and negation operators:

sed -A 's/AB(.*&(.*AC.*)!)AC/XXX/'

Portably, you can use this technique: replace the end string (here AC) with a single character that doesn't occur in either the beginning or end string (like : here), so you can do s/AB[^:]*://. In case that character may appear in the input, use an escaping mechanism that doesn't clash with the begin and end strings.

An example:

sed 's/_/_u/g; # use _ as the escape character, escape it
     s/:/_c/g; # escape our replacement character
     s/AC/:/g; # replace the end string
     s/AB[^:]*:/XXX/; # actual replacement
     s/:/AC/g; # restore the remaining end strings
     s/_c/:/g; # revert escaping
     s/_u/_/g'
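
As a quick check of the escaping scheme, here is the same sequence wrapped in a function and run on inputs that already contain _ and : characters. The name lazy_portable and the sample strings are made up for illustration; the commands are passed as separate -e options only to keep the shell comments out of the sed script.

```shell
# Hypothetical wrapper around the seven substitutions; plain POSIX sed.
lazy_portable() {
    printf '%s\n' "$1" | sed \
        -e 's/_/_u/g' \
        -e 's/:/_c/g' \
        -e 's/AC/:/g' \
        -e 's/AB[^:]*:/XXX/' \
        -e 's/:/AC/g' \
        -e 's/_c/:/g' \
        -e 's/_u/_/g'
}

lazy_portable 'a_b:cABx:y_zACtailAC'   # prints a_b:cXXXtailAC
lazy_portable 'no_match:here'          # prints no_match:here (unchanged)
```

The existing _ and : characters outside the match come back untouched, and only the text from AB to the first AC is replaced.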

With GNU sed, another approach is to use newline as the replacement character. Because sed processes one line at a time, newline never occurs in the pattern space (unless the script adds one itself, for instance with N or G), so one can do:

sed 's/AC/\n/g;s/AB[^\n]*\n/XXX/;s/\n/AC/g'

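That generally doesn't work with other sed implementations because they don't support [^\n]: in a POSIX bracket expression the backslash is literal, so [^\n] means any character other than backslash and n. With GNU sed, you have to make sure that POSIX compatibility is not enabled (for example through the POSIXLY_CORRECT environment variable).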
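
A short sketch of the newline variant on two input lines, assuming GNU sed with POSIXLY_CORRECT unset. The function name lazy_nl and the second sample line are invented for illustration.

```shell
# Assumes GNU sed without POSIXLY_CORRECT: \n in the replacement and
# inside [^...] is a GNU extension.
lazy_nl() {
    sed 's/AC/\n/g;s/AB[^\n]*\n/XXX/;s/\n/AC/g'
}

# Each input line is handled on its own, so the temporary newlines
# never leak into the output.
printf '%s\n' 'ssABteAstACABnnACss' 'xABxACyACz' | lazy_nl
# prints:
# ssXXXABnnACss
# xXXXyACz
```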


sed - non greedy matching by Christoph Sieghart

The trick to get non greedy matching in sed is to match all characters excluding the one that terminates the match. I know, a no-brainer, but I wasted precious minutes on it and shell scripts should be, after all, quick and easy. So in case somebody else might need it:

Greedy matching

% echo "<b>foo</b>bar" | sed 's/<.*>//g'
bar

Non greedy matching

% echo "<b>foo</b>bar" | sed 's/<[^>]*>//g'
foobar
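
The same trick works for any single-character terminator. A small sketch with double quotes (the function names and the sample sentence are made up for illustration):

```shell
# Greedy: from the first quote to the LAST quote on the line.
quoted_greedy() { printf '%s\n' "$1" | sed 's/".*"/Q/'; }

# "Non greedy": from the first quote to the NEXT quote.
quoted_lazy() { printf '%s\n' "$1" | sed 's/"[^"]*"/Q/'; }

quoted_greedy 'say "hi" and "bye" now'   # prints: say Q now
quoted_lazy   'say "hi" and "bye" now'   # prints: say Q and "bye" now
```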