Variable-length replacement with `re.sub()`

With some help from lookahead and lookbehind, it is possible to replace character by character:

>>> re.sub("(=(?===)|(?<===)=|(?<==)=(?==))", "-", "=== == ======= asdlkfj")
'--- == ------- asdlkfj'

This re.sub approach relies on a slightly deceptive lookahead trick and assumes that the run of = to replace is always immediately followed by a newline ('\n').

print(re.sub(r'=(?=={2}|=?\n)', '-', s))

Output:

def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b

Details
"Replace an equal sign if it is followed either by two more equal signs, or by an optional equal sign and a newline."

=        # equal sign if
(?=={2}  # lookahead
|        # regex OR
=?       # optional equal sign
\n       # newline
)
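
The breakdown above can also be written as a commented pattern using re.VERBOSE. This is only a sketch: the sample string s is a hypothetical stand-in for the real input, which is not shown in the original.

```python
import re

# Hypothetical sample input: a heading underlined with '=' plus a line of code.
s = "Example\n=======\nreturn a == b\n"

pattern = re.compile(r"""
    =            # an equal sign, but only if ...
    (?=          # ... it is followed by (lookahead, consumes nothing)
        ={2}     #     two more equal signs
      |          #   or
        =?\n     #     an optional equal sign and then a newline
    )
""", re.VERBOSE)

print(repr(pattern.sub("-", s)))  # 'Example\n-------\nreturn a == b\n'

# Caveat: a line that merely ends in '=' or '==' is rewritten too.
print(repr(pattern.sub("-", "x ==\n")))  # 'x --\n'
```

The second call shows the cost of the newline assumption: any = or == at the end of a line matches as well, so this pattern is only safe when the input cannot end a line that way.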

It's possible, but not advisable.

re.sub finds a complete match and then replaces the whole match. It doesn't replace each capture group separately, so something like re.sub(r'(=){3,}', '-', s) won't work: it replaces the entire match with a single dash, not each occurrence of the = character.

>>> re.sub(r'(=){3,}', '-', '=== ===')
'- -'

So if you want to avoid a lambda, you have to write a regex that matches individual = characters, but only those that belong to a run of at least three. This is, of course, much more difficult than matching three or more = characters with the simple pattern ={3,}. It requires lookarounds and looks like this:

(?<===)=|(?<==)=(?==)|=(?===)

This does what you want:

>>> re.sub(r'(?<===)=|(?<==)=(?==)|=(?===)', '-', '= == === ======')
'= == --- ------'
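
To see why three alternatives are needed, here is the same pattern spread out with re.VERBOSE. The comments are one reading of what each branch covers, and the name run_of_three is made up for this sketch.

```python
import re

# run_of_three is a hypothetical name; the pattern is the one shown above.
run_of_three = re.compile(r"""
      (?<===)=        # preceded by two equal signs: third or later char of a run
    | (?<==)=(?==)    # one equal sign on each side: an interior char of a run
    | =(?===)         # followed by two equal signs: at least two before the run's end
""", re.VERBOSE)

# Every '=' in a run of three or more satisfies at least one branch,
# while no '=' in a run of one or two satisfies any of them.
print(run_of_three.sub("-", "= == === ======"))  # = == --- ------
```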

But it's clearly much less readable than the original lambda solution.
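
The lambda solution referred to here is not reproduced in this text. A minimal sketch of what such a solution presumably looks like, with dash_runs and min_len as hypothetical names, passes a function as the replacement so each run is matched as a whole and swapped for dashes of the same length:

```python
import re

def dash_runs(text, min_len=3):
    # dash_runs and min_len are hypothetical names used for illustration.
    # Match each run of at least min_len equal signs as a single match ...
    pattern = r"={%d,}" % min_len
    # ... and replace it with the same number of dashes.
    return re.sub(pattern, lambda m: "-" * len(m.group()), text)

print(dash_runs("= == === ======"))  # = == --- ------
```

Here the length check lives in the quantifier ={3,} and the per-character behaviour in the replacement function, so no lookarounds are needed.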
