Python Regex Engine - "look-behind requires fixed-width pattern" Error

Python's re module requires lookbehind assertions to be fixed-width, but you can avoid the lookbehind altogether:

>>> import re
>>> s = '"It "does "not "make "sense", Well, "Does "it"'
>>> re.sub(r'\b\s*"(?!,|$)', '" "', s)
'"It" "does" "not" "make" "sense", Well, "Does" "it"'

Explanation:

\b      # Start the match at the end of a "word"
\s*     # Match optional whitespace
"       # Match a quote
(?!,|$) # unless it's followed by a comma or end of string
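
If you want the comments to travel with the pattern, the same regex can be compiled with re.VERBOSE. This is only a sketch of the snippet above; the names PATTERN and result are made up for illustration:

```python
import re

# Same pattern as above; re.VERBOSE ignores the whitespace and the # comments.
PATTERN = re.compile(r'''
    \b        # word boundary; in this input it lands at the end of a word
    \s*       # optional whitespace
    "         # a double quote
    (?!,|$)   # not followed by a comma or the end of the string
''', re.VERBOSE)

s = '"It "does "not "make "sense", Well, "Does "it"'
result = PATTERN.sub('" "', s)
print(result)  # "It" "does" "not" "make" "sense", Well, "Does" "it"
```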

In Python's re module, lookbehinds really do need to be fixed-width. When a lookbehind contains alternatives of different lengths, there are several ways to handle the situation:

  • Rewrite the pattern so that no alternation is needed. For example, use a word boundary as in Tim's answer above, or use (?<=[^,])"(?!,|$), an exact equivalent of your current pattern that requires a character other than a comma before the double quote. Likewise, a common pattern for matching words surrounded by whitespace, (?<=\s|^)\w+(?=\s|$), can be written as (?<!\S)\w+(?!\S). Or,
  • Split the lookbehinds:
    • Positive lookbehinds need to be alternated in a group (e.g. (?<=a|bc) should be rewritten as (?:(?<=a)|(?<=bc)))
    • If the lookbehind alternates an anchor with a single character, you can reverse the sign of the lookbehind and put the character inside a negated character class. E.g. (?<=\s|^) matches a position that is preceded by whitespace or sits at the start of the string (or of a line, if re.M is used); in Python re, write (?<!\S) instead. Similarly, (?<=^|;) becomes (?<![^;]). If the start of a line must match as well, add \n to the negated character class, e.g. (?<![^;\n]) (see Python Regex: Match start of line, or semi-colon, or start of string, none capturing group). This is not necessary with (?<!\S), because \S does not match a line feed character.
    • Negative lookbehinds can simply be concatenated (e.g. (?<!^|,)"(?!,|$) should be written as (?<!^)(?<!,)"(?!,|$)).
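
The rewrites listed above can be checked with a short script. Treat it as a rough sketch: the helper starts and the sample strings other than s are invented for illustration and are not part of the original question.

```python
import re

s = '"It "does "not "make "sense", Well, "Does "it"'

def starts(pattern, text):
    """Return the start offsets of all matches (helper for this sketch only)."""
    return [m.start() for m in re.finditer(pattern, text)]

# Positive lookbehind with alternatives of different widths: split it up.
# (?<=a|bc)d  ->  (?:(?<=a)|(?<=bc))d
split_positive = starts(r'(?:(?<=a)|(?<=bc))d', 'ad bcd xd')

# Anchor alternated with a single char: negate the lookbehind and the class.
# (?<=^|;)\w+  ->  (?<![^;])\w+
after_semicolon = re.findall(r'(?<![^;])\w+', 'ab;cd ef')

# Add \n to the class so that the start of every line qualifies as well.
after_semicolon_or_newline = re.findall(r'(?<![^;\n])\w+', 'ab\ncd;ef gh')

# Whitespace boundaries: (?<=\s|^)\w+(?=\s|$)  ->  (?<!\S)\w+(?!\S)
whole_words = re.findall(r'(?<!\S)\w+(?!\S)', 'foo bar, baz')

# Negative lookbehind: concatenate one lookbehind per alternative.
# (?<!^|,)"  ->  (?<!^)(?<!,)"
concatenated = starts(r'(?<!^)(?<!,)"(?!,|$)', s)
char_class = starts(r'(?<=[^,])"(?!,|$)', s)

print(split_positive)              # [1, 5]
print(after_semicolon)             # ['ab', 'cd']
print(after_semicolon_or_newline)  # ['ab', 'cd', 'ef']
print(whole_words)                 # ['foo', 'baz']
print(concatenated == char_class)  # True
```

The last line compares the concatenated negative lookbehinds with the (?<=[^,]) variant on the sample string from the question; both select the same quotes.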

Or, simply install the PyPI regex module using pip install regex (or pip3 install regex) and enjoy infinite-width lookbehinds.
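
A quick way to try that out, assuming the third-party regex package is installed (the import is guarded so the script still runs without it; the pattern and sample string are made up for illustration):

```python
try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # not installed; the demonstration below is skipped

# Alternatives of width 1 and 2 inside one lookbehind, which re rejects.
pattern = r'(?<=a|bc)d'

if regex is not None:
    matches = [m.start() for m in regex.finditer(pattern, 'ad bcd xd')]
    print(matches)  # [1, 5]
```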
