RegEx for matching all chars except some special chars and ":)"

This is a tricky question, because you want to remove all symbols except for a certain whitelist. In addition, some of the symbols on the whitelist actually consist of two characters:

:)
:(

To handle this, we can first spare both colon : and parentheses, then selectively remove either one should it not be part of a smiley or frown face:

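import re

text = "this is, a (placeholder text). I wanna remove symbols like: ! and ? but keep @ & # & :)"
# Remove non-whitelisted symbols, colons not followed by a parenthesis, and parentheses not preceded by a colon
output = re.sub(r'[^\w\s:()@&#]|:(?![()])|(?<!:)[()]', '', text)
print(output)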

this is a placeholder text I wanna remove symbols like  and  but keep @ & # & :)

The regex character class I used was:

[^\w\s:()@&#]

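This matches any character that is not a word character, not whitespace, and not one of : ( ) @ & #, so your whitelist is spared from the replacement. The other two parts of the alternation then refine this logic: a colon is removed unless it is immediately followed by a parenthesis, and a parenthesis is removed unless it is immediately preceded by a colon. Only the colons and parentheses that form :) or :( survive.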
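To make the three branches easier to read, here is a sketch of the same pattern written with re.VERBOSE, one branch per line. The helper name strip_symbols and the short sample string are made up for illustration; the pattern itself is unchanged.

```python
import re

# Same three-branch pattern, split across lines with re.VERBOSE.
# Inside a character class, '#' stays literal even in verbose mode.
pattern = re.compile(r"""
    [^\w\s:()@&#]    # any char that is not a word char, whitespace, or a whitelisted symbol
  | :(?![()])        # a colon that is not followed by a parenthesis
  | (?<!:)[()]       # a parenthesis that is not preceded by a colon
""", re.VERBOSE)

def strip_symbols(text):
    # Hypothetical helper name: delete everything the pattern matches
    return pattern.sub('', text)

print(strip_symbols("ok :) (no) a:b :("))
```

In the sample, the parentheses around "no" and the colon in "a:b" are removed, while both emoticons are kept.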


As others have shown, it is possible to write a regex that will succeed the way you have framed the problem. But this is a case where it's much simpler to write a regex to match what you want to keep. Then just join those parts together.

import re

rgx = re.compile(r'\w|\s|@|&|#|:\)|:\(')
orig = 'Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'
new = ''.join(rgx.findall(orig))
print(new)
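If you prefer something more compact, the same keep-list could also be written with a character class for the single characters and :[()] for the two emoticons. This is only a sketch; the names KEEP and keep_allowed are made up for illustration.

```python
import re

# Single characters share one character class; :[()] covers both :) and :(
KEEP = re.compile(r'[\w\s@&#]|:[()]')

def keep_allowed(text):
    # findall returns the kept pieces in order; joining them rebuilds the string
    return ''.join(KEEP.findall(text))

print(keep_allowed('Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'))
```

Because a colon is only ever matched as the first half of an emoticon, a lone colon or a lone parenthesis is dropped along with the other punctuation.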

You can try the following regex (for Python).

(\w|:\)|:\(|#|@| )

With this fake sentence:

"I want to remove certain characters but want to keep certain ones like #random, and :) and :( and something like @.

If it is found in another sentence, :), do search it :( "

It matches every character and emoticon you mentioned in the question. You can use it to pick out the parts of the string you want to keep, and then write rules to carefully remove the remaining punctuation.
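As a rough sketch of how this pattern might be used, you can collect the matches with re.findall and join them. The sample string below is a shortened, made-up version of the sentence above. Note that, as written, the pattern matches only a literal space (not tabs or newlines) and does not include &, so those are dropped as well.

```python
import re

# The pattern from this answer. It has a single capturing group, so
# findall returns the text captured by that group for each match.
pattern = re.compile(r'(\w|:\)|:\(|#|@| )')

sample = "keep #random, and :) and :( and @."  # shortened sample, for illustration only
kept = ''.join(pattern.findall(sample))
print(kept)
```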