Removing any single letter from a string in Python

import re
text = "z 23rwqw a 34qf34 h 343fsdfd"
print(re.sub(r'(?:^| )\w(?:$| )', ' ', text).strip())

or

tmp = re.sub(r'\b\w\b', ' ', text)
print(re.sub(r'\s{2,}', ' ', tmp).strip())

I had a similar issue and came up with the following regex solution:

import re
pattern = r"((?<=^)|(?<= )).((?=$)|(?= ))"
text = "z 23rwqw a 34qf34 h 343fsdfd"
print(re.sub(r"\s+", " ", re.sub(pattern, '', text).strip()))
#23rwqw 34qf34 343fsdfd

Explanation

  • (?<=^) and (?<= ) are look-behinds for start of string and space, respectively. Match either of these conditions using | (or).
  • . matches any single character (except a newline), so an isolated digit or punctuation mark is removed too, not just a letter.
  • ((?=$)|(?= )) is similar to the first bullet point, except it's a look-ahead for either the end of the string or a space.

Finally, call re.sub(r"\s+", " ", my_string) to condense each run of whitespace into a single space.
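
To make the two stages visible, here is a minimal sketch that applies them one at a time. The pattern is the one from this answer; the helper name remove_single_chars is made up for illustration.

```python
import re

# Same pattern as above: one character bounded by start-or-space and end-or-space.
PATTERN = r"((?<=^)|(?<= )).((?=$)|(?= ))"


def remove_single_chars(text):
    """Hypothetical helper: drop isolated characters, then tidy the spacing."""
    stripped = re.sub(PATTERN, "", text)          # isolated characters gone, gaps remain
    return re.sub(r"\s+", " ", stripped.strip())  # collapse the leftover gaps


text = "z 23rwqw a 34qf34 h 343fsdfd"
print(repr(re.sub(PATTERN, "", text)))  # ' 23rwqw  34qf34  343fsdfd'
print(remove_single_chars(text))        # 23rwqw 34qf34 343fsdfd
```

The intermediate result still has a leading space and doubled spaces, which is why the strip() and the whitespace substitution are needed.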


>>> ' '.join(w for w in text.split() if len(w) > 1)
'23rwqw 34qf34 343fsdfd'

I hope there's a neater regex way than this, but:

>>> import re
>>> text = 'z 23rwqw a 34qf34 h 343fsdfd'

>>> re.sub(r'(\b[A-Za-z] \b|\b [A-Za-z]\b)', '', text)
'23rwqw 34qf34 343fsdfd'

Each alternative is a word boundary, a single letter and a space (in either order), and another word boundary. The pattern is doubled up so that it can match a single letter at the start of the string (z_) or at the end (_z), leaving no space behind, and a letter in the middle (_z_), leaving exactly one space. Here _ stands for a space.
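
The three positions can be checked one by one. This is a small sketch using the same pattern, written as a raw string; the helper name strip_single_letters is only for illustration.

```python
import re

# Same pattern as above, as a raw string so the backslashes are not doubled.
PATTERN = r'(\b[A-Za-z] \b|\b [A-Za-z]\b)'


def strip_single_letters(text):
    """Hypothetical helper: remove lone letters together with one adjacent space."""
    return re.sub(PATTERN, '', text)


print(strip_single_letters('z 23rwqw'))         # start:  23rwqw
print(strip_single_letters('23rwqw z'))         # end:    23rwqw
print(strip_single_letters('23rwqw z 34qf34'))  # middle: 23rwqw 34qf34
print(strip_single_letters('5 23rwqw'))         # a lone digit is not a letter: 5 23rwqw
```

Because the character class is [A-Za-z], only letters are removed; a lone digit or punctuation mark is left in place.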