Failure when filtering string list with re.match

selected_files = filter(regex.match, files)

re.match('regex') equals to re.search('^regex') or text.startswith('regex') but regex version. It only checks if the string starts with the regex.

So, use re.search() instead:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))
# The list call is only required in Python 3, since filter was changed to return a generator
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

And if you just want to get all of the .npy files, str.endswith() would be a better choice:

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)

Just use search- since match starts matching from the beginning to end (i.e. entire) of string and search matches anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = filter(regex.search, files)
print(selected_files)

Output-

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']

If you match, the pattern must cover the entire input. Either extend you regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

Which would match:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

Or use re.search, which

scans through string looking for the first location where the regular expression pattern produces a match [...]


re.match() looks for a match at the beginning of the string. You can use re.search() instead.

Tags:

Python

Regex