The groups() method in regular expressions in Python

This is the most specific regexp; from its groups you can see the protocol and the filename. I forgot the file extension.

["](?P<protocol>http(?P<secure>s)?://)(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(\.)[a-zA-Z0-9]*)*)[/](?P<filename>([a-zA-Z.])*)["]
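As a rough sketch of how those named groups can be read back, the snippet below runs the pattern against a made-up URL and inspects groupdict(). The sample URL is hypothetical, and the dot in the subdomain group is written as \. on the assumption that a literal dot was intended. Because the subdomain group repeats, only its last repetition is kept.

```python
import re

# Pattern from above; the dot is escaped on the assumption that a literal "." was meant.
URL_PATTERN = re.compile(
    r'["](?P<protocol>http(?P<secure>s)?://)'
    r'(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(\.)[a-zA-Z0-9]*)*)'
    r'[/](?P<filename>([a-zA-Z.])*)["]'
)

# Hypothetical input, wrapped in double quotes the way the pattern expects.
sample = '"https://www.example.com/index.html"'

m = URL_PATTERN.match(sample)
parts = m.groupdict()  # maps each named group to its captured text

print(parts["protocol"])   # https://
print(parts["secure"])     # s
print(parts["fqdn"])       # www.example.com
print(parts["subdomain"])  # .com  (last repetition only)
print(parts["filename"])   # index.html
```

For a plain http URL the secure group does not take part in the match, so its entry in the dictionary would be None.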

I removed the response because I was.


From the docs:

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

Your group can only ever match one character, so c is the last match.
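A small sketch of that behaviour, assuming the pattern in question was ([abc])+ applied to "abc" (the original pattern is not shown here, so this is a guess):

```python
import re

# Assumed pattern: a one-character group repeated with "+".
m = re.match(r"([abc])+", "abc")

whole = m.group(0)     # the entire match
last = m.group(1)      # only the group's last repetition
last_span = m.span(1)  # where that last repetition sits in the string

print(whole, last, last_span)  # abc c (2, 3)
```

The span shows that the group ended up on the final character; the earlier repetitions on a and b were overwritten.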

You mention that you'd expect to at least see 'abc' - if you want your group to match multiple characters, put the + inside the group:

>>> m = re.match("([abc]+)", "abc")

For details on re, consult the docs. In your case:

group(0) stands for the entire matched string, hence abc: the group matched three times, on a, b and c

group(i) stands for the i-th group and, citing the documentation:

If a group matches multiple times, only the last match is accessible

hence group(1) stands for the last match, c

Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + into the parentheses:

>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
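If every repetition is needed rather than just the last one, one possible approach (not mentioned in the answers above, so treat it as a suggestion) is to drop the repeated group and let re.findall or re.finditer walk the string instead:

```python
import re

# Suggested alternative, not from the answers: collect each single-character match.
chars = re.findall(r"[abc]", "abc")

# Same idea for the two-character chunks from the docs example.
pairs = [m.group(0) for m in re.finditer(r"..", "a1b2c3")]

print(chars)  # ['a', 'b', 'c']
print(pairs)  # ['a1', 'b2', 'c3']
```

Unlike re.match, both functions scan the whole string and skip over text that does not match, so they are not a drop-in replacement when the match has to be anchored at the start.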

Tags:

Python

Regex