Golang Regexp Named Groups and Submatches

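Group names and positions are fixed at compile time, so the slice returned by SubexpNames() can be indexed with the same index as each submatch: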

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) `)
    // groupNames[i] is the name of group i; group 0 (the whole match) and unnamed groups are "".
    groupNames := re.SubexpNames()
    for matchNum, match := range re.FindAllStringSubmatch("Alan Turing ", -1) {
        for groupIdx, group := range match {
            name := groupNames[groupIdx]
            if name == "" {
                name = "*"
            }
            fmt.Printf("#%d text: '%s', group: '%s'\n", matchNum, group, name)
        }
    }
}
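
If you need the values keyed by name rather than by position, the same SubexpNames() slice can be turned into a map. This is only a minimal sketch: namedGroups and nameRe are hypothetical names chosen for illustration, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe is an illustrative pattern with two named groups.
var nameRe = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)

// namedGroups is a hypothetical helper (not part of the regexp package).
// It returns the named submatches of the first match of re in s, keyed by
// group name, or nil if s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // skip the whole match and unnamed groups
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	groups := namedGroups(nameRe, "Alan Turing")
	fmt.Println(groups["first"], groups["last"])
}
```

Note that if a pattern reuses a group name, this map keeps only the value of the last group with that name.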

A direct lookup from group name to index, (*Regexp).SubexpIndex, was proposed in "proposal: regexp: add (*Regexp).SubexpIndex #32420". It was tentatively expected in Go 1.14 (Q1 2020) but was ultimately included with commit 782fcb4 in Go 1.15 (August 2020).

// SubexpIndex returns the index of the first subexpression with the given name,
// or else -1 if there is no subexpression with that name.
//
// Note that multiple subexpressions can be written using the same name, as in
// (?P<bob>a+)(?P<bob>b+), which declares two subexpressions named "bob".
// In this case SubexpIndex returns the index of the leftmost such subexpression
// in the regular expression.
func (*Regexp) SubexpIndex(name string) int

This is discussed in CL 187919.
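
The two rules in that doc comment (leftmost group wins, -1 for an unknown name) can be checked with a short program. This is a sketch that requires Go 1.15 or later; bobRe is just an illustrative variable name, and "alice" is an arbitrary name that does not occur in the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// bobRe reuses the pattern from the doc comment: two groups share the name "bob".
var bobRe = regexp.MustCompile(`(?P<bob>a+)(?P<bob>b+)`)

func main() {
	// The leftmost group named "bob" is group 1.
	fmt.Println(bobRe.SubexpIndex("bob"))

	// There is no group named "alice", so the result is -1.
	fmt.Println(bobRe.SubexpIndex("alice"))

	// Indexing the submatches with that result yields the text of the first "bob" group.
	m := bobRe.FindStringSubmatch("aabbb")
	fmt.Println(m[bobRe.SubexpIndex("bob")])
}
```

The second "bob" group (index 2) is still available by position, or by scanning SubexpNames().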

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Requires Go 1.15 or later for SubexpIndex.
    re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
    fmt.Println(re.MatchString("Alan Turing"))
    matches := re.FindStringSubmatch("Alan Turing")
    lastIndex := re.SubexpIndex("last")
    fmt.Printf("last => %d\n", lastIndex)
    fmt.Println(matches[lastIndex])

    // Output:
    // true
    // last => 2
    // Turing
}
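
In real code, FindStringSubmatch returns nil when nothing matches and SubexpIndex returns -1 for an unknown name, so indexing directly can panic. A defensive wrapper might look like the following sketch; groupValue and fullNameRe are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var fullNameRe = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)

// groupValue is a hypothetical helper (not part of the regexp package).
// It returns the text captured by the named group in the first match of re
// in s. The boolean is false if s does not match or the name is unknown.
func groupValue(re *regexp.Regexp, s, name string) (string, bool) {
	match := re.FindStringSubmatch(s)
	idx := re.SubexpIndex(name) // -1 if no group has this name
	if match == nil || idx < 0 {
		return "", false
	}
	return match[idx], true
}

func main() {
	last, ok := groupValue(fullNameRe, "Alan Turing", "last")
	fmt.Println(last, ok)

	_, ok = groupValue(fullNameRe, "Alan Turing", "middle")
	fmt.Println(ok)
}
```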
