Ruby regular expression non capture group

As mentioned by others, a non-capturing group still counts towards the overall match. If you don't want that part in your match, use a lookbehind:

(?<=id\/number\/)([a-zA-Z0-9]{8})

(?<=pat) - Positive lookbehind assertion: ensures that the preceding characters match pat, but doesn't include those characters in the matched text

Ruby Doc Regexp

Also, the capture group around the id number is unnecessary in this case.
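
To illustrate why dropping the capture group can be convenient, here is a minimal sketch using String#scan. The helper names and the two-ID sample string are made up for this example; only the pattern comes from the answer above.

```ruby
# Hypothetical helper: collect every 8-character ID that follows "id/number/".
def extract_ids(text)
  # No capture group, so scan returns a flat array of matched strings.
  text.scan(/(?<=id\/number\/)[a-zA-Z0-9]{8}/)
end

# Same pattern with the capture group kept, for comparison.
def extract_ids_with_group(text)
  # With a capture group, scan returns one sub-array per match.
  text.scan(/(?<=id\/number\/)([a-zA-Z0-9]{8})/)
end

text = "id/number/2000GXZ2/ref=sr id/number/1999ABC1/ref=sr" # made-up input

extract_ids(text)            #=> ["2000GXZ2", "1999ABC1"]
extract_ids_with_group(text) #=> [["2000GXZ2"], ["1999ABC1"]]
```

Without the group you get the IDs directly; with it you would need an extra flatten or map step.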


You have:

str = "id/number/2000GXZ2/ref=sr"

r = /
    (?:id\/number\/) # match string in a non-capture group
    ([a-zA-Z0-9]{8}) # match character in character class 8 times, in capture group 1
    /x               # extended/free-spacing regex definition mode

Then (using String#[]):

str[r]
  #=> "id/number/2000GXZ2"

returns the entire match, as it should, not just the contents of capture group 1. There are a few ways to remedy this. Consider first ones that do not use a capture group.

@jacob.m suggested putting the first part in a positive lookbehind (modified slightly from his code):

r = /
    (?<=id\/number\/) # match string in positive lookbehind
    [[:alnum:]]{8}    # match exactly 8 alphanumeric characters
    /x

str[r]
  #=> "2000GXZ2"

An alternative is:

r = /
    id\/number\/   # match string
    \K             # forget everything matched so far
    [[:alnum:]]{8} # match 8 alphanumeric characters
    /x

str[r]
  #=> "2000GXZ2"

\K is especially useful when the part to forget is variable-length (for example, when it uses + or *), because Ruby's lookbehinds do not accept variable-length patterns.
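
A rough sketch of that difference, assuming a variant of the problem where the middle path segment can be any word (that variant, and both helper names, are hypothetical). The lookbehind pattern is built with Regexp.new so the failure can be rescued; as far as I know, a regex literal with the same pattern is rejected when the file is parsed.

```ruby
# Hypothetical helper: does this pattern source compile as a Regexp?
def lookbehind_compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# \w+ has no fixed length, so Ruby rejects it inside a lookbehind.
lookbehind_compiles?('(?<=id\/\w+\/)[[:alnum:]]{8}')     #=> false
# The fixed-length prefix from the answer is accepted.
lookbehind_compiles?('(?<=id\/number\/)[[:alnum:]]{8}')  #=> true

# Hypothetical helper: \K has no such restriction on what precedes it.
def id_after_any_segment(str)
  str[/id\/\w+\/\K[[:alnum:]]{8}/]
end

id_after_any_segment("id/number/2000GXZ2/ref=sr") #=> "2000GXZ2"
id_after_any_segment("id/serial/1999ABC1/ref=sr") #=> "1999ABC1"
```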

With both of these approaches, if the part to be matched contains only digits and capital letters, you may want to use [A-Z0-9]{8} instead of [[:alnum:]]{8}, because the latter also matches lowercase letters and Unicode letters and digits, not just those from the English alphabet. In fact, if all the entries have the form of your example, you might be able to use:

r = /
    \d          # match a digit
    [A-Z0-9]{7} # match exactly 7 capital letters or digits
    /x

str[r]
  #=> "2000GXZ2"
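
To make the difference between the two character classes concrete, here is a small comparison. The helper names and the test strings other than "2000GXZ2" are made up for illustration, and the strings are assumed to be UTF-8 (Ruby's default).

```ruby
# Hypothetical helper: is the whole string made of [[:alnum:]] characters?
def posix_alnum?(s)
  s.match?(/\A[[:alnum:]]+\z/)
end

# Hypothetical helper: is the whole string made of ASCII capitals and digits?
def ascii_upper_or_digit?(s)
  s.match?(/\A[A-Z0-9]+\z/)
end

posix_alnum?("2000GXZ2")              #=> true
ascii_upper_or_digit?("2000GXZ2")     #=> true

# Lowercase letters pass [[:alnum:]] but not [A-Z0-9].
posix_alnum?("2000gxz2")              #=> true
ascii_upper_or_digit?("2000gxz2")     #=> false

# "\u{C9}" is the accented capital E; it counts as a letter for [[:alnum:]].
posix_alnum?("2000GX\u{C9}2")          #=> true
ascii_upper_or_digit?("2000GX\u{C9}2") #=> false
```

Note that the shorter \d[A-Z0-9]{7} pattern is not tied to the "id/number/" prefix, so it returns the first run of that shape anywhere in the string.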

The other line of approach is to keep your capture group. One simple way is:

r = /
    id\/number\/     # match string
    ([[:alnum:]]{8}) # match exactly 8 alphanumeric characters in capture group 1
    /x

str =~ r  #=> 0 (index where the match starts; also sets $1)
$1        #=> "2000GXZ2"
str[r, 1] #=> "2000GXZ2"
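
If you keep the group, a named capture may read more clearly than a numeric index. This is only a sketch: the group name "id" and the helper name are my own choices, not something from the question.

```ruby
# Same pattern as above, with the capture group given the (illustrative) name "id".
NAMED_ID_PATTERN = /id\/number\/(?<id>[[:alnum:]]{8})/

# Hypothetical helper: return the named capture, or nil when nothing matches.
def id_from(str)
  str[NAMED_ID_PATTERN, "id"]
end

str = "id/number/2000GXZ2/ref=sr"

id_from(str)             #=> "2000GXZ2"
id_from("no match here") #=> nil

# The MatchData object exposes both the group and the entire match.
m = str.match(NAMED_ID_PATTERN)
m[:id] #=> "2000GXZ2"
m[0]   #=> "id/number/2000GXZ2"
```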

Alternatively, you could use String#sub to replace the entire string with the contents of the capture group:

r = /
    id\/number\/     # match string
    ([[:alnum:]]{8}) # match exactly 8 alphanumeric characters in capture group 1
    .*               # match the remainder of the string
    /x

str.sub(r, '\1')  #=> "2000GXZ2"
str.sub(r, "\\1") #=> "2000GXZ2" 
str.sub(r) { $1 } #=> "2000GXZ2"
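
One caveat with the sub approach, sketched below: sub only replaces what the regex matched, so text before "id/number/" survives, and a string with no match comes back unchanged rather than as nil. The helper names and sample strings here are made up for illustration.

```ruby
# Hypothetical helper: the String#sub approach from above.
def id_via_sub(str)
  str.sub(/id\/number\/([[:alnum:]]{8}).*/, '\1')
end

# Hypothetical helper: the String#[] approach, for comparison.
def id_via_index(str)
  str[/id\/number\/([[:alnum:]]{8})/, 1]
end

id_via_sub("id/number/2000GXZ2/ref=sr")   #=> "2000GXZ2"

# A prefix before "id/number/" is not part of the match, so it is kept.
prefixed = "www.example.com/id/number/2000GXZ2/ref=sr"
id_via_sub(prefixed)                      #=> "www.example.com/2000GXZ2"
id_via_index(prefixed)                    #=> "2000GXZ2"

# With no match, sub returns the input unchanged; String#[] returns nil.
id_via_sub("no id here")                  #=> "no id here"
id_via_index("no id here")                #=> nil
```

If the input may carry a prefix or may not contain an ID at all, the String#[] form is probably the safer of the two.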
