connect by clause in regex_substr

The "abuse" (as Colin 't Hart put it) of CONNECT BY has a good purpose here: a single call to REGEXP_SUBSTR extracts only one of the 4 matches (23, 34, 45, 56). The regex [^,]+ matches any sequence of one or more characters that does not contain a comma.

If you run:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+') as "token" 
FROM   DUAL

you'll get 23.

And if you run:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,1) as "token"
FROM   DUAL

you'll also get 23, only now two additional parameters are set explicitly: start searching at position 1 (which is the default), and return the 1st occurrence (also the default).

Now let's run:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,2) as "token"
FROM   DUAL

This time we'll get 34 (the 2nd occurrence). Using 3 as the last parameter will return 45, and so on.
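
The occurrence argument can also run past the last match. A minimal sketch, using the same string and pattern as above, of what happens when a 5th occurrence is requested from a string that has only 4 tokens:

```sql
-- There are only 4 comma-separated tokens, so asking for the 5th
-- occurrence finds no match and REGEXP_SUBSTR returns NULL.
SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,5) as "token"
FROM   DUAL
```

That NULL is what an IS NOT NULL test on the occurrence can use as a stop signal.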

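Using CONNECT BY recursively along with LEVEL makes sure you receive all the relevant results. Note that they are not necessarily in the original order: DISTINCT and ORDER BY 1 return the tokens sorted by value, which only happens to match the input order for this particular string: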

SELECT DISTINCT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) as "token"
FROM   DUAL
CONNECT BY REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) IS NOT NULL
ORDER BY 1

will return:

token
23
34
45
56

which not only contains all 4 results, but also breaks them into separate rows in the result set!
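
If the original order matters, one possible variation is to drop DISTINCT and sort by LEVEL instead of by the token value. This is only a sketch and not part of the query above; the pos column is an illustrative addition that shows each token's position in the string:

```sql
-- LEVEL is the occurrence number, so ordering by it keeps the tokens
-- in the order they appear in the string, whatever their values are.
SELECT LEVEL as pos,
       REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) as "token"
FROM   DUAL
CONNECT BY REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) IS NOT NULL
ORDER BY LEVEL
```

For this input the rows come back as 23, 34, 45, 56 with pos 1 through 4. With a string such as '56,23' the difference from sorting by value becomes visible.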

If you fiddle with it, you might get a clearer view of the subject.


CONNECT BY has nothing to do with REGEXP_SUBSTR:

  • The first performs a hierarchical query; see http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm

  • The second extracts a substring using a regular expression.

This query "abuses" the CONNECT BY functionality to generate rows in a query on DUAL. As long as the condition passed to CONNECT BY is true, it will generate a new row and increase the value of the pseudocolumn LEVEL.
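
To see the row generation on its own, here is a minimal sketch with no regular expression involved; the limit of 4 is an arbitrary value chosen for illustration:

```sql
-- DUAL has a single row. CONNECT BY keeps producing rows while the
-- condition holds, so this yields four rows with LEVEL = 1, 2, 3, 4.
SELECT LEVEL
FROM   DUAL
CONNECT BY LEVEL <= 4
```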

Then LEVEL is passed to REGEXP_SUBSTR as the occurrence argument to get the nth match of the regular expression.
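
The stop condition does not have to call REGEXP_SUBSTR itself. As a hedged alternative, assuming Oracle 11g or later (where REGEXP_COUNT is available; it does not exist in the 10.2 release linked above), the number of rows can be bounded by the number of matches:

```sql
-- REGEXP_COUNT returns how many times the pattern matches (4 here),
-- so CONNECT BY generates exactly one row per token.
SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) as "token"
FROM   DUAL
CONNECT BY LEVEL <= REGEXP_COUNT ('23,34,45,56','[^,]+')
```

Both forms generate the same four rows for this string; which one reads better is largely a matter of taste.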