Oracle 11g get all matched occurrences by a regular expression

This is a little late, but I needed basically the same thing and could not find a good snippet. I needed to search a free text column of a table for some terms and collect them. As this might be useful to another I have included a version based on this question. While REGEXP_SUBSTR only returns one value, Oracle also provides a REGEXP_COUNT to tell you how many matching items are present in a given string, therefore you can join this with a list of indexes to select each as follows (with examples from this query as free text from some 'source_table'):

DEFINE MATCH_EXP = "'(^|\s)[A-Za-z]{2}[0-9]{5,}'"

WITH source_table
     -- Represents some DB table with a 'free_text' column to be checked.
     AS (       ( SELECT 'Txa233141 Ta233141 Ta232 Ta233142 Ta233141 Ta233148'
                             AS free_text FROM dual )
          UNION ( SELECT 'Other stuff PH33399 mixed in OS4456908843 this line'
                             AS free_text FROM dual )
        )
   , source
     -- For some table, select rows of free text and add the number of matches
     -- in the line.
     AS ( SELECT cnt
               , free_text
          FROM ( SELECT RegExp_Count(free_text, &MATCH_EXP) AS cnt
                      , free_text 
                 FROM source_table )
          WHERE cnt > 0 )
   , iota
     -- Index generator
     AS ( SELECT RowNum AS idx
          FROM dual
          CONNECT BY RowNum <= ( SELECT Max(cnt) FROM source ) )
-- Extract the unique 'cnt' matches from each line of 'free_text'.
SELECT UNIQUE
       RegExp_SubStr(s.free_text, &MATCH_EXP, 1, i.idx) AS result
FROM   source s
  JOIN iota i
    ON ( i.idx <= s.cnt )
ORDER BY result ASC
;

It has the advantages of working for any list of selected rows and uses the CONNECT BY minimally (as this can be very slow).


How about adding a function that will loop through and return all of the values?

create or replace function regexp_substr_mr (
  p_data clob,
  p_re varchar
  )
return varchar as
  v_cnt number;
  v_results varchar(4000);
begin
  v_cnt := regexp_count(p_data, p_re, 1,'m');
  if v_cnt < 25 then
    for i in 1..v_cnt loop
      v_results := v_results || regexp_substr(p_data,p_re,1,i,'m') || chr(13) || chr(10);
    end loop;
  else 
    v_results := 'WARNING more than 25 matches found';
  end if;

  return v_results;
end;

Then just call the function as part of the select query.


REGEXP_SUBSTR only returns one value. You could turn your string into a pseudo-table and then query that for matches. There's an XML-based way of doing this that escapes me at the moment, but using connect-by works, as long as you only have one source string:

SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
FROM (
    SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str FROM DUAL
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1;

... gives you:

SUBSTR             
--------------------
Txa233141b           
Ta233141             
Ta233142             
Ta233147            
Ta233148            

... and you can filter that with a slightly simpler version of your original pattern:

SELECT substr
FROM (
    SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
    FROM (
        SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str
        FROM DUAL
    )
    CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1
)
WHERE REGEXP_LIKE(substr, '^[A-Za-z]{2}[0-9]{5,}$');

SUBSTR             
--------------------
Ta233141             
Ta233142             
Ta233147             
Ta233148             

Which isn't very pretty, but neither is holding multiple values in one field.

Tags:

Oracle

Regex