Regex to match only commas not in parentheses?

Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

Also the existing solution checks that the comma is not followed by a parenthesis, but that does not guarantee that it is embedded in parentheses.

The regex is very simple:

\(.*?\)|(,)

The left side of the alternation matches complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.

In this demo, you can see the Group 1 captures in the lower right pane.

You said you want to match the commas, but you can use the same general idea to split or replace.

To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.

import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;

class Program {
public static void main (String[] args) throws java.lang.Exception  {

String subject = "12,44,foo,bar,(23,45,200),6";
Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
Matcher regexMatcher = regex.matcher(subject);
List<String> group1Caps = new ArrayList<String>();

// put Group 1 captures in a list
while (regexMatcher.find()) {
if(regexMatcher.group(1) != null) {
group1Caps.add(regexMatcher.group(1));
}
} // end of building the list

// What are all the matches?
System.out.println("\n" + "*** Matches ***");
if(group1Caps.size()>0) {
for (String match : group1Caps) System.out.println(match);
}
} // end main
} // end Program

Here is a live demo

To use the same technique for splitting or replacing, see the code samples in the article in the reference.

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...

Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):

Pattern regex = Pattern.compile(
    ",         # Match a comma\n" +
    "(?!       # only if it's not followed by...\n" +
    " [^(]*    #   any number of characters except opening parens\n" +
    " \\)      #   followed by a closing parens\n" +
    ")         # End of lookahead", 
    Pattern.COMMENTS);

This regex uses a negative lookahead assertion to ensure that the next following parenthesis (if any) is not a closing parenthesis. Only then the comma is allowed to match.

Tags:

Java

Regex