Regex: ignoring order of groups

You can use this kind of pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String p = "\\b (?=[\\dPEUCG])  # jump quickly to interesting positions          \n" +
           "(?=     # open a lookahead                                           \n" +
           "    (?> [\\d,]+ \\s* )? # the value may come before the currency     \n" +
           "    (?<currency> PLN|EUR|USD|CHF|GBP )  # capture the currency       \n" +
           "    (?:\\b|\\d) # a word boundary or a digit                         \n" +
           ")       # close the lookahead                                        \n" +
           "(?> [B-HLNPRSU]{3} \\s* )? (?<value> \\d+(?:,\\d+)? )                  ";

Pattern RegComp = Pattern.compile(p, Pattern.COMMENTS);

String s = "USD 1150,25 randomtext \n" +
           "Non works randomtext 1150,25 USD randomtext\n" +
           "Works randomtextUSD 1150,25 USD randomtext\n" +
           "Works randomtext USD 1150,25 randomtext\n" +
           "Works randomtext USD1150,25 randomtext\n" +
           "Non work randomtext 1150,25 USD randomtext";

Matcher m = RegComp.matcher(s);

while( m.find() ) {
    System.out.println(m.group("value") + " : " + m.group("currency"));
}

The idea is to capture the currency inside a lookahead (a zero-width assertion). Because a lookahead only tests what follows and doesn't consume characters, the optional subpattern inside it can skip over a value that may precede the currency. The currency is therefore found whether it comes before or after the value. The value itself is captured outside of the lookahead.
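To see the mechanism in isolation, here is a minimal sketch with a reduced pattern: only two currencies, and without the \\b (?=[\\dPEUCG]) prefix. The class and method names (LookaheadCaptureDemo, firstMatch) are made up for this illustration and are not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCaptureDemo {
    // Reduced version of the pattern: the currency is captured inside the
    // lookahead, the value is matched (and consumed) outside of it.
    private static final Pattern P = Pattern.compile(
        "(?=(?>[\\d,]+\\s*)?(?<currency>USD|EUR))" + // look ahead, optionally past a value
        "(?>[A-Z]{3}\\s*)?" +                        // skip the currency if it comes first
        "(?<value>\\d+(?:,\\d+)?)");                 // the value itself

    // Returns "value : currency" for the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group("value") + " : " + m.group("currency") : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("USD 12,5")); // prints 12,5 : USD
        System.out.println(firstMatch("12,5 EUR")); // prints 12,5 : EUR
    }
}
```

In the second call the currency group is filled even though "EUR" lies after the text that the match consumes, since the group was set while the lookahead was being tested.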

About \\b (?=[\\dPEUCG]): the goal of this subpattern is to quickly discard every position in the string that is not the beginning of a word starting with a digit or with the first letter of one of the currencies, so that the whole pattern is not tested at those positions.
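As a rough illustration, the prefix can be run on its own to list the positions it lets through. The class name CandidatePositions and its find method are hypothetical helpers written for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidatePositions {
    // Only the zero-width prefix of the pattern; it consumes no characters.
    private static final Pattern PREFILTER = Pattern.compile("\\b(?=[\\dPEUCG])");

    // Returns every index at which the prefix succeeds.
    static List<Integer> find(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = PREFILTER.matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(find("abc USD 12,5 xyz")); // prints [4, 8, 11]
        System.out.println(find("fooUSD 7"));         // prints [7]
    }
}
```

The rest of the pattern is attempted only at the listed indices. In "fooUSD 7" the "U" is skipped because it is not at the start of a word.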

Tags:

Java

Regex