RegEx for parsing chemical formulas
(PO4)2 really stands apart from all the others.
Let's start simple and match items without parentheses:
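A minimal Python sketch of this first step, assuming an element symbol is one uppercase letter followed by optional lowercase letters and then an optional count. The pattern and the name `parse_flat` are illustrative assumptions, not necessarily the expression used in this answer.

```python
import re

# Assumed pattern: an element symbol (uppercase letter plus optional
# lowercase letters) followed by an optional count.
ELEMENT = re.compile(r"([A-Z][a-z]*)(\d*)")

def parse_flat(formula):
    """Return (symbol, count) pairs for a formula without parentheses."""
    # A missing count is captured as an empty string and treated as 1.
    return [(sym, int(num) if num else 1) for sym, num in ELEMENT.findall(formula)]

print(parse_flat("H2SO4"))  # [('H', 2), ('S', 1), ('O', 4)]
print(parse_flat("NaCl"))   # [('Na', 1), ('Cl', 1)]
```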
Using the regex above, we can successfully parse
Then we need to somehow add an expression for a group. A group by itself can be matched using:
So we add
This works for the given cases, but maybe you have some more samples.
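As a hedged illustration of combining the two cases, the sketch below assumes one alternative for a plain element and one for a non-nested parenthesized group, each with an optional count. The pattern and the name `tokenize` are assumptions made for this example.

```python
import re

# Assumed combined pattern: either a plain element with an optional count,
# or a non-nested parenthesized group with an optional count.
TOKEN = re.compile(r"([A-Z][a-z]*)(\d*)|\(([^()]*)\)(\d*)")

def tokenize(formula):
    """Split a formula into ('element', symbol, n) and ('group', body, n) items."""
    items = []
    for m in TOKEN.finditer(formula):
        sym, sym_n, body, body_n = m.groups()
        if sym is not None:
            # First alternative matched: a plain element.
            items.append(("element", sym, int(sym_n) if sym_n else 1))
        else:
            # Second alternative matched: a parenthesized group.
            items.append(("group", body, int(body_n) if body_n else 1))
    return items

print(tokenize("Ca3(PO4)2"))
# [('element', 'Ca', 3), ('group', 'PO4', 2)]
```

The body of each group (here `PO4`) can then be passed through the same function again to get its elements.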
Note: it will have problems with nested parentheses. For example:
If you want to handle that case, you can use the following regex:
For Objective-C you can use the expression without lookarounds:
Or a regex with repetitions (I don't know of any such formulas, but in case there is anything like
A(B(CD)3E(FG)4)5, with multiple parenthesis blocks inside one):
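If the regex flavor at hand cannot express nesting (Python's built-in `re` module has no recursive patterns), one alternative is to use a regex only for tokenizing and to track nesting with a stack. This is a different technique from the single-expression approach above, sketched under the assumption that counts multiply through every enclosing group; `count_atoms` is a hypothetical name.

```python
import re
from collections import Counter

# Tokens: an element symbol, "(", ")", or a number.
# Any other character is silently skipped by findall.
TOKEN = re.compile(r"[A-Z][a-z]*|\(|\)|\d+")

def count_atoms(formula):
    """Count atoms in a formula, multiplying through nested groups."""
    tokens = TOKEN.findall(formula)
    stack = [Counter()]  # one Counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            stack.append(Counter())
            continue
        if tok.isdigit():
            raise ValueError(f"unexpected count {tok!r}")
        # An optional multiplier may follow an element or a ")".
        mult = 1
        if i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        if tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            # Fold the closed group into its parent, scaled by the multiplier.
            for sym, n in group.items():
                stack[-1][sym] += n * mult
        else:
            stack[-1][tok] += mult
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return dict(stack[0])

print(count_atoms("A(B(CD)3E(FG)4)5"))
# {'A': 1, 'B': 5, 'C': 15, 'D': 15, 'E': 5, 'F': 20, 'G': 20}
```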
This should just about work:
Play around with it here: http://refiddle.com/
When you encounter a parenthesis group, you don't want to parse what's inside, right?
If there are no nested parenthesis groups you can simply use
\d is a shortcut for
[^)] means anything but a parenthesis.
See demo here.
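A small Python sketch of the idea, assuming the group alternative is `\(`, then `[^)]*`, then `\)`, followed by optional digits, so that whatever sits inside the parentheses is kept as one unparsed chunk. The exact pattern and the name `top_level_parts` are assumptions for illustration.

```python
import re

# Assumed pattern: a non-nested parenthesized group with an optional count,
# or a plain element with an optional count.
TOP_LEVEL = re.compile(r"\([^)]*\)\d*|[A-Z][a-z]*\d*")

def top_level_parts(formula):
    """Split a formula into top-level pieces without looking inside groups."""
    return TOP_LEVEL.findall(formula)

print(top_level_parts("Ca3(PO4)2"))  # ['Ca3', '(PO4)2']
print(top_level_parts("Mg(OH)2"))    # ['Mg', '(OH)2']
```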