Regex to match nested JSON objects

This recursive Perl/PCRE regular expression should be able to match any valid JSON or JSON5 object, including nested objects and edge cases such as braces inside JSON strings or JSON5 comments:

/(\{(?:(?>[^{}"'\/]+)|(?>"(?:(?>[^\\"]+)|\\.)*")|(?>'(?:(?>[^\\']+)|\\.)*')|(?>\/\/.*\n)|(?>\/\*[\s\S]*?\*\/)|(?-1))*\})/

Of course, that's a bit hard to read, so you might prefer the commented version:

m{
  (                               # Begin capture group (matching a JSON object).
    \{                              # Match opening brace for JSON object.
    (?:                             # Begin non-capturing group to contain alternations.
      (?>[^{}"'\/]+)                  # Match a non-empty string which contains no braces, quotes or slashes, without backtracking.
    |                               # Alternation; next alternative follows.
      (?>"(?:(?>[^\\"]+)|\\.)*")      # Match a double-quoted JSON string, without backtracking.
    |                               # Alternation; next alternative follows.
      (?>'(?:(?>[^\\']+)|\\.)*')      # Match a single-quoted JSON5 string, without backtracking.
    |                               # Alternation; next alternative follows.
      (?>\/\/.*\n)                    # Match a single-line JSON5 comment, without backtracking.
    |                               # Alternation; next alternative follows.
      (?>\/\*[\s\S]*?\*\/)            # Match a multi-line JSON5 comment (which may span lines), without backtracking.
    |                               # Alternation; next alternative follows.
      (?-1)                           # Recurse to most recent capture group, to match a nested JSON object.
    )*                              # End of non-capturing group; match zero or more repetitions of this group.
    \}                              # Match closing brace for JSON object.
  )                               # End of capture group (matching a JSON object).
}x
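
Since the question is tagged Java, note that java.util.regex has no recursion construct such as (?-1), so the pattern above cannot be compiled there as written. One possible workaround is a small hand-written scanner that follows the same alternatives (quoted strings, comments, nested braces). The following is only a minimal sketch under that assumption; the class and method names are hypothetical, and unlike the regex it simply skips a stray slash instead of failing on it.

```java
final class JsonObjectScanner {

    /**
     * Returns the index just past the closing brace of the object that starts
     * at text.charAt(start), or -1 if there is no balanced object there.
     */
    static int findObjectEnd(String text, int start) {
        if (start < 0 || start >= text.length() || text.charAt(start) != '{') {
            return -1;
        }
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                // Quoted string: skip to the matching quote, honouring backslash escapes.
                i++;
                while (i < text.length() && text.charAt(i) != c) {
                    i += (text.charAt(i) == '\\') ? 2 : 1;
                }
                if (i >= text.length()) {
                    return -1; // unterminated string
                }
                i++;
            } else if (text.startsWith("//", i)) {
                // Single-line comment: skip through the end of the line.
                int newline = text.indexOf('\n', i);
                if (newline < 0) {
                    return -1;
                }
                i = newline + 1;
            } else if (text.startsWith("/*", i)) {
                // Multi-line comment: skip through the closing */.
                int close = text.indexOf("*/", i + 2);
                if (close < 0) {
                    return -1;
                }
                i = close + 2;
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    return i + 1; // the outermost object is closed here
                }
                i++;
            }
        }
        return -1; // ran out of input before the braces balanced
    }

    public static void main(String[] args) {
        String input = "x = {\"a\": \"}\", /* { */ \"b\": {\"c\": '{'}} tail";
        int start = input.indexOf('{');
        int end = findObjectEnd(input, start);
        System.out.println(input.substring(start, end));
    }
}
```

The depth counter plays the role of the recursion: each opening brace outside a string or comment goes one level deeper, and the object ends when the counter returns to zero.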

As others have suggested, a full-blown JSON parser is probably the way to go. If you want to match the key-value pairs in the simple examples that you have above, you could use:

(?<=\{)\s*[^{]*?(?=[\},])

For the input string

{title:'Title',  {data:'Data', {foo: 'Bar'}}}

This matches:

 1. title:'Title'
 2. data:'Data'
 3. foo: 'Bar'
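
For reference, here is a minimal Java sketch of how that lookaround pattern could be applied to the sample input. The class and method names are hypothetical, and the pattern is simply the one above with its backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValuePairs {

    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern PAIR =
            Pattern.compile("(?<=\\{)\\s*[^{]*?(?=[\\},])");

    /** Collects every match of the pattern, with surrounding whitespace trimmed. */
    static List<String> extract(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.add(m.group().trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [title:'Title', data:'Data', foo: 'Bar']
        System.out.println(extract("{title:'Title',  {data:'Data', {foo: 'Bar'}}}"));
    }
}
```

Because the lookbehind requires an opening brace immediately before the match, only the first pair after each brace is found: for an input such as {a:1, b:2} this sketch returns just a:1. That is fine for the simple examples here, but it is one more reason to prefer a real parser for anything larger.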

Thanks to @Sanjay T. Sharma, who pointed me to "brace matching", which eventually helped me understand greedy expressions, and thanks to the others for telling me early on what I shouldn't do. Fortunately, it turned out to be fine to use the greedy variant of the expression (written here as a Java string literal):

\\{\\s*title.*\\}

because there is no non-JSON data between the closing braces, so the greedy .* can safely run to the last one.
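
To illustrate the difference, here is a small Java sketch comparing the greedy expression with its reluctant counterpart (.*? instead of .*). The input string, class name and helper method are made up for the example, and it assumes the whole object sits on one line, since . does not match line breaks unless Pattern.DOTALL is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyBraceMatch {

    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "prefix {title:'Title',  {data:'Data', {foo: 'Bar'}}} suffix";

        // Greedy: .* runs to the last closing brace, so the whole object is matched.
        // Prints {title:'Title',  {data:'Data', {foo: 'Bar'}}}
        System.out.println(firstMatch("\\{\\s*title.*\\}", input));

        // Reluctant: .*? stops at the first closing brace, cutting the object short.
        // Prints {title:'Title',  {data:'Data', {foo: 'Bar'}
        System.out.println(firstMatch("\\{\\s*title.*?\\}", input));
    }
}
```

The greedy form only works because nothing after the object contains another closing brace; if the surrounding text could contain one, the match would extend past the end of the JSON.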

Tags: Java, Regex