Parse only one level of JSON

I think you can solve this with a regex; it works for me:

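import re
# Raw string, so \s, \[ and \{ reach the regex engine without invalid-escape warnings
pattern = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')
# json_string holds the JSON text; each value comes back as an unparsed string
dict(re.findall(pattern, json_string))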

I don't know whether this is faster, though; you will need to try it on your data.
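To see what the pattern returns, and where it stops working, here is a minimal sketch. The sample documents and the helper name `one_level` are my own, for illustration only. It assumes each top-level pair sits on its own line and every value is a string, list, or object: the greedy `.*` runs to the last closing delimiter on the line, so single-line JSON is not split correctly, and numbers, booleans, and null values are skipped.

```python
import json
import re

# Same pattern as above, written as a raw string.
PATTERN = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')

def one_level(text):
    """Hypothetical helper: map each top-level key to the raw JSON text of its value."""
    return dict(PATTERN.findall(text))

multi_line = '''{
    "key1": "val1",
    "key2": ["a","b", 3],
    "key3": {"foo": 27, "bar": [1, 2, 3]}
}'''

result = one_level(multi_line)

# Each value is still JSON text, so it can be decoded later, on demand.
decoded = {key: json.loads(raw) for key, raw in result.items()}
print(decoded == json.loads(multi_line))

# Limitation: with everything on one line, the greedy ".*" swallows the following pairs.
single_line = '{"key1": "val1", "key2": ["a","b", 3]}'
print(one_level(single_line))
```

If your input may arrive minified or contain top-level numbers, the pattern would need to change before you can rely on it.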

[EDIT]

Yes, it is faster. I ran the two scripts below, and the regex version was 5 times faster on this sample.

Using the json module:

import json

val='''
{
    "key1": "val1",
    "key2": ["a","b", 3],
    "key3": {"foo": 27, "bar": [1, 2, 3]}
}
'''

for n in range(100000):
    dict((k,json.dumps(v)) for k,v in json.loads(val).items())

Using regex:

import re

val='''{
    "key1": "val1",
    "key2": ["a","b", 3],
    "key3": {"foo": 27, "bar": [1, 2, 3]}
}'''

pattern = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')
for n in range(100000):
    dict(re.findall(pattern, val))
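To repeat the comparison on your own data, a `timeit` harness along these lines may help. The function names `with_json` and `with_regex` and the iteration count are assumptions for illustration. Substitute a representative document for `VAL`, since the ratio is likely to vary with input size and Python version.

```python
import json
import re
import timeit

# Replace with a document that is representative of your real data.
VAL = '''{
    "key1": "val1",
    "key2": ["a","b", 3],
    "key3": {"foo": 27, "bar": [1, 2, 3]}
}'''

PATTERN = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')

def with_json(text):
    # Full parse, then re-serialize every top-level value.
    return {key: json.dumps(value) for key, value in json.loads(text).items()}

def with_regex(text):
    # Regex extraction; values stay as the original JSON text.
    return dict(PATTERN.findall(text))

# Fewer iterations than the scripts above, to keep the run short.
json_seconds = timeit.timeit(lambda: with_json(VAL), number=10_000)
regex_seconds = timeit.timeit(lambda: with_regex(VAL), number=10_000)

print(f"json:  {json_seconds:.3f} s")
print(f"regex: {regex_seconds:.3f} s")
```

Timing both functions in one process, on the same input, keeps the comparison fair. It is also worth checking that the two functions agree on your data before trusting the faster one.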

Hardly an answer, but I only see two possibilities:

  1. Load the full JSON and dump back the values, which you have ruled out in your question
  2. Modify the content by wrapping the values in quotes, so that the JSON load yields string values

To be honest, I don't think there is such a thing as 'performance-critical JSON parsing code'; it just sounds wrong, so I'd go with the first option.

Tags: Python, JSON