Parsing json and searching through it

Seems there's a typo (missing colon) in the JSON dict provided by jro.

The correct syntax would be:

jdata = json.load('{"uri": "http:", "foo": "bar"}')

This cleared it up for me when playing with the code.

ObjectPath is a library that provides ability to query JSON and nested structures of dicts and lists. For example, you can search for all attributes called "foo" regardless how deep they are by using $..foo.

While the documentation focuses on the command line interface, you can perform the queries programmatically by using the package's Python internals. The example below assumes you've already loaded the data into Python data structures (dicts & lists). If you're starting with a JSON file or string you just need to use load or loads from the json module first.

import objectpath

data = [
    {'foo': 1, 'bar': 'a'},
    {'foo': 2, 'bar': 'b'},
    {'NoFooHere': 2, 'bar': 'c'},
    {'foo': 3, 'bar': 'd'},
]

tree_obj = objectpath.Tree(data)

tuple(tree_obj.execute('$..foo'))
# returns: (1, 2, 3)

Notice that it just skipped elements that lacked a "foo" attribute, such as the third item in the list. You can also do much more complex queries, which makes ObjectPath handy for deeply nested structures (e.g. finding where x has y that has z: $.x.y.z). I refer you to the documentation and tutorial for more information.

As json.loads simply returns a dict, you can use the operators that apply to dicts:

>>> jdata = json.load('{"uri": "http:", "foo", "bar"}')
>>> 'uri' in jdata       # Check if 'uri' is in jdata's keys
True
>>> jdata['uri']         # Will return the value belonging to the key 'uri'
u'http:'

Edit: to give an idea regarding how to loop through the data, consider the following example:

>>> import json
>>> jdata = json.loads(open ('bookmarks.json').read())
>>> for c in jdata['children'][0]['children']:
...     print 'Title: {}, URI: {}'.format(c.get('title', 'No title'),
                                          c.get('uri', 'No uri'))
...
Title: Recently Bookmarked, URI: place:folder=BOOKMARKS_MENU(...)
Title: Recent Tags, URI: place:sort=14&type=6&maxResults=10&queryType=1
Title: , URI: No uri
Title: Mozilla Firefox, URI: No uri

Inspecting the jdata data structure will allow you to navigate it as you wish. The pprint call you already have is a good starting point for this.

Edit2: Another attempt. This gets the file you mentioned in a list of dictionaries. With this, I think you should be able to adapt it to your needs.

>>> def build_structure(data, d=[]):
...     if 'children' in data:
...         for c in data['children']:
...             d.append({'title': c.get('title', 'No title'),
...                                      'uri': c.get('uri', None)})
...             build_structure(c, d)
...     return d
...
>>> pprint.pprint(build_structure(jdata))
[{'title': u'Bookmarks Menu', 'uri': None},
 {'title': u'Recently Bookmarked',
  'uri':   u'place:folder=BOOKMARKS_MENU&folder=UNFILED_BOOKMARKS&(...)'},
 {'title': u'Recent Tags',
  'uri':   u'place:sort=14&type=6&maxResults=10&queryType=1'},
 {'title': u'', 'uri': None},
 {'title': u'Mozilla Firefox', 'uri': None},
 {'title': u'Help and Tutorials',
  'uri':   u'http://www.mozilla.com/en-US/firefox/help/'},
 (...)
}]

To then "search through it for u'uri': u'http:'", do something like this:

for c in build_structure(jdata):
    if c['uri'].startswith('http:'):
        print 'Started with http'

Parsing json and searching through it

Tags:

Python

String

Grep

Json

Pprint

Related

Recent Posts