How do I count and enumerate the keys in an lmdb with python?

As Sait pointed out, you can iterate over a cursor to collect all keys. However, this may be somewhat inefficient, as it also loads the values. This can be avoided by calling the cursor's iternext() function with values=False.

with env.begin() as txn:
    keys = list(txn.cursor().iternext(values=False))

I did a short benchmark between both methods for a DB with 2^20 entries, each with a 16 B key and 1024 B value.

Retrieving keys by iterating over the cursor (including values) took 874 ms on average over 7 runs, while the second method, which returns only the keys, took 517 ms. These results may differ depending on the size of keys and values.
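A rough way to reproduce this kind of comparison is sketched below. The helper names (keys_with_values, keys_only, benchmark) and the default sizes are illustrative assumptions, not the setup used for the numbers above, and the sketch assumes the lmdb package is installed.

```python
import tempfile
import timeit


def keys_with_values(env):
    # Default cursor iteration yields (key, value) pairs, so every value
    # is fetched as well even though it is discarded here.
    with env.begin() as txn:
        return [key for key, _ in txn.cursor()]


def keys_only(env):
    # iternext(values=False) yields only the keys.
    with env.begin() as txn:
        return list(txn.cursor().iternext(values=False))


def benchmark(n=2**14, value_size=1024, runs=7):
    # Hypothetical driver: fills a temporary database and times both helpers.
    import lmdb  # third-party package; imported here so the helpers stay importable

    with tempfile.TemporaryDirectory() as path:
        env = lmdb.open(path, map_size=2**28)
        try:
            with env.begin(write=True) as txn:
                for i in range(n):
                    # 16-byte key, zero-filled value of value_size bytes
                    txn.put(i.to_bytes(16, "big"), bytes(value_size))
            for fn in (keys_with_values, keys_only):
                seconds = timeit.timeit(lambda: fn(env), number=runs) / runs
                print(f"{fn.__name__}: {seconds * 1000:.1f} ms per run")
        finally:
            env.close()
```

Calling benchmark() prints one average timing per method; absolute numbers will vary with hardware and with the key and value sizes.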


A way to get the total number of keys without enumerating them individually, also counting all sub-databases:

with env.begin() as txn:
    length = txn.stat()['entries']

Test result with a hand-made database of size 1000000 on my laptop:

  • the method above is instantaneous (0.0 s)
  • the iteration method takes about 1 second.
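The two counting approaches can be put side by side as in the sketch below. It assumes env is an already opened lmdb.Environment, and the function names are made up for illustration. It reads the count from env.stat() rather than txn.stat(), on the assumption that some versions of the binding expect a database handle argument for the latter.

```python
def count_via_stat(env):
    # Environment.stat() returns a dict of statistics for the main database;
    # its "entries" field is the number of items stored there.
    return env.stat()["entries"]


def count_via_iteration(env):
    # Walks every key once, so the cost grows with the size of the database.
    with env.begin() as txn:
        return sum(1 for _ in txn.cursor().iternext(values=False))
```

For a database without named sub-databases, both functions should return the same number; the first reads a stored counter while the second has to visit every key.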

Are you looking for something like this?

with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff
        # Python 3 syntax; keys are passed as bytes
        print('key is:', curs.get(b'key'))

Update:

This may not be the fastest:

with env.begin() as txn:
    my_list = [key for key, _ in txn.cursor()]
    print(my_list)

Disclaimer: I don't know this library; I just searched its docs for "key".
