Python how to create a dict of dict of list with defaultdict

You may have to do like this.

>>> from collections import defaultdict
>>> a=defaultdict()
>>> a["testkey"]=None
>>> a["testkey"]=defaultdict(list)
>>> a["testkey"]["list"]=["a","b","c"]
>>> a
defaultdict(None, {'testkey': defaultdict(<type 'list'>, {'list': ['a', 'b', 'c']})})

Slightly faster than using a lambda:

defaultdict(defaultdict(list).copy)

This has the same observable behavior as wim's answer, but avoids a lambda in favor of a (in CPython) bound built-in method implemented in C, which means default value generation doesn't have to execute any Python byte code or look up any names, and it runs a small amount faster. In microbenchmarks on CPython 3.5, it looks like the cost paid when a key did not exist at access time is about 5-10% lower this way than with the otherwise equivalent lambda.

Really, the reason I prefer it is because I hate lambda due to people overusing it when it's a bad idea (e.g. map/filter with lambda is always more verbose and slower than an equivalent listcomp/genexpr, but people keep doing it anyway for no discernible reason), even though in this case it hardly matters.


Update: As of 3.8, this performance improvement is gone, and the lambda is faster (~3% reduced runtime using lambda on 3.8, ~7% on 3.9), for simple microbenchmarks with ipython. If you wish to reproduce my tests, I tested:

>>> from collections import defaultdict
>>> %%timeit dd = defaultdict(lambda: defaultdict(list)); o = object
... dd[o()]

>>> %%timeit dd = defaultdict(defaultdict(list).copy); o = object
... dd[o()]

where caching o = object minimized lookup expenses and allowed us to make extremely cheap, guaranteed unique keys that we accessed (forcing auto-vivification of a list) while doing no other work.

The performance improvement in 3.8 is likely largely due to the introduction of per opcode cache for the LOAD_GLOBAL instruction, reducing the cost of looking up defaultdict and list within the lambda from a full dict lookup (two in the case of list, in built-ins) to a quick check of the version tag on the dict followed by a cheap load from the cache, reducing the cost by ~40%. The 3.9 improvement likely (not sure on this) relates to CPython's internals moving to optimize and favor vectorcall code paths more, at the expense of non-vectorcall code paths (which the defaultdict(list).copy path uses more of, relatively speaking), and even before these improvements, defaultdict(list).copy had some inefficiencies that the lambda lacked, providing some margin for improving on it.


It's a little tricky. You make a defaultdict of defaultdicts, like so:

defaultdict(lambda: defaultdict(list))