Safely iterating over WeakKeyDictionary and WeakValueDictionary

To be safe you have to keep a reference somewhere. Using the idiom:

for k,v in list(d.items()):

is not completely safe because, even though it will work most of the time, during the last iteration of the loop the list may be garbage-collected.

The correct way would be:

items = list(d.items())
for k,v in items:
    #do stuff that doesn't have a chance of destroying "items"
del items

If you use a WeakKeyDictionary you could simply store the keys, and store values if you use WeakValueDictionary.

On a side note: in python2 .items() already returns a list.

Ultimately it depends on what do you mean by "safe". If you simply mean that the iteration will proceed correctly(iterating once on all the elements), then:

for k,v in list(d.items()):

is safe, because the iteration over the dictionary is actually performed by list(d.items()), then you are only iterating over the list.

If you, instead, mean that during the iteration elements should not "disappear" from the dictionary as side-effect of the for-loop, then you must keep a strong reference until the end of the loop, and this requires you to store the list in a variable before starting the loop.

It is actually safe to iterate over a WeakKeyDictionary, WeakValueDictionary, or WeakSet in Python 2.7 or Python 3.1+. They put in an iteration guard that prevents weakref callbacks from removing references from the underlying dict or set during iteration all the way back in 2010, but the docs never got updated.

With the guard in, if an entry dies before iteration reaches it, iteration will skip that entry, but it won't result in a segfault or a RuntimeError or anything. Dead entries will be added to a list of pending removals and handled later.

Here's the guard (not threadsafe, despite the comment):

class _IterationGuard:
    # This context manager registers itself in the current iterators of the
    # weak container, such as to delay all removals until the context manager
    # exits.
    # This technique should be relatively thread-safe (since sets are).

    def __init__(self, weakcontainer):
        # Don't create cycles
        self.weakcontainer = ref(weakcontainer)

    def __enter__(self):
        w = self.weakcontainer()
        if w is not None:
            w._iterating.add(self)
        return self

    def __exit__(self, e, t, b):
        w = self.weakcontainer()
        if w is not None:
            s = w._iterating
            s.remove(self)
            if not s:
                w._commit_removals()

Here's where the WeakKeyDictionary weakref callback checks the guard:

def remove(k, selfref=ref(self)):
    self = selfref()
    if self is not None:
        if self._iterating:
            self._pending_removals.append(k)
        else:
            del self.data[k]

And here's where WeakKeyDictionary.__iter__ sets the guard:

def keys(self):
    with _IterationGuard(self):
        for wr in self.data:
            obj = wr()
            if obj is not None:
                yield obj

__iter__ = keys

The same guard is used in the other iterators.

If this guard did not exist, calling list(d.items()) wouldn't be safe either. A GC pass could happen inside the items iterator and remove items from the dict during iteration. (The fact that list is written in C would provide no protection.)

Back in 2.6 and earlier, the safest way to iterate over a WeakKeyDictionary or WeakValueDictionary would have been to use items. items would return a list, and it would use the underlying dict's items method, which would have been (mostly?) not interruptible by GC. The dict API changes in 3.0 changed how keys/values/items worked, which is probably why the guard was introduced when it was.

Safely iterating over WeakKeyDictionary and WeakValueDictionary

Tags:

Python

Dictionary

Weak References

Related

Recent Posts