Python hangs indefinitely trying to delete deeply recursive object

Update

On the bug report, a run on a giant machine showed that the time to reclaim the tree storage fell from almost 5 hours to about 70 seconds:

master:

build time 0:48:53.664428
teardown time 4:58:20.132930

patched:

build time 0:48:08.485639
teardown time 0:01:10.46670

Proposed fix

Here's a pull request against the CPython project that proposes to "fix this" by removing the searches entirely. It works fine for my 10x smaller test case, but I don't have access to a machine with anywhere near enough RAM to run the original. So I'm waiting for someone who does before merging the PR (who knows? there may be more than one "huge number of objects" design flaw here).

Original reply

Thank you for the nice job of providing an executable sample reproducing your problem! Alas, I can't run it: it requires far more memory than I have. If I cut the number of strings by a factor of ten, I end up with about 100,000,000 Node instances in about 8GB of RAM, and it takes about 45 seconds for garbage collection to tear down the tree (Python 3.7.3). So I'm guessing you have about a billion Node instances.

I expect you're not getting responses because there's no "general problem" known here, and it requires such a hefty machine to even try it. The python-dev mailing list may be a better place to ask, or open an issue on https://bugs.python.org.

The usual cause for very slow garbage collection at the end of a run is that memory got swapped out to disk, and then it takes thousands of times longer "than normal" to read objects back into RAM, in "random" order, to tear them down. I'm assuming that's not happening here, though. If it were, CPU usage usually drops to near 0, as the process spends most of its time waiting for disk reads.

Less often, some bad pattern is hit in the underlying C library's malloc/free implementation. But that also seems unlikely here, because these objects are small enough that Python only asks C for "big chunks" of RAM and carves them up itself.
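If you want to check that yourself, CPython can dump pymalloc's internal accounting, including arena and pool counts, to stderr. sys._debugmallocstats() is CPython-specific, but it's in the stdlib:

import sys

# Allocate a pile of small objects, then dump pymalloc's bookkeeping
# (arenas, pools, size classes) to stderr.
junk = [object() for _ in range(1_000_000)]
sys._debugmallocstats()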

So I don't know. Because nothing can be ruled out, you should also give details about the OS you're using, and how Python was built.
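For concreteness, something like this captures most of what's worth posting (the "CC" config var may be None on Windows builds):

import platform
import sys
import sysconfig

print(sys.version)                     # interpreter version and build info
print(platform.platform())             # OS and kernel details
print(sysconfig.get_config_var("CC"))  # compiler CPython was built with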

Just for fun, you could try this to get some sense of how far things get before it stalls out. First add this method to Node:

def delete(self):
    # Tear the tree down bottom-up, clearing child links as we go and
    # counting nodes so progress is visible before any stall.
    global killed
    if self.lo:
        self.lo.delete()
        self.lo = None
    if self.eq:
        self.eq.delete()
        self.eq = None
    if self.hi:
        self.hi.delete()
        self.hi = None
    killed += 1
    if killed % 100000 == 0:
        print(f"{killed:,} deleted")  # progress line every 100,000 nodes

At the end of train(), add this:

tree.root.delete()

And replace the call to main() with:

killed = 0
main()
print(killed, "killed")

Which may or may not reveal something interesting.
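If you'd like to play with the pattern without the original data, here's a minimal self-contained sketch. This Node is just a stand-in with the lo/eq/hi layout from the question, not the real script's class:

class Node:
    __slots__ = ("lo", "eq", "hi")

    def __init__(self, lo=None, eq=None, hi=None):
        self.lo = lo
        self.eq = eq
        self.hi = hi

    def delete(self):
        global killed
        if self.lo:
            self.lo.delete()
            self.lo = None
        if self.eq:
            self.eq.delete()
            self.eq = None
        if self.hi:
            self.hi.delete()
            self.hi = None
        killed += 1
        if killed % 100000 == 0:
            print(f"{killed:,} deleted")

def build(depth):
    # Complete ternary tree with (3**(depth+1) - 1) / 2 nodes.
    if depth == 0:
        return Node()
    return Node(build(depth - 1), build(depth - 1), build(depth - 1))

killed = 0
root = build(11)    # 265,720 nodes; prints progress twice during teardown
root.delete()
print(killed, "killed")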

Didn't hang for someone else

I posted a note about this to the python-dev mailing list, and one person so far replied privately:

I started this using Python 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) [GCC 7.3.0] :: Anaconda, Inc. on linux

$ python fooz.py
This gets printed!
This doesn't get printed

It did take ~80 GB of RAM and several hours, but did not get stuck.

So, unless someone else pops up who can reproduce it, we're probably out of luck here. You at least need to give more info about exactly which OS you're using, and how Python was built.


Could you try recompiling Python?

In obmalloc.c, there is an ARENA_SIZE macro defined as:

#define ARENA_SIZE              (256 << 10)     /* 256KB */

This default value is not optimized for very large memory systems.
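If you want to experiment, the suggestion worked out below is to quadruple it (and rebuild Python):

#define ARENA_SIZE              (1024 << 10)    /* 1MB */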

Your script takes a long time sorting arenas by the number of free pools in them. That can be O(N^2) in the worst case, when many arenas have the same number of free pools.

Your script frees memory blocks in random order, which is close to that worst case.

N here is the number of arenas. When you change ARENA_SIZE to (1024 << 10), each arena is 4x as large, N becomes 1/4 as large, and N^2 becomes 1/16.
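To get a feel for how that can go quadratic, here is a toy Python model of the idea (my sketch, not CPython's actual C code): a list of arenas kept sorted by free-pool count, re-positioned by linear scan each time a free bumps a count. With lots of ties the scans get long:

import random

def simulate(narenas, frees_per_arena=10):
    counts = [0] * narenas                     # free-pool count per arena
    order = list(range(narenas))               # arena ids, kept sorted by count
    pos = {a: i for i, a in enumerate(order)}  # arena id -> index in order
    steps = 0
    for _ in range(narenas * frees_per_arena):
        a = random.randrange(narenas)          # a block in arena `a` is freed
        counts[a] += 1
        i = pos[a]
        # Bubble right past arenas whose count is <= ours; with many ties
        # this scan can cross a large fraction of the list.
        while i + 1 < narenas and counts[order[i + 1]] <= counts[a]:
            other = order[i + 1]
            order[i], order[i + 1] = other, a
            pos[other], pos[a] = i, i + 1
            i += 1
            steps += 1
    return steps

for n in (1000, 2000, 4000):
    print(n, simulate(n))   # step counts grow much faster than linearly in n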


If you cannot recompile Python, you can use malloc instead of pymalloc:

$ PYTHONMALLOC=malloc python3 yourscript.py

You can override malloc with jemalloc or tcmalloc using the LD_PRELOAD environment variable.
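For example, with jemalloc installed (the library path varies by distro; this one is where Debian/Ubuntu's libjemalloc2 package puts it):

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 PYTHONMALLOC=malloc python3 yourscript.py

Keeping PYTHONMALLOC=malloc in the command routes the small-object allocations through the preloaded allocator as well, instead of through pymalloc.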