Python multiprocessing Process crashes silently

What you really want is some way to pass exceptions up to the parent process, right? Then you can handle them however you want.

If you use concurrent.futures.ProcessPoolExecutor, this is automatic. If you use multiprocessing.Pool, it's trivial. If you use explicit Process and Queue, you have to do a bit of work, but it's not that much.

For example:

def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put(result)
    except Exception as e:
        self.outputQueue.put(e)

Then, your calling code can just read Exceptions off the queue like anything else. Instead of this:

yield outq.get()

do this:

result = outq.get()
if isinstance(result, Exception):
    raise result
yield result

(I don't know what your actual parent-process queue-reading code does, because your minimal sample just ignores the queue. But hopefully this explains the idea, even though your real code doesn't actually work like this.)

This assumes that you want to abort on any unhandled exception that makes it up to run. If you want to pass back the exception and continue on to the next i in iter, just move the try into the for, instead of around it.
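A minimal sketch of that per-item variant, assuming a hypothetical process_item function in place of the "(code that does stuff)" and a plain function instead of a Process.run method. The loop only needs get and put, so it is exercised here in-process with the standard-library queue.Queue; with real workers you would pass multiprocessing.Queue objects instead.

```python
import queue


def process_item(i):
    # Hypothetical stand-in for the real work; fails when i == 0.
    return 10 // i


def worker_loop(input_queue, output_queue):
    # The try sits inside the for loop, so one bad item is reported
    # and the loop moves on to the next item instead of exiting.
    for i in iter(input_queue.get, 'STOP'):
        try:
            output_queue.put(process_item(i))
        except Exception as e:
            output_queue.put(e)


def demo():
    # In-process demonstration: three work items, then the sentinel.
    inq, outq = queue.Queue(), queue.Queue()
    for item in (1, 0, 5, 'STOP'):
        inq.put(item)
    worker_loop(inq, outq)
    return [outq.get() for _ in range(3)]


if __name__ == '__main__':
    for value in demo():
        print(repr(value))
```

The failing item produces an exception object on the output queue, and the items after it are still processed.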

This also assumes that Exceptions are not valid values. If that's an issue, the simplest solution is to just push (result, exception) tuples:

def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put((result, None))
    except Exception as e:
        self.outputQueue.put((None, e))

Then, your queue-reading code does this:

result, exception = outq.get()
if exception is not None:
    raise exception
yield result

You may notice that this is similar to the node.js callback style, where you pass (err, result) to every callback. Yes, it's annoying, and you're going to mess up code in that style. But you're not actually using that anywhere except in the wrapper; all of your "application-level" code that gets values off the queue or gets called inside run just sees normal returns/yields and raised exceptions.

You may even want to consider building a Future to the spec of concurrent.futures (or using that class as-is), even though you're doing your job queuing and executing manually. It's not that hard, and it gives you a very nice API, especially for debugging.
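As a rough sketch of the "use that class as-is" option, the parent could wrap each (result, exception) tuple it reads off the queue in a concurrent.futures.Future. The helper name future_from_message is hypothetical. Note that the standard library documents set_result and set_exception as intended for executor implementations and unit tests, so treat this as an illustration rather than a recommended pattern.

```python
from concurrent.futures import Future


def future_from_message(message):
    # message is a (result, exception) tuple as pushed by the worker.
    result, exception = message
    future = Future()
    if exception is not None:
        future.set_exception(exception)
    else:
        future.set_result(result)
    return future


if __name__ == '__main__':
    ok = future_from_message((42, None))
    print(ok.result())

    failed = future_from_message((None, ZeroDivisionError('division by zero')))
    print(repr(failed.exception()))
    try:
        failed.result()  # re-raises the stored exception in the caller
    except ZeroDivisionError as e:
        print('raised in caller:', e)
```

Application code then only sees result() returning a value or raising, and can call exception() when it wants to inspect a failure without raising it.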

Finally, it's worth noting that most code built around workers and queues can be made a lot simpler with an executor/pool design, even if you're absolutely sure you only want one worker per queue. Just scrap all the boilerplate, and turn the loop in the Worker.run method into a function (which just returns or raises as normal, instead of appending to a queue). On the calling side, again scrap all the boilerplate and just submit or map the job function with its parameters.

Your whole example can be reduced to:

import concurrent.futures

def job(i):
    # (code that does stuff)
    1 / 0 # Dumb error
    # (more code that does stuff)
    return result

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(job, range(10))

Any exception raised inside job is sent back and re-raised in the parent when you iterate over results, so nothing is lost silently.
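One thing to keep in mind with map: the first exception is raised while you iterate, and the iteration ends there. If you would rather see the outcome of every job, a sketch using submit might look like the following. The job body and the helper name run_all are hypothetical, and run_all accepts any executor, so it works the same way with a thread pool.

```python
import concurrent.futures


def job(i):
    # Hypothetical job; raises ZeroDivisionError when i == 0.
    return 10 // i


def run_all(executor, inputs):
    # submit() returns one Future per job, so a failure in one job
    # does not hide the results of the others.
    futures = [executor.submit(job, i) for i in inputs]
    outcomes = []
    for future in futures:
        error = future.exception()  # blocks until this job has finished
        outcomes.append(error if error is not None else future.result())
    return outcomes


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        for outcome in run_all(executor, [1, 0, 5]):
            print(repr(outcome))
```

Each entry in the returned list is either the job's return value or the exception it raised, in submission order.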


As you mentioned in the comments, the traceback for an exception doesn't trace back into the child process; it only goes as far as the manual raise result call (or, if you're using a pool or executor, the guts of the pool or executor).

The reason is that multiprocessing.Queue is built on top of pickle, and pickling exceptions doesn't pickle their tracebacks. And the reason for that is that you can't pickle tracebacks. And the reason for that is that tracebacks are full of references to the local execution context, so making them work in another process would be very hard.

So… what can you do about this? Don't go looking for a fully general solution. Instead, think about what you actually need. 90% of the time, what you want is "log the exception, with traceback, and continue" or "print the exception, with traceback, to stderr and exit(1) like the default unhandled-exception handler". For either of those, you don't need to pass an exception at all; just format it on the child side and pass a string over. If you do need something more fancy, work out exactly what you need, and pass just enough information to manually put that together. If you don't know how to format tracebacks and exceptions, see the traceback module. It's pretty simple. And this means you don't need to get into the pickle machinery at all. (Not that it's very hard to copyreg a pickler or write a holder class with a __reduce__ method or anything, but if you don't need to, why learn all that?)
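A minimal sketch of the "format it on the child side and pass a string over" approach, assuming a plain worker function and a hypothetical 10 // i job in place of the real work. The (result, error_text) tuple layout is just one possible convention.

```python
import multiprocessing
import traceback


def worker(input_queue, output_queue):
    for i in iter(input_queue.get, 'STOP'):
        try:
            output_queue.put((10 // i, None))
        except Exception:
            # Format in the child, where the traceback object still exists;
            # a plain string pickles without any special handling.
            output_queue.put((None, traceback.format_exc()))


if __name__ == '__main__':
    inq, outq = multiprocessing.Queue(), multiprocessing.Queue()
    for item in (5, 0, 'STOP'):
        inq.put(item)
    process = multiprocessing.Process(target=worker, args=(inq, outq))
    process.start()
    # Two work items were queued, so read exactly two messages back.
    for _ in range(2):
        result, error_text = outq.get()
        if error_text is not None:
            print('worker failed:\n' + error_text)
        else:
            print('result:', result)
    process.join()
```

The parent gets the full child-side traceback as text, which is enough for logging or printing to stderr, at the cost of no longer having an exception object to raise.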


I suggest this workaround for showing a process's exceptions:

from multiprocessing import Process
import sys
import traceback


run_old = Process.run

def run_new(*args, **kwargs):
    try:
        run_old(*args, **kwargs)
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        traceback.print_exc(file=sys.stdout)

Process.run = run_new

This is not an answer, just an extended comment. Please run this program and tell us what output (if any) you get:

from multiprocessing import Process, Queue

class Worker(Process):

    def __init__(self, inputQueue, outputQueue):

        super(Worker, self).__init__()

        self.inputQueue = inputQueue
        self.outputQueue = outputQueue

    def run(self):

        for i in iter(self.inputQueue.get, 'STOP'):

            # (code that does stuff)

            1 / 0 # Dumb error

            # (more code that does stuff)

            self.outputQueue.put(result)

if __name__ == '__main__':
    inq, outq = Queue(), Queue()
    inq.put(1)
    inq.put('STOP')
    w = Worker(inq, outq)
    w.start()

I get:

% test.py
Process Worker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/unutbu/pybin/test.py", line 21, in run
    1 / 0 # Dumb error
ZeroDivisionError: integer division or modulo by zero

I'd be surprised if you got nothing.