Does python logging support multiprocessing?

It is not safe to write to a single file from multiple processes.

According to https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes

Although logging is thread-safe, and logging to a single file from multiple threads in a single process is supported, logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python.
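
The same cookbook page goes on to show supported patterns, one of which is to send all records over a queue to a single process that owns the file. Since Python 3.2 the standard library ships logging.handlers.QueueHandler and QueueListener for exactly this; a minimal sketch (file name and format are only illustrative):

import logging
import logging.handlers
import multiprocessing


def worker(queue):
    # Each child process logs only through a QueueHandler; no file I/O here.
    logger = logging.getLogger('app')
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.setLevel(logging.INFO)
    logger.info('hello from pid %s', multiprocessing.current_process().pid)


if __name__ == '__main__':
    queue = multiprocessing.Queue(-1)

    # The listener runs in the parent and is the only writer to the file.
    file_handler = logging.FileHandler('app.log')  # placeholder file name
    file_handler.setFormatter(logging.Formatter('%(process)d %(levelname)s %(message)s'))
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()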

One possible solution is to have each process write to its own file. You can achieve this by writing your own handler that appends the process PID to the file name:

import logging.handlers
import os


class PIDFileHandler(logging.handlers.WatchedFileHandler):
    """A WatchedFileHandler that writes to <filename>-<pid><ext>, so each process gets its own file."""

    def __init__(self, filename, mode='a', encoding=None, delay=0):
        filename = self._append_pid_to_filename(filename)
        super(PIDFileHandler, self).__init__(filename, mode, encoding, delay)

    def _append_pid_to_filename(self, filename):
        # e.g. 'bar.log' -> 'bar-1234.log' for a process with pid 1234
        pid = os.getpid()
        path, extension = os.path.splitext(filename)
        return '{0}-{1}{2}'.format(path, pid, extension)

Then you just need to call addHandler:

logger = logging.getLogger('foo')
fh = PIDFileHandler('bar.log')
logger.addHandler(fh)

Alternatively, use a queue to handle concurrency correctly, while also recovering from errors, by feeding everything to the parent process via a pipe:

from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        # The actual file handler lives in the process that creates this
        # handler (the parent); child processes only put records on the queue.
        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        # Background thread in the parent that drains the queue and writes.
        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # Ensure that exc_info and args have been stringified. This removes
        # any chance of unpickleable things inside and possibly reduces the
        # message size sent over the pipe.
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)

The handler does all the file writing in the parent process and uses a single background thread to receive records sent from the child processes.
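
For completeness, here is a rough usage sketch (file name, size and backup count are illustrative, and the default fork start method on Linux is assumed so that children inherit the configured handler and its queue):

import logging
import multiprocessing


def worker(i):
    # Inherited root logger: emit() only puts the record on the queue.
    logging.getLogger().info('message %d from pid %d', i, multiprocessing.current_process().pid)


if __name__ == '__main__':
    # Illustrative arguments: append mode, ~1 MB per file, keep 5 rotated files.
    mp_handler = MultiProcessingLog('mptest.log', mode='a', maxsize=1024 * 1024, rotate=5)
    mp_handler.setFormatter(logging.Formatter('%(process)d %(levelname)s %(message)s'))

    root = logging.getLogger()
    root.addHandler(mp_handler)
    root.setLevel(logging.INFO)

    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Only the parent's receive thread ever touches the underlying RotatingFileHandler, which is what makes the concurrent writes safe.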


As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing about each other) write into the same file, potentially interfering with each other.

Now what happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by e.g. another process writing to the same file and intermingling its output). This problem applies to every programming language, as in the end they'll do a syscall to the kernel. This answer explains under which circumstances a shared log file is ok.

It comes down to checking your pipe buffer size; on Linux it is defined in /usr/include/linux/limits.h as 4096 bytes. For other OSes you can find a good list here.

That means: if your log line is less than 4096 bytes (on Linux) and the disk is directly attached (i.e. no network in between), then the append is safe. But for more details please check the first link in my answer. To test this you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.
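
A rough sketch of such a test, if you want to reproduce it (path, process count and line lengths are arbitrary): several processes append to the same file through their own FileHandler, and a crude check afterwards looks for lines that were spliced together.

import logging
import multiprocessing

LOG_FILE = '/tmp/test.log'  # arbitrary path


def worker(name, line_length):
    # Each process opens its own handle in append mode, as in the scenario above.
    logger = logging.getLogger(name)
    logger.addHandler(logging.FileHandler(LOG_FILE))
    logger.setLevel(logging.INFO)
    for i in range(100):
        logger.info('proc %s line %d %s', name, i, 'x' * line_length)


if __name__ == '__main__':
    for length in (1000, 5000):  # below and above a 4096-byte write
        procs = [multiprocessing.Process(target=worker, args=('p%d' % i, length))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

    # Crude check: every line should start with 'proc' if no writes interleaved.
    with open(LOG_FILE) as f:
        bad = [line for line in f if not line.startswith('proc')]
    print('suspicious lines:', len(bad))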

In this question there are already quite a few solutions to this, so I won't add my own solution here.

Update: Flask and multiprocessing

Web frameworks like Flask will be run in multiple workers if hosted by uWSGI or nginx. In that case, multiple processes may write into one log file. Will that cause problems?

Error handling in Flask is done via stdout/stderr, which is then caught by the webserver (uWSGI, nginx, etc.), which needs to take care that logs are written correctly (see e.g. [this flask+nginx example](http://flaviusim.com/blog/Deploying-Flask-with-nginx-uWSGI-and-Supervisor/)), probably also adding process information so you can associate error lines with processes. From Flask's docs:

By default as of Flask 0.11, errors are logged to your webserver’s log automatically. Warnings however are not.

So you'd still have the issue of intermingled log lines if you use warn and the message exceeds the pipe buffer size.
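
If you do log warnings from a Flask app running in multiple workers, one mitigation is to at least tag every record with the worker's pid, so intermingled lines can still be attributed to a process; a hedged sketch (handler target and format are my own assumptions, not Flask defaults):

import logging
import os

from flask import Flask

app = Flask(__name__)

# Keep records short and tag each one with the emitting worker's pid, so
# even if lines from different workers land in the same file or stream
# they can still be attributed to a process.
handler = logging.StreamHandler()  # stderr, to be collected by the app server (e.g. uWSGI)
handler.setFormatter(logging.Formatter('%(asctime)s pid=%(process)d %(levelname)s %(message)s'))
app.logger.addHandler(handler)
app.logger.setLevel(logging.WARNING)


@app.route('/')
def index():
    app.logger.warning('handled by pid %d', os.getpid())
    return 'ok'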