Leveraging "Copy-on-Write" to Copy Data to Multiprocessing.Pool() Worker Processes

Anything sent to pool.map (and related methods) isn't actually using shared copy-on-write resources. The values are "pickled" (Python's serialization mechanism), sent over pipes to the worker processes and unpickled there, which reconstructs the object in the child from scratch. Thus, each child in this case ends up with a copy-on-write version of the original data (which it never uses, because it was told to use the copy sent via IPC), and a personal recreation of the original data that was reconstructed in the child and is not shared.

If you want to take advantage of forking's copy-on-write benefits, you can't send data (or objects referencing the data) over the pipe. You have to store them in a location that can be found from the child by accessing their own globals. So for example:

import os
import time
from multiprocessing import Pool
import numpy as np

class MyClass(object):
    def __init__(self):
        self.myAttribute = os.urandom(1024*1024*1024) # basically a big memory struct(~1GB size)

    def my_multithreaded_analysis(self):
        arg_lists = list(range(10))  # Don't pass self
        pool = Pool(processes=10)
        result = pool.map(call_method, arg_lists)
        print result

    def analyze(self, i):
        time.sleep(10)
        return i ** 2

def call_method(i):
    # Implicitly use global copy of my_instance, not one passed as an argument
    return my_instance.analyze(i)

# Constructed globally and unconditionally, so the instance exists
# prior to forking in commonly accessible location
my_instance = MyClass()


if __name__ == '__main__':
    my_instance.my_multithreaded_analysis()

By not passing self, you avoid making copies, and just use the single global object that was copy-on-write mapped into the child. If you needed more than one object, you might make a global list or dict mapping to instances of the object prior to creating the pool, then pass the index or key that can look up the object as part of the argument(s) to pool.map. The worker function then uses the index/key (which had to be pickled and sent to the child over IPC) to look up the value (copy-on-write mapped) in the global dict (also copy-on-write mapped), so you copy cheap information to lookup expensive data in the child without copying it.

If the objects are smallish, they'll end up copied even if you don't write to them. CPython is reference counted, and the reference count appears in the common object header and is updated constantly, just by referring to the object, even if it's a logically non-mutating reference. So small objects (and all the other objects allocated in the same page of memory) will be written, and therefore copied. For large objects (your hundred million element numpy array), most of it would remain shared as long as you didn't write to it, since the header only occupies one of many pages

Changed in python version 3.8: On macOS, the spawn start method is now the default. See mulitprocessing doc. Spawn is not leveraging copy-on-write.


Alternatively, to take advantage of forking's copy-on-write benefits, while preserving some semblance of encapsulation, you could leverage class-attributes and @classmethods over pure globals.

import time
from multiprocessing import Pool
import numpy as np

class MyClass(object):

    myAttribute = np.zeros(100000000) # basically a big memory struct
    # myAttribute is a class-attribute

    @classmethod
    def my_multithreaded_analysis(cls):
        arg_list = [i for i in range(10)]
        pool = Pool(processes=10)
        result = pool.map(analyze, arg_list)
        print result

    @classmethod
    def analyze(cls, i):
        time.sleep(10)
        # If you wanted, you could access cls.myAttribute w/o worry here.
        return i ** 2

""" We don't need this proxy step !
    def call_method(args):
        my_instance, i = args
        return my_instance.analyze(i)
"""

if __name__ == '__main__':
    my_instance = MyClass()
    # Note that now you can instantiate MyClass anywhere in your app,
    # While still taking advantage of copy-on-write forking
    my_instance.my_multithreaded_analysis()

Note 1: Yes, I admit that class-attributes and class-methods are glorified globals. But it buys a bit of encapsulation...

Note 2: Rather than explicitly creating your arg_lists above, you can implicitly pass the instance (self) to each task created by Pool, by passing the bound-instance method analyze(self) to Pool.map(), and shoot yourself in the foot even easier!