Python multiprocessing: How to know to use Pool or Process?

You can easily do this with pypeln:

import pypeln as pl

stage = pl.process.map(
    CreateMatrixMp, 
    range(self.numPixels), 
    workers=poolCount, 
    maxsize=2,
)

# iterate over it in the main process
for x in stage:
   # code

# or convert it to a list
data = list(stage)

I think the Pool class is typically more convenient, but it depends whether you want your results ordered or unordered.

Say you want to create 4 random strings (e.g,. could be a random user ID generator or so):

import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# define a example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                    string.ascii_lowercase
                    + string.ascii_uppercase
                    + string.digits)
               for i in range(length))
    output.put(rand_str)

# Setup a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)

# Output
# ['yzQfA', 'PQpqM', 'SHZYV', 'PSNkD']

Here, the order probably doesn't matter. I am not sure if there is a better way to do it, but if I want to keep track of results in the order in which the functions are called, I typically return tuples with an ID as first item, e.g.,

# define a example function
def rand_string(length, pos, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                    string.ascii_lowercase
                    + string.ascii_uppercase
                    + string.digits)
                for i in range(length))
    output.put((pos, rand_str))

# Setup a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, x, output)) for x in range(4)]

print(processes)

# Output
# [(1, '5lUya'), (3, 'QQvLr'), (0, 'KAQo6'), (2, 'nj6Q0')]

This let's me sort the results then:

results.sort()
results = [r[1] for r in results]
print(results)

# Output:
# ['KAQo6', '5lUya', 'nj6Q0', 'QQvLr']

The Pool class

Now to your question: How is this different from the Pool class? You'd typically prefer Pool.map to return ordered list of results without going through the hoop of creating tuples and sorting them by ID. Thus, I would say it is typically more efficient.

def cube(x):
    return x**3

pool = mp.Pool(processes=4)
results = pool.map(cube, range(1,7))
print(results)

# output:
# [1, 8, 27, 64, 125, 216]

Equivalently, there is also an "apply" method:

pool = mp.Pool(processes=4)
results = [pool.apply(cube, args=(x,)) for x in range(1,7)]
print(results)

# output:
# [1, 8, 27, 64, 125, 216]

Both Pool.apply and Pool.map will lock the main program until a process has finished.

Now, you also have Pool.apply_async and Pool.map_async, which return the result as soon as the process has finished, which is essentially similar to the Process class above. The advantage may be that they provide you with the convenient apply and map functionality that you know from Python's in-built apply and map

Python multiprocessing: How to know to use Pool or Process?

The Pool class

Tags:

Python

Threadpool

Multiprocessing

Related

Recent Posts