Distributed programming in Python

lazy method calls of objects

That can be anything at all, really, so let's break it down:

Simple Let-Me-Call-That-Function (RPC)

Well, lucky you! Python has one of the greatest implementations of Remote Procedure Calls: RPyC.

Just run the classic server (the rpyc_classic.py script that ships with RPyC; see the tutorial), then open an interpreter and:

import rpyc

# connect to a classic RPyC server running on this machine
conn = rpyc.classic.connect("localhost")

# instantiate a class from a module that lives on the *server* side
data_obj = conn.modules.lazyme.AwesomeObject("ABCDE")
print(data_obj.calculate(10))  # runs remotely, the result comes back here
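
For reference, a hypothetical lazyme.py on the server side could look something like this (the module and class names are just the ones used in the snippet above, not part of RPyC):

# lazyme.py - hypothetical module importable on the server side
class AwesomeObject(object):
    def __init__(self, data):
        self.data = data

    def calculate(self, x):
        # placeholder "work"
        return len(self.data) * x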

And a lazy version (async):

# wrap the remote method with async_(), which makes the invocation asynchronous
# (older RPyC versions called this rpyc.async; it was renamed because async
# is now a reserved word in Python)
acalc = rpyc.async_(data_obj.calculate)
res = acalc(10)              # returns an AsyncResult immediately
print(res.ready, res.value)  # .value blocks until the result has arrived

Simple Data Distribution

You have a well-defined unit of work, say a complex image manipulation. Roughly, you create Nodes, which do the actual work (take an image, apply the manipulation, return the result), someone who collects the results (a Sink), and someone who creates the work (the Distributor).

Take a look at Celery.
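
A rough sketch of how that maps onto Celery (the app name, broker URL, and the trivial "work" below are made up for illustration; you need a broker such as RabbitMQ or Redis running):

# tasks.py - the "Node": start a worker with `celery -A tasks worker`
from celery import Celery

app = Celery('tasks', broker='amqp://localhost', backend='rpc://')

@app.task
def manipulate(image_id):
    # the actual image work would go here
    return image_id * 2

# elsewhere, the "Distributor" fires off work; the result backend is the "Sink":
#   result = manipulate.delay(42)
#   print(result.get(timeout=10))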

If it's very small scale, or if you just want to play with it, see the Pool object in the multiprocessing package:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    p = Pool(5)                 # the guard matters on platforms that spawn workers
    print(p.map(f, [1, 2, 3]))  # [1, 4, 9]

And the truly-lazy version:

res = p.map_async(f, [1, 2, 3])  # returns immediately; the work runs in the background
print(res.get())                 # block until the results arrive

map_async() returns an AsyncResult object, which can be polled with ready() and collected with get().

Complex Data Distribution

Some multi-level, more-than-just-fire-and-forget complex data manipulation, or a multi-step processing use case.

In such a case, you should use a message broker such as RabbitMQ, or a brokerless messaging library such as ZeroMQ. They allow you to send 'messages' across multiple servers with great ease.

They save you from the horrors of TCP land, but they are a bit more complex (some, like RabbitMQ, require a separate process/server for the broker). However, they give you much more fine-grained control over the flow of data, and they help you build a truly scalable application.
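
For a taste, here is a minimal PUSH/PULL pipeline sketch using pyzmq (the port and message layout are arbitrary choices for illustration):

# distributor.py - pushes work items to whichever workers are connected
import zmq

ctx = zmq.Context()
sender = ctx.socket(zmq.PUSH)
sender.bind("tcp://*:5557")  # workers connect here and pull work
for task in range(10):
    sender.send_json({"task": task})

# worker.py - each worker pulls items as fast as it can process them
import zmq

ctx = zmq.Context()
receiver = ctx.socket(zmq.PULL)
receiver.connect("tcp://localhost:5557")
while True:
    work = receiver.recv_json()
    print("processing", work["task"])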

Lazy-Anything

While not data distribution per se, it is the hottest trend in web server back-ends: use 'green' threads (or events, or coroutines) so that IO-heavy tasks yield cooperatively while the application code is busy maxing out the CPU.

I like Eventlet a lot, and gevent is another option.
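
A tiny gevent sketch of the idea (the URLs are placeholders; monkey.patch_all() makes the blocking standard library cooperative, so the fetches overlap):

import gevent
from gevent import monkey
monkey.patch_all()  # patch blocking stdlib IO so greenlets can yield

import urllib.request

def fetch(url):
    return url, urllib.request.urlopen(url).status

urls = ("http://example.com", "http://example.org")
jobs = [gevent.spawn(fetch, u) for u in urls]  # all fetches run concurrently
gevent.joinall(jobs, timeout=10)
print([job.value for job in jobs])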


Try Gearman http://gearman.org/

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
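
With the python-gearman client library, a worker/client pair looks roughly like this (the task name 'reverse' and the server address are illustrative, and a gearmand job server must be running):

# worker.py - registers a function with the gearmand job server
import gearman

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('reverse', lambda gm_worker, gm_job: gm_job.data[::-1])
worker.work()  # blocks, processing jobs as they arrive

# client.py - submits a job and waits for its result
import gearman

client = gearman.GearmanClient(['localhost:4730'])
request = client.submit_job('reverse', 'Hello Gearman')
print(request.result)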