asyncio: running task only if all other tasks are awaiting

When the event loop runs a task, that task keeps executing until it returns control to the event loop. There is usually only one reason a task wants to return control to the event loop: the task is facing a blocking operation (and is thus "not ready to continue").

It means that "every iteration of the event loop" is usually equal to "all main_tasks are on await". Code you already have will (mostly) work as you want. Only thing you should do is to make special_function() task.


There is some chance a task returns control to the event loop before it hits a "real" blocking call, and that usually looks like await asyncio.sleep(0) (like you do in special_function). It means the task wants to make sure all other tasks get called before it continues; you probably want to respect that.
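A tiny self-contained demonstration of that behavior (all names here are illustrative):

import asyncio

async def worker(name):
    for i in range(3):
        print(name, i)
        await asyncio.sleep(0)   # yield: every other ready task runs before we continue

async def main():
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())   # typically prints: a 0, b 0, a 1, b 1, a 2, b 2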


This is what I'd do:

  1. I wouldn't use your special function.

  2. Each data update gets a separate generation ID (a 4-byte integer), and I'd put only the ID in shared memory.

Both processes are running independently, I assume.

  1. The subscriber keeps a local copy of the generation ID. When it notices that the generation ID in shared memory has changed, it reads the new data from the file.

  2. Data is stored on tmpfs (/tmp), so it lives in memory. You can create your own tmpfs mount if that suits you better. It's fast.

Here is why:

  • To make sure the subscriber doesn't fetch half-baked data, anything stored in shared memory has to be protected by a semaphore. That's a PITA.
  • By using a file, you can carry variable-size data. This may not apply to you, but one of the hard problems with shared memory is reserving enough space without wasting it. Using a file solves this problem.
  • Because the generation ID is a 4-byte int, updating it is atomic. This is a huge advantage.

So, when one of your tasks receives new data, it opens a file, writes the data, and, after closing the file descriptor, writes the new generation ID to shared memory. Before updating the generation ID, you can safely delete the previous file: a subscriber that already has the file open will complete reading it (on Linux an unlinked file stays readable through open descriptors), and a subscriber that tries to open it afterwards will fail and simply wait for the next generation anyway. If the machine crashes, /tmp is gone, so you don't need to worry about cleaning up files. You can even add a task whose sole job is to delete older-generation files in /tmp, if you like.
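Here is a minimal sketch of that protocol (assuming Python 3.8+ for multiprocessing.shared_memory; the block name "gen_id" and the /tmp/data.<gen> file naming are made up for illustration):

import os
import struct
from multiprocessing import shared_memory

SHM_NAME = "gen_id"   # hypothetical name for the 4-byte shared-memory block

def publish(shm, gen, payload):
    # Write the complete payload to a fresh tmpfs file first ...
    with open(f"/tmp/data.{gen}", "wb") as f:
        f.write(payload)
    # ... remove the previous generation (open readers are unaffected) ...
    try:
        os.unlink(f"/tmp/data.{gen - 1}")
    except FileNotFoundError:
        pass
    # ... and only then bump the generation ID: a single 4-byte store.
    shm.buf[:4] = struct.pack("<I", gen)

def poll(shm, last_gen):
    # Return (gen, payload) if a newer generation exists, else None.
    gen = struct.unpack("<I", bytes(shm.buf[:4]))[0]
    if gen == last_gen:
        return None
    try:
        with open(f"/tmp/data.{gen}", "rb") as f:
            return gen, f.read()
    except FileNotFoundError:
        return None   # already superseded; wait for the next generation

The publisher creates the block once with shared_memory.SharedMemory(name=SHM_NAME, create=True, size=4); the subscriber attaches with shared_memory.SharedMemory(name=SHM_NAME) and calls poll() with the last generation it has seen.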


I tried to write a test for the 'task not ready to run' condition, but I don't think asyncio exposes details from the scheduler. The developers have clearly stated that they want to keep the freedom to change asyncio internals without breaking backward compatibility.

In asyncio.Task there is this comment (note: _step() runs the task coroutine until the next await):

# An important invariant maintained while a Task not done:
#   
# - Either _fut_waiter is None, and _step() is scheduled;
# - or _fut_waiter is some Future, and _step() is *not* scheduled.

But that internal variable is not in the API, of course.

You can get some limited access to _fut_waiter by reading the output of repr(task), but that format does not seem reliable either, so I would not depend on something like this:

PENDINGMSG = 'wait_for=<Future pending '

if all(PENDINGMSG in repr(t) for t in monitored_tasks):
    do_something()

Anyway, I think you are trying to be too perfect. You want to know whether there is new data in other tasks. But what if the data is sitting in asyncio's buffers? In a kernel buffer? In the network card's receive buffer? You can never know whether new data will arrive the next millisecond.

My suggestion: write all updates to a single queue and check that queue as the only source of updates. If the queue is empty, publish the last state.
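A minimal sketch of that pattern with asyncio.Queue (fetch_update() and publish_state() are hypothetical stand-ins for your input and output sides):

import asyncio

async def producer(updates):
    # Every data source simply pushes its updates onto the shared queue.
    while True:
        data = await fetch_update()        # hypothetical data source
        await updates.put(data)

async def publisher(updates, interval=0.1):
    state = None
    while True:
        # Drain whatever has already arrived, keeping only the newest state.
        while True:
            try:
                state = updates.get_nowait()
            except asyncio.QueueEmpty:
                break
        if state is not None:
            publish_state(state)           # hypothetical: republishes the last state if the queue was empty
        await asyncio.sleep(interval)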