The tasks from asyncio.gather do not run concurrently

Using asyncio is different from using threads in that you cannot add it to an existing code base to make it concurrent. Specifically, code that runs in the asyncio event loop must not block - all blocking calls must be replaced with non-blocking versions that yield control to the event loop. In your case, requests.get blocks and defeats the parallelism implemented by asyncio.

To avoid this problem, you need to use an HTTP library that is written with asyncio in mind, such as aiohttp.
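For illustration, here is a minimal sketch of what that could look like with aiohttp (the URLs and the BeautifulSoup parsing are carried over from the question; aiohttp must be installed separately):

import asyncio

import aiohttp
from bs4 import BeautifulSoup

async def parse_url(session, url):
    print("Started to download {0}".format(url))
    async with session.get(url) as resp:
        # non-blocking read of the body; force utf-8 as in the original code
        text = await resp.text(encoding="utf-8")
    print("Finished downloading {0}".format(url))
    return BeautifulSoup(text, "html.parser")

async def main():
    # one session shared across requests; both downloads run concurrently
    async with aiohttp.ClientSession() as session:
        soup_1, soup_2 = await asyncio.gather(
            parse_url(session, url_1),
            parse_url(session, url_2),
        )

asyncio.run(main())

Because session.get never blocks the event loop, gather can interleave the two downloads instead of running them back to back.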


I'll add a little more to user4815162342's response. The asyncio framework uses coroutines that must cede control of the thread while they wait on long operations. See the diagram at the end of this section for a nice graphical representation. As user4815162342 mentioned, the requests library doesn't support asyncio. I know of two ways to make this work concurrently: the first is to do what user4815162342 suggested and switch to a library with native support for asynchronous requests; the second is to run the synchronous code in separate threads or processes. The latter is easy thanks to the run_in_executor function.

import asyncio

import requests
from bs4 import BeautifulSoup

loop = asyncio.get_event_loop()

async def return_soup(url):
    # run the blocking requests.get call in the default thread pool executor
    r = await loop.run_in_executor(None, requests.get, url)
    r.encoding = "utf-8"
    return BeautifulSoup(r.text, "html.parser")

async def parseURL_async(url):
    print("Started to download {0}".format(url))
    soup = await return_soup(url)
    print("Finished downloading {0}".format(url))

    return soup

t = [parseURL_async(url_1), parseURL_async(url_2)]
loop.run_until_complete(asyncio.gather(*t))

This solution removes some of the benefit of using asyncio, as the long operations will still be executed in a fixed-size thread pool, but it's also much easier to start with.
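If the default pool ever becomes the bottleneck, run_in_executor also accepts an explicit executor as its first argument. A minimal sketch, assuming the loop, requests, and BeautifulSoup setup from the code above (the max_workers value is just an example):

import concurrent.futures

# a dedicated pool sized for the number of simultaneous downloads you expect
executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)

async def return_soup(url):
    # same as before, but the blocking call runs in our own pool
    r = await loop.run_in_executor(executor, requests.get, url)
    r.encoding = "utf-8"
    return BeautifulSoup(r.text, "html.parser")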


The reason, as mentioned in other answers, is the lack of library support for coroutines.

As of Python 3.9, though, you can use the function asyncio.to_thread as an alternative for I/O concurrency.

Obviously this is not exactly equivalent because, as the name suggests, it runs your functions in separate threads rather than in a single thread in the event loop, but it can be a way to achieve I/O concurrency without relying on proper async support from the library.

In your example the code would be:

import asyncio

import requests
from bs4 import BeautifulSoup

def return_soup(url):
    r = requests.get(url)
    r.encoding = "utf-8"
    return BeautifulSoup(r.text, "html.parser")

def parseURL_async(url):
    print("Started to download {0}".format(url))
    soup = return_soup(url)
    print("Finished downloading {0}".format(url))
    return soup

async def main():
    # each blocking call runs in its own thread; gather awaits both results
    result_url_1, result_url_2 = await asyncio.gather(
        asyncio.to_thread(parseURL_async, url_1),
        asyncio.to_thread(parseURL_async, url_2),
    )

asyncio.run(main())