python requests module and connection reuse

Global functions like requests.get or requests.post create a new requests.Session instance on each call. Connections made by these functions cannot be reused, because you cannot access the automatically created session and use its connection pool for subsequent requests. This is fine if you only have to make a few requests. Otherwise you'll want to manage the session yourself.
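For more than a handful of requests, it's worth creating the session explicitly. A minimal sketch (the URL is just an example):

```python
import requests

# A Session keeps a connection pool, so repeated requests to the same
# host can reuse one TCP/TLS connection instead of opening a new one.
# Using it as a context manager closes the pool when you're done.
with requests.Session() as session:
    for _ in range(3):
        response = session.get("https://www.wikipedia.org")
        response.raise_for_status()
```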

Here is a quick demonstration of requests' behavior when you use the global get function versus a session.

Preparation, not really relevant to the question:

>>> import logging, requests, timeit
>>> logging.basicConfig(level=logging.DEBUG, format="%(message)s")

See, a new connection is established each time you call get:

>>> _ = requests.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org
>>> _ = requests.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org

But if you use the same session for subsequent calls, the connection gets reused:

>>> session = requests.Session()
>>> _ = session.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org
>>> _ = session.get("https://www.wikipedia.org")
>>> _ = session.get("https://www.wikipedia.org")
>>> _ = session.get("https://www.wikipedia.org")

Performance:

>>> timeit.timeit('_ = requests.get("https://www.wikipedia.org")', 'import requests', number=100)
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
...
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
52.74904417991638
>>> timeit.timeit('_ = session.get("https://www.wikipedia.org")', 'import requests; session = requests.Session()', number=100)
Starting new HTTPS connection (1): www.wikipedia.org
15.770191192626953

It works much faster when you reuse the session (and thus the session's connection pool).


The requests module is stateless and if I repeatedly call get for the same URL, wouldn't it create a new connection each time?

The requests module is not stateless; it just lets you ignore the state and effectively use a global singleton state if you choose to do so.*

And it (or, rather, one of the underlying libraries, urllib3) maintains a connection pool keyed by the (hostname, port) pair, so it will usually just magically reuse a connection if it can.

As the documentation says:

Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!

Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.
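In practice that means a streamed response should be fully read or closed before its connection can go back to the pool. A small sketch (the URL is illustrative):

```python
import requests

session = requests.Session()

# With stream=True the body is not downloaded up front, so the
# connection stays checked out of the pool until we consume it.
response = session.get("https://www.wikipedia.org", stream=True)
try:
    _ = response.content   # reading the whole body releases the connection
finally:
    response.close()       # closing also guarantees the release
```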

So, what does "if it can" mean? As the docs above imply, if you're keeping streaming response objects alive, their connections obviously can't be reused.

Also, the connection pool is really a finite cache, not infinite, so if you spam out a ton of connections and two of them are to the same server, you won't always reuse the connection, just often. But usually, that's what you actually want.


* The particular state relevant here is the transport adapter. Each session gets a transport adapter. You can specify the adapter manually, or you can specify a global default, or you can just use the default global default, which basically just wraps up a urllib3.PoolManager for managing its HTTP connections. For more information, read the docs.
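For illustration, here is how you might mount your own transport adapter on a session and size its underlying urllib3 pool (the numbers are arbitrary):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# pool_connections: how many per-host pools the PoolManager caches;
# pool_maxsize: how many connections each of those pools keeps alive.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)
session.mount("https://", adapter)
session.mount("http://", adapter)

# The session picks the adapter with the longest matching URL prefix.
assert session.get_adapter("https://www.wikipedia.org") is adapter
```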