Python's requests triggers Cloudflare's security while urllib does not

This really peeked my interests. The requests solution that I was able to get working.

Solution

Finally narrow down the problem. When you use requests it uses urllib3 connection pool. There seems to be some inconsistency between a regular urllib3 connection and a connection pool. A working solution:

import requests
from collections import OrderedDict
from requests import Session
import socket

# grab the address using socket.getaddrinfo
answers = socket.getaddrinfo('grimaldis.myguestaccount.com', 443)
(family, type, proto, canonname, (address, port)) = answers[0]

s = Session()
headers = OrderedDict({
    'Accept-Encoding': 'gzip, deflate, br',
    'Host': "grimaldis.myguestaccount.com",
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0'
})
s.headers = headers
response = s.get(f"https://{address}/guest/accountlogin", headers=headers, verify=False).text
print(response)

Technical Background

So I ran both method through Burp Suite to compare the requests. Below are the raw dumps of the requests

using requests

GET /guest/accountlogin HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Connection: close
Host: grimaldis.myguestaccount.com
Accept-Language: en-GB,en;q=0.5
Upgrade-Insecure-Requests: 1
dnt: 1


using urllib

GET /guest/accountlogin HTTP/1.1
Host: grimaldis.myguestaccount.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Upgrade-Insecure-Requests: 1
Dnt: 1


The difference is the ordering of the headers. The difference in the dnt capitalization is not actually the problem.

So I was able to make a successful request with the following raw request:

GET /guest/accountlogin HTTP/1.1
Host: grimaldis.myguestaccount.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0


So the Host header has be sent above User-Agent. So if you want to continue to to use requests. Consider using a OrderedDict to ensure the ordering of the headers.


After some debugging, and thanks to the answers of @TuanGeek, we've found out the issue with the requests library seems to come from a DNS issue on requests' part when dealing with cloudflare, a simple fix to this issue is connecting directly to the host IP as such:

import requests
from collections import OrderedDict
from requests import Session
import socket

# grab the address using socket.getaddrinfo
answers = socket.getaddrinfo('grimaldis.myguestaccount.com', 443)
(family, type, proto, canonname, (address, port)) = answers[0]

s = Session()
headers = OrderedDict({
    'Accept-Encoding': 'gzip, deflate, br',
    'Host': "grimaldis.myguestaccount.com",
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0'
})
s.headers = headers
response = s.get(f"https://{address}/guest/accountlogin", headers=headers, verify=False).text
print(response)

Now, this fix didn't work when working with the httplib HTTPX, However I've found where the issue stems from.

The issue comes from the h11 library (used by HTTPX to handle HTTP/1.1 requests), while urllib would automatically fix the letter case of headers, h11 took a different approach by lowercasing every header. While in theory this shouldn't cause any issues, as servers should handle headers in a case-insensitive manner (and in a lot of cases they do), the reality is that HTTP is Hard™️ and services such as Cloudflare don't respect RFC2616 and requires headers to be properly capitalized.

Discussions about capitalization have been going for a while over at h11:

https://github.com/python-hyper/h11/issues/31

And have "recently" started to pop up over on HTTPX's repo as well:

https://github.com/encode/httpx/issues/538

https://github.com/encode/httpx/issues/728

Now the unsatisfactory answer to the issue between Cloudflare and HTTPX is that until something is done over on h11's side (or until Cloudflare miraculously starts respecting RFC2616), not much can be changed to how HTTPX and Cloudflare handle header capitalization.

Either use a different HTTPLIB such as aiohttp or requests-futures, try forking and patching the header capitalization with h11 yourself, or wait and hope for the issue to be dealt with properly by the h11 team.