Get file size using python-requests, while only getting the header

Send a HEAD request:

>>> import requests
>>> response = requests.head('http://example.com')
>>> response.headers
{'connection': 'close',
 'content-encoding': 'gzip',
 'content-length': '606',
 'content-type': 'text/html; charset=UTF-8',
 'date': 'Fri, 11 Jan 2013 02:32:34 GMT',
 'last-modified': 'Fri, 04 Jan 2013 01:17:22 GMT',
 'server': 'Apache/2.2.3 (CentOS)',
 'vary': 'Accept-Encoding'}

A HEAD request is like a GET request that only downloads the headers. Note that it's up to the server to actually honor your HEAD request. Some servers will only respond to GET requests, so you'll have to send a GET request and just close the connection instead of downloading the body. Other times, the server just never specifies the total size of the file.
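The fallback described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name remote_size and the timeout parameter are made up for illustration, and treating any non-OK HEAD response as "the server does not honor HEAD" is a simplifying assumption.

```python
import requests


def remote_size(url, timeout=10):
    """Return the advertised size in bytes, or None if the server gives none.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    # Try HEAD first. requests.head does not follow redirects by default,
    # so ask for that explicitly.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    if not response.ok or 'content-length' not in response.headers:
        # Fall back to GET; stream=True defers downloading the body.
        response = requests.get(url, stream=True, timeout=timeout)
        response.close()  # release the connection without reading the body
    length = response.headers.get('content-length')
    return int(length) if length is not None else None
```

A None result corresponds to the last case above, where the server never specifies the total size.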


Use requests.get(url, stream=True).headers['Content-length'].

stream=True means that when the function returns, only the response headers have been downloaded; the response body has not.

Both requests.get and requests.head can get you the headers, but there are advantages to using get:

  1. get is more flexible: if you want to download the response body after inspecting the length, you can simply access the content property, or use an iterator that downloads the content in chunks.
  2. The HTTP specification (RFC 2616) says the headers returned for a HEAD request "SHOULD be identical to the information sent in response to a GET request", but that's not always the case.
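To illustrate the first point, here is a hedged sketch of inspecting the length and then downloading in chunks over the same response. The function name download_if_small and its parameters path, max_bytes and chunk_size are hypothetical, and skipping the download when no Content-Length is sent is just one possible policy.

```python
import requests


def download_if_small(url, path, max_bytes, chunk_size=8192):
    """Save url to path only if its advertised size is at most max_bytes.

    Hypothetical helper; returns True if the body was downloaded.
    """
    # stream=True: only the headers are fetched at this point.
    response = requests.get(url, stream=True, timeout=10)
    try:
        length = response.headers.get('content-length')
        if length is None or int(length) > max_bytes:
            return False  # unknown or too large: never read the body
        with open(path, 'wb') as out:
            # iter_content pulls the body down chunk by chunk.
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
        return True
    finally:
        response.close()
```

With requests.head you would need a second request to fetch the body; here the same response object is reused.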

Here is an example of getting the length of an MIT open course video:

import requests

MitOpenCourseUrl = "http://www.archive.org/download/MIT6.006F11/MIT6_006F11_lec01_300k.mp4"
resHead = requests.head(MitOpenCourseUrl)
resGet = requests.get(MitOpenCourseUrl, stream=True)
resHead.headers['Content-length']  # output 169
resGet.headers['Content-length']  # output 121291539
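One plausible explanation for the 169, which the example itself does not confirm, is a redirect: requests.head does not follow redirects by default, while requests.get does, so the HEAD response may describe a short redirect page rather than the video. A small sketch for checking this, where head_summary and its follow parameter are hypothetical names:

```python
import requests


def head_summary(url, follow=False):
    """Return (status code, Location header, Content-Length header) for a HEAD request.

    Hypothetical diagnostic helper; follow is passed on as allow_redirects.
    """
    response = requests.head(url, allow_redirects=follow, timeout=10)
    return (response.status_code,
            response.headers.get('location'),
            response.headers.get('content-length'))
```

If head_summary(MitOpenCourseUrl) reports a 3xx status with a Location header, then head_summary(MitOpenCourseUrl, follow=True) should describe the final resource instead, and its length can be compared with the one from get.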